• Title/Summary/Keyword: phonetic data


Classification of Pornographic Videos Using Audio Information (오디오 신호를 이용한 음란 동영상 판별)

  • Kim, Bong-Wan;Choi, Dae-Lim;Bang, Man-Won;Lee, Yong-Ju
    • Proceedings of the KSPS conference
    • /
    • 2007.05a
    • /
    • pp.207-210
    • /
    • 2007
  • As the Internet has become prevalent in daily life, harmful content on it has been increasing, which has become a very serious problem. Among such content, pornographic video is particularly harmful to children. To prevent exposure, many filtering systems exist that are based on keyword- or image-based methods. The main purpose of this paper is to devise a system that classifies pornographic videos based on audio information. We use Mel-Cepstrum Modulation Energy (MCME), the modulation energy calculated on the time trajectory of the Mel-Frequency Cepstral Coefficients (MFCC), together with MFCC as the feature vector, and a Gaussian Mixture Model (GMM) as the classifier. In experiments, the proposed system correctly classified 97.5% of pornographic data and 99.5% of non-pornographic data. We expect the proposed method can serve as a component of a more accurate classification system that uses video and audio information simultaneously.

  • PDF
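The MFCC/MCME-plus-GMM pipeline in the abstract above boils down to scoring a feature vector under two Gaussian mixture models and picking the more likely class. A minimal sketch with diagonal covariances (the component weights, means, and variances here are illustrative placeholders, not the paper's trained models):

```python
import math

def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of feature vector x under a diagonal-covariance GMM."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        log_comp = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            # per-dimension Gaussian log-density, diagonal covariance
            log_comp += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        total += math.exp(log_comp)
    return math.log(total)

def classify(x, gmm_target, gmm_other):
    """Pick the class whose GMM assigns x the higher likelihood."""
    return ("target" if gmm_log_likelihood(x, *gmm_target)
            > gmm_log_likelihood(x, *gmm_other) else "other")
```

In the paper's setting, one GMM would be trained on MFCC/MCME features from pornographic audio and the other on non-pornographic audio.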

An Electro-palatographic Study of Palatalization in the Japanese Alveolar Nasal

  • Tsuzuki, Masaki
    • Proceedings of the KSPS conference
    • /
    • 1996.10a
    • /
    • pp.333-336
    • /
    • 1996
  • It is widely known that the Japanese alveolar nasal [n] is affected by adjacent vowels in most positions; that is, the variants of the alveolar [n] occur conditionally. The Japanese [n] is palatalized under the influence of the vowel [i] or the palatal [j]. In the articulation of [ni], for instance, the tip and sides of the tongue make wide contact with the palate. It is interesting to know how palatalization occurs and varies during production in different contexts. In this presentation, the actual realization of the palatalized alveolar nasal in different contexts is examined and clarified by considering the electro-palatographic data and examining the articulatory feeling and auditory impression. As a result, palatalized [ɲ] occurs either word-initially or inter-vocalically. The [ɲ] in [ɲi] and [iɲ] has great palatality. When conditioned by [j], the [ɲ] in [ɲja], [ɲjo] and [ɲjɯ] has full palatality. In each sound, the average number of contacted electrodes of the electro-palatograph at maximum tongue-palate contact is 63, or 100% of the total. To summarize the experimental data, articulatory feeling and auditory impression, it can be concluded that the [n] followed by or hemmed in by [i] or [j] is a palatalized nasal [ɲ].

  • PDF

An Electro-palatographic Study of Palatalization in the Japanese Alveolar Nasal

  • Masaki Tsuzuki
    • MALSORI
    • /
    • no.31_32
    • /
    • pp.223-238
    • /
    • 1996
  • It is widely known that the Japanese alveolar nasal [n] is affected by adjacent vowels in most positions; that is, the variants of the alveolar [n] occur conditionally. The Japanese [n] is palatalized under the influence of the vowel [i] or the palatal [j]. In the articulation of 'に', for instance, the tip and sides of the tongue make wide contact with the palate. It is interesting to know how palatalization occurs and varies during production in different contexts. In this presentation, the actual realization of the palatalized alveolar nasal in different contexts is examined and clarified by considering the electro-palatographic data and examining the articulatory feeling and auditory impression. As a result, palatalized [ɲ] occurs either word-initially or inter-vocalically. The [ɲ] in [ɲi] and 'いに'[iɲ] has great palatality. When conditioned by [j], the [ɲ] in 'にゃ'[ɲja], 'にょ'[ɲjo] and 'にゅ'[ɲjɯ] has full palatality. In each sound, the average number of contacted electrodes of the electro-palatograph at maximum tongue-palate contact is 63, or 100% of the total. To summarize the experimental data, articulatory feeling and auditory impression, it can be concluded that the [n] followed by or hemmed in by [i] or [j] is a palatalized nasal [ɲ].

  • PDF

A Study on Data Sharing Codes Definition of Chinese in CAI Application Programs (CAI 응용프로그램 작성시 자료공유를 위한 한자 코드 체계 정의에 관한 연구)

  • Kho, Dae-Ghon
    • Journal of The Korean Association of Information Education
    • /
    • v.2 no.2
    • /
    • pp.162-173
    • /
    • 1998
  • Writing a CAI program containing Chinese characters requires a common Chinese character code so that information can be shared for educational purposes. A Chinese character code set needs to allow a mixed use of both phonetic order and stroke order, to represent Chinese characters in simplified Chinese as well as in their Japanese versions, and to provide a conversion process for data exchange among different sets of Chinese codes. Waste in the code area is expected when phonetic order is used, because heteronyms are recognized as different characters. Using stroke order, by contrast, facilitates data recovery by preventing duplicate code generation, though it does not comply with the phonetic rule. We claim that the first- and second-level Chinese code areas need to be expanded as much as academic and industrial circles have demanded. We also assert that Unicode can serve as an interim measure for an educational code system because of its interoperability, expandability, and expressivity of character sets.

  • PDF
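The data-exchange conversion the abstract calls for among different Chinese-character code sets can be pivoted through Unicode. A toy sketch of that idea (the code-point values in the tables below are placeholders for illustration, not real legacy-set assignments):

```python
def convert(code, src_to_unicode, unicode_to_dst):
    """Convert a character code between two legacy sets via a Unicode pivot.
    Returns None when the character has no mapping in the target set."""
    uni = src_to_unicode.get(code)
    return None if uni is None else unicode_to_dst.get(uni)

# Placeholder mapping tables: entries are illustrative only.
SET_A_TO_UNI = {0xA1A1: 0x6F22, 0xA1A2: 0x5B57}   # legacy set A -> Unicode
UNI_TO_SET_B = {0x6F22: 0x3441, 0x5B57: 0x3B7A}   # Unicode -> legacy set B
```

The pivot design means each code set only needs one mapping to and from Unicode, rather than pairwise tables among every set.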

Speech Data Collection for Korean Speech Recognition (한국어 음성인식을 위한 음성 데이터 수집)

  • Park, Jong-Ryeal;Kwon, Oh-Wook;Kim, Do-Yeong;Choi, In-Jeong;Jeong, Ho-Young;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.14 no.4
    • /
    • pp.74-81
    • /
    • 1995
  • This paper describes the development of speech databases for the Korean language, which were constructed at the Communications Research Laboratory at KAIST. The procedure and environment used to construct the speech databases are presented in detail, along with their phonetic and linguistic properties. The databases are intended for use in designing and evaluating speech recognition algorithms. They consist of five different sets of speech content: trade-related continuous speech with 3,000 words, variable-length connected digits, 75 phoneme-balanced isolated words, 500 isolated Korean provincial names, and Korean A-set words.

  • PDF

A study on extraction of the frames representing each phoneme in continuous speech (연속음에서의 각 음소의 대표구간 추출에 관한 연구)

  • 박찬응;이쾌희
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.4
    • /
    • pp.174-182
    • /
    • 1996
  • In a continuous speech recognition system, it is possible to implement a system that can handle an unlimited number of words by using a limited number of phonetic units such as phonemes. Dividing continuous speech into a string of phonemes prior to the recognition process can lower the complexity of the system, but because of coarticulation between neighboring phonemes, it is very difficult to extract their boundaries exactly. In this paper, we propose an algorithm to extract short terms that can represent each phoneme, instead of extracting boundaries. Short terms of lower spectral change and of higher spectral change are detected. Phoneme changes are then detected by applying a distance measure to the lower-spectral-change terms, while the higher-spectral-change terms are regarded as transition terms or short phoneme terms. Finally, the lower-spectral-change terms and the mid-points of the higher-spectral-change terms are taken as representing each phoneme. Cepstral coefficients and the weighted cepstral distance are used as the speech feature and distance measure because of their low computational complexity, and the speech data used in this experiment were recorded in silent and ordinary indoor environments. Experimental results show that the proposed algorithm achieves higher performance with less computational complexity than conventional segmentation algorithms, and that it can be applied usefully in phoneme-based continuous speech recognition.

  • PDF
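The frame-selection idea above — find stretches where the cepstral trajectory barely moves and treat them as phoneme-representative — can be sketched with a weighted cepstral distance. The weights and threshold here are illustrative, not the paper's values:

```python
def weighted_cepstral_distance(c1, c2, weights=None):
    """Weighted Euclidean distance between two cepstral coefficient frames."""
    if weights is None:
        weights = [1.0] * len(c1)
    return sum(w * (a - b) ** 2 for w, a, b in zip(weights, c1, c2)) ** 0.5

def spectral_change(frames):
    """Frame-to-frame distance along the cepstral trajectory."""
    return [weighted_cepstral_distance(frames[i], frames[i + 1])
            for i in range(len(frames) - 1)]

def representative_frames(change, threshold):
    """Indices where spectral change is low: candidate representative
    (steady-state) frames for each phoneme."""
    return [i for i, d in enumerate(change) if d < threshold]
```

Low-change indices mark stable phoneme interiors; the high-change stretches in between correspond to the transition or short-phoneme terms the paper describes.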

Post-Affricate Phonatory Processes in Korean and English: Acoustic Correlates and Implications for Phonological Analysis

  • Ahn, Hyun-Kee
    • Speech Sciences
    • /
    • v.9 no.1
    • /
    • pp.137-148
    • /
    • 2002
  • This study investigates the phonation modes of vowels following the affricate consonants in Korean and English: the tense affricate /c'/, lenis affricate /c/, and aspirated affricate /cʰ/ for Korean; the voiced affricate /ǰ/ and aspirated affricate /c/ for English. The investigation makes significant use of the H1*-H2* measure (a normalized amplitude difference between the first and second harmonics) to provide acoustic correlates of the phonation types. The major finding for English is that the H1*-H2* measure at vowel onset was significantly larger in post-aspirated position than in post-voiced position. The Korean data showed the H1*-H2* measure at vowel onset to be significantly higher in the post-aspirated class than in the post-tense class. On the other hand, the F0 values for the post-lenis vowels were significantly lower than those of the other two classes during the first half of the vowel. Based on these phonetic results, this study argues for the need to incorporate the [stiff vocal folds] and [slack vocal folds] features into phonological treatments of Korean affricates, while maintaining the two features [constricted glottis] and [spread glottis].

  • PDF
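The H1*-H2* measure used above is, before normalization, just the amplitude difference in dB between the first two harmonics of the vowel spectrum; larger values indicate breathier phonation. A minimal sketch of the raw (unnormalized) measure:

```python
import math

def h1_h2_db(amp_h1, amp_h2):
    """Raw H1-H2: amplitude difference in dB between the first and second
    harmonics. The starred H1*-H2* used in the paper additionally normalizes
    for formant influence, which this sketch omits."""
    return 20.0 * math.log10(amp_h1 / amp_h2)
```

With linear harmonic amplitudes taken from a spectrum, equal amplitudes give 0 dB, and a first harmonic ten times the second gives +20 dB.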

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI
    • /
    • no.65
    • /
    • pp.93-103
    • /
    • 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.

  • PDF
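The confusability measure described above rests on the Levenshtein distance between pronunciation variants, compensated by variant length. A sketch under those assumptions (the normalization and greedy pruning threshold are illustrative, not the paper's exact formulation):

```python
def levenshtein(a, b):
    """Edit distance between two phoneme sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (pa != pb)))  # substitution
        prev = cur
    return prev[-1]

def confusability(v1, v2):
    """Length-compensated confusability in [0, 1]: a small edit distance
    relative to length means the variants are easily confused."""
    return 1.0 - levenshtein(v1, v2) / max(len(v1), len(v2))

def prune_variants(variants, threshold):
    """Greedily drop variants too confusable with an already-kept variant,
    shrinking the dictionary and the decoding time."""
    kept = []
    for v in variants:
        if all(confusability(v, k) < threshold for k in kept):
            kept.append(v)
    return kept
```

Length compensation matters because short variants reach small edit distances by chance, which would otherwise bias pruning toward short words, mirroring the paper's correction for ASR errors on shorter words.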

A Development of Image Transfer Remote Maintenance Monitoring System for Hand Held Device (휴대용 화상전송 원격정비 감시시스템의 개발)

  • Kim, Dong-Wan;Park, Sung-Won
    • The Transactions of the Korean Institute of Electrical Engineers P
    • /
    • v.58 no.3
    • /
    • pp.276-284
    • /
    • 2009
  • In this paper, we develop an image-transfer remote maintenance monitoring system for hand-held devices that compensates for human error. Human errors occur when workers exchange information to check and maintain power plant equipment under poor conditions, such as confined spaces and long distances within the plant, and a worker cannot converse with others in a noisy place like a power plant. We therefore built a hand-held device of convenient size that enables conversation in noisy places. The developed system can improve productivity by increasing plant operation time. It is composed of an advanced hardware (H/W) system and a software (S/W) system. The H/W system consists of a media server unit, communication equipment for the hand-held device, a portable camera, a microphone, and a headset. The S/W system consists of a database system and a client PC (personal computer) real-time monitoring system, which includes a server GUI (graphical user interface) program, a wireless monitoring program, and a wired Ethernet communication program. The client GUI program is a total-solution program comprising a PC camera program, a voice conversation program, and related components. We analyzed the required items and investigated the applicable parts of the image-transfer remote maintenance monitoring system with the hand-held device. We also investigated the linkage of communication protocols for the developed prototype, and developed a software tool for two-way communication and real-time recording of voice with image. We confirmed the system's efficiency through field tests in preventive maintenance of the power plant.

Implementation of HMM Based Speech Recognizer with Medium Vocabulary Size Using TMS320C6201 DSP (TMS320C6201 DSP를 이용한 HMM 기반의 음성인식기 구현)

  • Jung, Sung-Yun;Son, Jong-Mok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.1E
    • /
    • pp.20-24
    • /
    • 2006
  • In this paper, we focus on the real-time implementation of a speech recognition system with a medium-sized vocabulary, considering its application to a mobile phone. First, we developed a PC-based variable-vocabulary word recognizer, keeping the program memory and total acoustic model size as small as possible. To reduce the memory size of the acoustic models, linear discriminant analysis and phonetic tied mixtures were applied in feature selection and HMM training, respectively. In addition, a state-based Gaussian selection method with real-time cepstral normalization was used to reduce computational load and improve robustness. We then verified the real-time operation of the implemented recognition system on the TMS320C6201 EVM board. The implemented system uses about 610 kbytes of memory, including both program memory and data memory. The recognition rate was 95.86% for the ETRI 445DB, and 96.4%, 97.92%, and 87.04% for three kinds of name databases collected through mobile phones.
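The Gaussian selection mentioned above cuts decoding cost by evaluating only the best-scoring mixture components of each HMM state rather than all of them. A minimal one-dimensional sketch (the component parameters in the example are illustrative):

```python
import math

def log_gauss(x, mean, var, weight):
    """Log weighted density of scalar x under one Gaussian component."""
    return (math.log(weight)
            - 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var))

def state_log_likelihood(x, components, n_best=None):
    """State log-likelihood over mixture components. With n_best set, only
    the top-scoring components are summed (Gaussian selection), trading a
    tiny likelihood error for less computation."""
    scores = sorted((log_gauss(x, m, v, w) for w, m, v in components),
                    reverse=True)
    if n_best is not None:
        scores = scores[:n_best]
    top = scores[0]
    # log-sum-exp over the selected components, stabilized by the max score
    return top + math.log(sum(math.exp(s - top) for s in scores))
```

Because mixture likelihoods are dominated by a few nearby components, dropping the distant ones changes the result negligibly, which is what makes the approximation attractive on a fixed-point DSP.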