Integrated Search | Korea Science

A Study on Exceptional Pronunciations for Automatic Korean Pronunciation Generator

김선희
대한음성학회지:말소리 / No. 48 / pp. 57-67 / 2003
This paper presents a systematic description of exceptional pronunciations for automatic Korean pronunciation generation. An automatic pronunciation generator is an essential part of a Korean speech recognition system and of a Korean TTS (Text-To-Speech) system. It is composed of a set of regular rules and an exceptional pronunciation dictionary. The exceptional pronunciation dictionary is created by extracting the words that have exceptional pronunciations, based on the characteristics of such words identified through phonological research and a systematic analysis of the entries of Korean dictionaries. The method thus contributes to improving the performance of automatic pronunciation generators for Korean, as well as that of Korean speech recognition and TTS systems.

Automatic Recognition of Pitch Accents Using Time-Delay Recurrent Neural Network

Kim, Sung-Suk;Kim, Chul;Lee, Wan-Joo
The Journal of the Acoustical Society of Korea / Vol. 23, No. 4E / pp. 112-119 / 2004
This paper presents a method for the automatic recognition of pitch accents with no prior knowledge about the phonetic content of the signal (no knowledge of word or phoneme boundaries or of phoneme labels). The recognition algorithm used in this paper is a time-delay recurrent neural network (TDRNN). A TDRNN is a neural network classifier with two different representations of dynamic context: delayed input nodes allow the representation of an explicit trajectory F0(t), while recurrent nodes provide long-term context information that can be used to normalize the input F0 trajectory. The performance of the TDRNN is compared to that of an MLP (multi-layer perceptron) and an HMM (Hidden Markov Model) on the same task. The TDRNN correctly recognizes 91.9% of pitch events and 91.0% of pitch non-events, for an average accuracy of 91.5% over both pitch events and non-events. The MLP with contextual input exhibits 85.8%, 85.5%, and 85.6% recognition accuracy respectively, while the HMM correctly recognizes 36.8% of pitch events and 87.3% of pitch non-events, for an average accuracy of 62.2% over both pitch events and non-events. These results suggest that the TDRNN architecture is useful for the automatic recognition of pitch accents.
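The "delayed input nodes" mentioned in the abstract above can be pictured as a tapped delay line over the F0 track. The following is only a minimal sketch of that one idea, not the paper's TDRNN: the function name `delayed_inputs`, the number of delays, and the F0 values are all made up for illustration, and the recurrent part of the network is not modelled.

```python
from typing import List, Sequence


def delayed_inputs(f0: Sequence[float], n_delays: int) -> List[List[float]]:
    """For each frame t >= n_delays, return [F0(t), F0(t-1), ..., F0(t-n_delays)].

    Each window is what a bank of delayed input nodes would present to the
    network at time t (hypothetical helper, not taken from the paper).
    """
    windows = []
    for t in range(n_delays, len(f0)):
        windows.append([f0[t - d] for d in range(n_delays + 1)])
    return windows


# Made-up F0 values in Hz, one per analysis frame.
f0_track = [110.0, 112.0, 118.0, 125.0, 121.0, 115.0]
windows = delayed_inputs(f0_track, n_delays=2)
print(windows[0])    # [118.0, 112.0, 110.0]
print(len(windows))  # 4
```

With two delays, each input vector covers three consecutive frames, so a short stretch of the F0 trajectory is visible to the classifier at every time step.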

Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features

Park, Taejin;Beack, SeungKwan;Lee, Taejin
IEIE Transactions on Smart Processing and Computing / Vol. 3, No. 5 / pp. 259-266 / 2014
In this paper, we propose a novel technique for noise-robust automatic speech recognition (ASR). The development of ASR techniques has made it possible to recognize isolated words with a near-perfect word recognition rate. However, in a highly noisy environment, a distinct mismatch between the trained speech and the test data results in a significantly degraded word recognition rate (WRA). Unlike conventional ASR systems employing Mel-frequency cepstral coefficients (MFCCs) and a hidden Markov model (HMM), this study applies histogram of oriented gradient (HOG) features and a Support Vector Machine (SVM) to ASR tasks to overcome this problem. Our proposed ASR system is less vulnerable to external interference noise, and achieves a higher WRA than a conventional ASR system equipped with MFCCs and an HMM. The performance of our proposed ASR system was evaluated using a phonetically balanced word (PBW) set mixed with artificially added noise.
https://doi.org/10.5573/IEIESPC.2014.3.5.259

Combining Machine Learning Techniques with Terrestrial Laser Scanning for Automatic Building Material Recognition

Yuan, Liang;Guo, Jingjing;Wang, Qian
International Conference Proceedings / The 8th International Conference on Construction Engineering and Project Management / pp. 361-370 / 2020
Automatic building material recognition has been a popular research interest over the past decade because it is useful for construction management and facility management. Currently, the extensively used methods for automatic material recognition are mainly based on 2D images. A terrestrial laser scanner (TLS) with a built-in camera can generate a set of coloured laser scan data that contains not only the visual features of building materials but also other attributes such as material reflectance and surface roughness. With more characteristics provided, laser scan data have the potential to improve the accuracy of building material recognition. Therefore, this research aims to develop a TLS-based building material recognition method by combining machine learning techniques. The developed method uses material reflectance, HSV colour values, and surface roughness as the features for material recognition. A database containing the laser scan data of common building materials was created and used for model training and validation with machine learning techniques. Different machine learning algorithms were compared, and the best algorithm showed an average recognition accuracy of 96.5%, which demonstrated the feasibility of the developed method.

Image Processing-based Object Recognition Approach for Automatic Operation of Cranes

Zhou, Ying;Guo, Hongling;Ma, Ling;Zhang, Zhitian
International Conference Proceedings / The 8th International Conference on Construction Engineering and Project Management / pp. 399-408 / 2020
The construction industry is suffering from aging workers, frequent accidents, and low productivity. With the rapid development of information technologies in recent years, automatic construction, especially automatic cranes, is regarded as a promising solution to these problems and is attracting more and more attention. However, in practice, limited by the complexity and dynamics of the construction environment, manual inspection, which is time-consuming and error-prone, is still the only way to recognize the search object for crane operation. To solve this problem, this paper proposes an image-processing-based automated object recognition approach that fuses Convolutional Neural Network (CNN)-based and traditional object detection. The search object is first extracted from the background by the trained Faster R-CNN. Then, through a series of image processing steps including Canny, Hough, and endpoint clustering analysis, the vertices of the search object are determined so that it can be located uniquely in 3D space. Finally, the features (e.g., centroid coordinate, size, and color) of the search object are extracted for further recognition. The approach presented in this paper was implemented in OpenCV, and the prototype was written in Microsoft Visual C++. The proposed approach shows great potential for the automatic operation of cranes. Further research and more extensive field experiments will follow.

Development of VIN Character Recognition System for Motor

이용중;이화춘;류재엽
한국공작기계학회 (Korean Society of Machine Tool Engineers): Conference Proceedings / 2000 Autumn Conference Proceedings / pp. 68-73 / 2000
This study implements automatic recognition of VIN (Vehicle Identification Number) characters with a computer vision system. The automatic character recognition method consists of thinning and the recognition of each character. VIN characters and background are separated by counting the size of connected pixel regions. Thinning with Hilditch's algorithm is applied to the segmentation of connected fundamental phonemes. The contour of each VIN character is traced using Freeman's direction tracing algorithm.
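Freeman's direction tracing, named in the abstract above, is commonly described as encoding a contour as a sequence of 8-direction codes. The sketch below shows that encoding for an already ordered list of contour points. It is an illustration under stated assumptions rather than the authors' implementation: the code numbering (0 = east, counter-clockwise, y increasing upward), the helper name `chain_code`, and the tiny square contour are all chosen here for the example.

```python
from typing import Dict, List, Tuple

# Assumed Freeman 8-direction numbering: 0 = east, then counter-clockwise
# in 45-degree steps, with y increasing upward.
DIRECTIONS: Dict[Tuple[int, int], int] = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(points: List[Tuple[int, int]]) -> List[int]:
    """Encode consecutive 8-connected contour points as Freeman direction codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"{(x0, y0)} and {(x1, y1)} are not 8-neighbours")
        codes.append(DIRECTIONS[step])
    return codes


# A unit square traced counter-clockwise and closed on its start point.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # [0, 2, 4, 6]
```

Image libraries usually put the origin at the top-left with y increasing downward, in which case the same table yields a clockwise numbering; the convention only needs to be applied consistently.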

A Tree Regularized Classifier-Exploiting Hierarchical Structure Information in Feature Vector for Human Action Recognition

Luo, Huiwu;Zhao, Fei;Chen, Shangfeng;Lu, Huanzhang
KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 3 / pp. 1614-1632 / 2017
Bag of visual words is a popular model in human action recognition, but it usually suffers from the loss of spatial and temporal configuration information of local features and from large quantization error in its feature coding procedure. In this paper, to overcome these two deficiencies, we combine sparse coding with a spatio-temporal pyramid for human action recognition, and regard this method as the baseline. More importantly, and this is the focus of the paper, we find that there is a hierarchical structure in the feature vector constructed by the baseline method. To exploit this hierarchical structure information for better recognition accuracy, we propose a tree regularized classifier that conveys it. The main contributions of this paper can be summarized as follows. First, we introduce a tree regularized classifier to encode the hierarchical structure information in the feature vector for human action recognition. Second, we present an optimization algorithm to learn the parameters of the proposed classifier. Third, we evaluate the proposed classifier on the YouTube, Hollywood2, and UCF50 datasets; the experimental results show that it performs better than SVM and other popular classifiers and achieves promising results on all three datasets.
https://doi.org/10.3837/tiis.2017.03.020

Forensic Automatic Speaker Identification System for Korean Speakers

김경화;소병민;유하진
말소리와 음성과학 / Vol. 4, No. 3 / pp. 95-101 / 2012
In this paper, we introduce the automatic speaker identification system 'SPO (Supreme Prosecutors Office) Verifier'. SPO Verifier is a GMM (Gaussian mixture model)-UBM (universal background model) based automatic speaker recognition system developed using Korean speakers' utterances. The system uses a channel compensation algorithm to compensate for recording device characteristics. It also lets users manage reference models with utterances from various environments to obtain more accurate recognition results. To evaluate the performance of SPO Verifier on Korean speakers, we compared it with one of the most widely used commercial systems in the forensic field. The results showed that SPO Verifier achieves a lower EER (equal error rate) than the commercial system.
https://doi.org/10.13064/KSSS.2012.4.3.095
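The EER used as the comparison metric in the abstract above is the operating point where the false acceptance rate and the false rejection rate coincide. A minimal sketch of one common way to estimate it from verification scores follows. The function name `equal_error_rate`, the threshold sweep, and the score lists are assumptions made for illustration; none of them come from the paper or from SPO Verifier.

```python
from typing import Sequence, Tuple


def equal_error_rate(genuine: Sequence[float],
                     impostor: Sequence[float]) -> Tuple[float, float]:
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold.
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for thr in sorted(set(genuine) | set(impostor)):
        far = sum(s >= thr for s in impostor) / len(impostor)  # false accepts
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejects
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Made-up scores: higher means "more likely the same speaker".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine_scores, impostor_scores)
print(eer, threshold)  # 0.25 0.6
```

In this toy data one impostor trial scores above one genuine trial, so at the threshold 0.6 both error rates are 1 in 4. A lower EER means the two score distributions overlap less.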

A PSRI Feature Extraction and Automatic Target Recognition Using a Cooperative Network and an MLP

전준형;김진호;최흥문
전자공학회논문지B / Vol. 33B, No. 6 / pp. 198-207 / 1996
A PSRI (position, scale, and rotation invariant) feature extraction and automatic target recognition system using a cooperative network and an MLP is proposed. Position-invariant features are extracted by obtaining the target center using the projection and the moment in the preprocessing stage. The scale- and rotation-invariant features are extracted from the contour projection of the number of edge pixels on each of the concentric circles, which is input to the cooperative network. By extracting the representative PSRI features from the features and their differentiations using max-net and min-net, we can reduce the number of input neurons of the MLP and make the resulting automatic target recognition system less sensitive to input variances. Experiments were conducted on various complex images that were shifted, rotated, or scaled, and the results show that the proposed system is very efficient for PSRI feature extraction and automatic target recognition.
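The idea of counting edge pixels on concentric circles around the target center, as described in the abstract above, can be sketched loosely as follows. This is only a toy illustration and not the paper's method: the cooperative network, max-net, and min-net are not modelled, and the helper name `ring_profile`, the normalization by the largest radius, the ring count, and the point set are all assumptions introduced here.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def ring_profile(edge_pixels: List[Point], n_rings: int = 4) -> List[float]:
    """Fraction of edge pixels falling in each concentric ring around the centroid.

    The centroid (first moments) removes position, dividing radii by the
    largest radius removes scale, and radii alone ignore rotation.
    """
    cx = sum(x for x, _ in edge_pixels) / len(edge_pixels)
    cy = sum(y for _, y in edge_pixels) / len(edge_pixels)
    radii = [math.hypot(x - cx, y - cy) for x, y in edge_pixels]
    r_max = max(radii) or 1.0  # guard against a single-point shape
    counts = [0] * n_rings
    for r in radii:
        ring = min(int(n_rings * r / r_max), n_rings - 1)
        counts[ring] += 1
    return [c / len(edge_pixels) for c in counts]


# Made-up edge pixels of a small asymmetric shape.
shape = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (6, 0), (6, 3)]
shifted = [(x + 10, y - 3) for x, y in shape]   # translation
rotated = [(-y, x) for x, y in shape]           # 90-degree rotation
scaled = [(3 * x, 3 * y) for x, y in shape]     # uniform scaling

base = ring_profile(shape)
print(base)
print(ring_profile(shifted) == base,
      ring_profile(rotated) == base,
      ring_profile(scaled) == base)
```

For this point set the profile is unchanged under the three transforms. On real images, scaling also changes the number of edge pixels, so the invariance would hold only approximately.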

A Study on the Automatic Recognition of Korean Plosives: A Study on the Automatic Discrimination of the Korean Alveolar Stops

최윤석;김기석;황희융
대한전기학회 (Korean Institute of Electrical Engineers): Conference Proceedings / 1987 Annual General Meeting and 40th Anniversary Conference, Society Headquarters / pp. 330-333 / 1987
This paper studies the automatic discrimination of the Korean alveolar stops. In Korean, an automatic speech recognition system must discriminate aspirated and tense plosives, because Korean speakers distinguish aspirated, tense, and lax plosives. To find acoustic cues for the automatic recognition of [ㄲ, ㄸ, ㅃ], we experimented with the discrimination of [ㄷ, ㄸ, ㅌ]. As acoustic parameters, we used temporal cues such as VOT and silence duration, and energy cues such as the ratio of high-frequency energy to low-frequency energy. The experiments used VCV speech data, where V is one of the 8 simple vowels and C is one of the 3 alveolar stops. Experiments on the 192 speech tokens yielded recognition rates of about 82%-95%.
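The figure of 192 speech data in the abstract above is consistent with one simple reading of the VCV design: every combination of 8 vowels, 3 stops, and 8 vowels recorded once, since 8 × 3 × 8 = 192. The sketch below enumerates that reading. It is an assumption, as the abstract does not say how the tokens were combined or repeated, and the vowel labels `V1` to `V8` are placeholders rather than the paper's vowel inventory.

```python
from itertools import product

# Placeholder labels for the 8 simple vowels (the actual set is not listed
# in the abstract).
vowels = [f"V{i}" for i in range(1, 9)]
# The three alveolar stops named in the abstract: lax, tense, aspirated.
stops = ["ㄷ", "ㄸ", "ㅌ"]

# Assumed design: both vowel positions vary independently, one token each.
tokens = [f"{v1}-{c}-{v2}" for v1, c, v2 in product(vowels, stops, vowels)]
print(len(tokens))  # 192
print(tokens[0])    # V1-ㄷ-V1
```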
