Search | Korea Science

A Study on the Spoken Korean-Digit Recognition Using the Neural Network (神經網을 利用한 韓國語數字音認識에 관한 硏究)

Park, Hyun-Hwa;Gahang, Hae Dong;Bae, Keun Sung
- The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.5-13 / 1992
Taking advantage of the fact that each Korean digit is a monosyllabic word, we propose a spoken Korean-digit recognition scheme using a multi-layer perceptron. Each spoken digit is divided into three segments (initial sound, medial vowel, and final consonant) based on the voice starting and ending points and a peak point in the middle of the vowel. Feature vectors such as cepstrum, reflection coefficients, $\Delta$cepstrum, and $\Delta$energy are extracted from each segment. As an input vector to the neural network, cepstrum gives a higher recognition rate than reflection coefficients. Regression coefficients of the cepstrum did not affect the recognition rate as much as we expected, which we believe is because the features were extracted from selected stationary segments of the input speech signal. With 150 cepstral coefficients obtained from each spoken digit, we achieved a correct recognition rate of 97.8%.
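The regression coefficients of the cepstrum ($\Delta$cepstrum) mentioned in this abstract are commonly computed with a short linear-regression window over neighbouring frames. The following is a minimal sketch of that standard formula, not the authors' implementation; the window length, the edge clamping, and the function name `delta` are assumptions made for illustration.

```python
def delta(frames, window=2):
    """Regression (delta) coefficients for a sequence of feature vectors.

    frames: list of equal-length lists, one per analysis frame.
    window: number of frames on each side used in the regression (assumed).
    """
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for k in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]   # clamp at the sequence end
                behind = frames[max(t - n, 0)][k]     # clamp at the sequence start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out
```

On a stationary segment the neighbouring frames are nearly identical, so the deltas are close to zero, which is consistent with the abstract's remark that they added little.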

A Study on Image Generation from Sentence Embedding Applying Self-Attention (Self-Attention을 적용한 문장 임베딩으로부터 이미지 생성 연구)

Yu, Kyungho;No, Juhyeon;Hong, Taekeun;Kim, Hyeong-Ju;Kim, Pankoo
- Smart Media Journal / v.10 no.1 / pp.63-69 / 2021
When a person reads and understands a sentence, they do so by recalling the main words of the sentence as images. Text-to-image generation lets computers carry out this associative process. Previous deep learning-based text-to-image models extract text features using a Convolutional Neural Network (CNN)-Long Short-Term Memory (LSTM) network and a bi-directional LSTM, and generate an image by feeding those features to a GAN. These models use basic embeddings for text feature extraction and take a long time to train because images are generated using several modules. In this research, we therefore propose a method that extracts features for sentence embedding using the attention mechanism, which has improved performance in natural language processing, and generates an image by feeding the extracted features into a GAN. In the experiments, the inception score was higher than that of the model used in the previous study, and on visual inspection the generated images expressed the features of the input sentence well. Images that expressed the sentence well were also generated when a long sentence was input.
https://doi.org/10.30693/SMJ.2021.10.1.63
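The abstract does not say which form of attention is used. As a rough illustration of how self-attention can turn token vectors into one sentence embedding, the sketch below applies scaled dot-product self-attention with identity projections and then mean-pools the result. The lack of learned projections, the pooling step, and all function names are assumptions; a trained model would learn query, key, and value weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def sentence_embedding(tokens):
    """Mean-pool the attended token vectors into a single vector."""
    attended = self_attention(tokens)
    n = len(attended)
    return [sum(row[i] for row in attended) / n for i in range(len(attended[0]))]
```

In a text-to-image pipeline of the kind described, a vector like this would be the conditioning input to the GAN generator.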

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
- Journal of Information Processing Systems / v.19 no.3 / pp.289-301 / 2023
Due to the development and spread of modern technology, anyone can easily communicate through services such as social network services (SNS) on a personal computer (PC) or smartphone. These technologies have brought many benefits, but also harmful effects, one of which is spam. Spam refers to unwanted or rejected information received by unspecified users. Continuous exposure to such information inconveniences service users, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers have been creating more malicious spam by distorting images of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not yet serious. In mail systems, however, spammers (the people who send spam) have applied various modifications to spam images to neutralize OCR, so the same situation can arise with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques for filtering image spam have been studied, but methods of evading image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve on existing OCR-based spam image detection performance and compensate for its vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts from the input image the text-related features, whether the image contains spam words, and the word embedding vector. Then, the convolutional neural network-based image model extracts image obfuscation and image feature vectors from the input image. The final spam image classifier uses the extracted features to determine whether the image is spam. In an evaluation of F1-score, the proposed model scored about 14 points higher than OCR-based spam image detection.
https://doi.org/10.3745/JIPS.04.0274
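For readers unfamiliar with the metric behind the reported gain, the F1-score is the harmonic mean of precision and recall. The sketch below shows the standard calculation from confusion counts; the function name and the convention of returning 0.0 when there are no positive predictions or labels are assumptions, and no counts from the paper are reproduced here.

```python
def f1_score(true_positive, false_positive, false_negative):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * true_positive + false_positive + false_negative
    if denom == 0:
        return 0.0  # assumed convention when nothing is predicted or labelled positive
    return 2 * true_positive / denom
```

A difference "in points" between two detectors is then `100 * (f1_a - f1_b)` when the scores are expressed as fractions.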

Design Observable Model of Direct Drive Motor for Air Gap Estimation when Input Disturbance is Impulse signal (외란이 충격 신호일 때 공극 추정을 위한 직구동 모터의 관측 가능한 수학적 모델 수립)

Ki, Tae-Seok;Park, Youn-Sik;Park, Young-Jin
- Journal of Institute of Control, Robotics and Systems / v.18 no.7 / pp.627-631 / 2012
An observable mathematical model of a DDM (Direct Drive Motor) is proposed. A DDM is a motor that drives the target system directly. The DDM has many strengths, but it has a significant disadvantage: it is more sensitive to external force than a motor with a reduction gear. In other words, if a force is applied, the air gap of the motor can be perturbed. This not only makes motor control difficult but can also cause more serious problems, such as motor breakdown. If the air gap variation can be estimated, however, it can help prevent these problems. The DDM must be modeled in order to estimate the air gap variation. The DDM studied is a PMSM (Permanent Magnet Synchronous Motor), and previous PMSM models include only the characteristics of the electromagnetic system and rotational motion. The proposed model must also include the translational motion of the rotor in order to estimate the air gap variation. In addition, the model must satisfy the observability condition, because the state observer is designed based on it.
https://doi.org/10.5302/J.ICROS.2012.18.7.627
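For a linear time-invariant model with state matrix A and output matrix C, the usual observability condition is that the matrix stacking C, CA, ..., CA^(n-1) has full rank n. The sketch below checks that rank condition in plain Python. It is a generic illustration and not the paper's motor model; the function names and the toy double-integrator example are assumptions.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def rank(m, tol=1e-9):
    """Matrix rank by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in m]
    rows, cols = len(m), len(m[0])
    r = 0
    for c in range(cols):
        pivot = max(range(r, rows), key=lambda i: abs(m[i][c]))
        if abs(m[pivot][c]) < tol:
            continue  # no usable pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, rows):
            f = m[i][c] / m[r][c]
            for j in range(c, cols):
                m[i][j] -= f * m[r][j]
        r += 1
        if r == rows:
            break
    return r


def is_observable(a, c):
    """True if the pair (A, C) satisfies the rank condition rank(O) == n."""
    n = len(a)
    stacked, block = [], c
    for _ in range(n):
        stacked.extend(block)      # append C * A^k
        block = matmul(block, a)
    return rank(stacked) == n


# Toy example (assumed): position and velocity of a free mass.
A = [[0.0, 1.0], [0.0, 0.0]]
print(is_observable(A, [[1.0, 0.0]]))  # measuring position
print(is_observable(A, [[0.0, 1.0]]))  # measuring velocity only
```

In the toy example, measuring position lets an observer recover velocity, while measuring only velocity leaves position unobservable. The same kind of check would decide which measured signals suffice for estimating a translational displacement such as an air gap.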

Speech Recognition Using MSVQ/TDRNN (MSVQ/TDRNN을 이용한 음성인식)

Kim, Sung-Suk
- The Journal of the Acoustical Society of Korea / v.33 no.4 / pp.268-272 / 2014
This paper presents a method for speech recognition using multi-section vector quantization (MSVQ) and a time-delay recurrent neural network (TDRNN). The MSVQ generates the codebook from normalized uniform sections of the voice signal, and the TDRNN performs speech recognition using the MSVQ codebook. The TDRNN is a time-delay recurrent neural network classifier with two different representations of dynamic context: the time-delayed input nodes represent local dynamic context, while the recursive nodes can represent the long-term dynamic context of the voice signal. Cepstral PLP coefficients were used as speech features. In the speech recognition experiments, the MSVQ/TDRNN speech recognizer achieved a 97.9% word recognition rate for speaker-independent recognition.
https://doi.org/10.7776/ASK.2014.33.4.268
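The abstract gives no detail on how the sections are normalized or how the codebooks are trained. As a loose sketch of the multi-section idea only, the code below splits a frame sequence into a fixed number of equal-length sections and maps each frame to the nearest codeword of that section's codebook. The codebooks are assumed to be given, and all names are hypothetical.

```python
def split_sections(frames, num_sections):
    """Divide a frame sequence into num_sections contiguous, near-equal parts."""
    n = len(frames)
    bounds = [i * n // num_sections for i in range(num_sections + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_sections)]


def nearest_codeword(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda j: sq_dist(vector, codebook[j]))


def quantize(frames, codebooks):
    """Quantize each section with its own codebook (one codebook per section)."""
    sections = split_sections(frames, len(codebooks))
    return [[nearest_codeword(v, cb) for v in sec]
            for sec, cb in zip(sections, codebooks)]
```

Because every utterance is mapped onto the same number of sections, the resulting index sequences have a time-normalized layout that a downstream classifier can consume.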

An Amplitude Warping Approach to Intra-Speaker Normalization for Speech Recognition (음성인식에서 화자 내 정규화를 위한 진폭 변경 방법)

Kim Dong-Hyun;Hong Kwang-Seok
- Journal of Internet Computing and Services / v.4 no.3 / pp.9-14 / 2003
Vocal tract normalization is a successful method for improving the accuracy of inter-speaker normalization. In this paper, we present an intra-speaker warping factor estimation based on pitch-altered utterances. The feature space distributions of untransformed speech from a speaker's pitch-altered utterances vary due to acoustic differences in the speech produced by the glottis and vocal tract. Utterance variation is of two types: frequency variation and amplitude variation. Vocal tract normalization is a frequency normalization among inter-speaker normalization methods. We therefore have to consider amplitude variation as well, and it may be possible to determine the amplitude warping factor by calculating the inverse ratio of input to reference pitch. In the recognition results, the error rate is reduced by 0.4% to 2.3% for digit and word decoding.

A Colour Support System for Townscape Based on Kansei and Colour Harmony Models

Kinoshita, Yuichiro;Cooper, Eric;Kamei, Katsuari
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.435-438 / 2003
The townscape has been a main factor in urban-development problems in Japan. In a townscape, keeping harmony with the environment is a common goal, but expressing the individuality and impression of the town is also a useful and meaningful goal. In this paper, we propose a colour planning support system to improve the townscape. The system proposes colour combinations based on three elements: town image, colour harmony, and cost. The targets of this model are mostly existing townscapes in residential areas. In this paper, we introduce the construction of a Kansei evaluation model to quantify the impression. First, we conducted computer-based evaluation experiments with 20 subjects using the SD (semantic differential) method to clarify the relationship between town image and street colours. We chose 16 adjectives related to town image and prepared 100 colour picture samples for the evaluation. After the experiments, we constructed the model using a neural network for each word. We chose 62 experimental results as training data for the neural network and 20 results as testing data. Each colour in the data was selected to have unique hue, brightness, or saturation attributes. After the construction, we tested the model for accuracy. We input the testing data into the constructed model and calculated the errors between the model output and the experimental results. Testing showed that the model worked well for more than 80% of the samples. The model demonstrated the influence of colours on the town image.

Development of a Document-Oriented and Web-Based Nuclear Design Automation System (문서중심 및 웹기반 노심설계 자동화 시스템 개발)

Park Yong Soo;Kim Jong Kyung
- Journal of Information Technology Applications and Management / v.11 no.4 / pp.35-47 / 2004
Nuclear design analysis requires time-consuming and error-prone model-input preparation, code runs, output analysis, and quality assurance processes. To reduce human effort and improve design quality and productivity, the Innovative Design Processor (IDP) is being developed. The two basic principles of IDP are document-oriented design and web-based design. In document-oriented design, the designer writes a design document, called an active document, and feeds it to a special program, which automatically produces the final document with complete analysis, tables, and plots. Active documents can be written with Microsoft Word or created automatically on the web, which is the other framework of IDP. Using a proper mix of server-side and client-side programming under the LAMP (Linux/Apache/MySQL/PHP) environment, the design process on the web is modeled in a design-wizard style so that even a novice designer can easily make the design document. This IDP-based automation is now being implemented for all reload designs of Korea Standard Nuclear Power Plant (KSNP) type PWRs. Introducing this process will allow a large reduction in all KSNP reload design efforts and provide a platform for the design and R&D tasks of KNFC.

Emotion Recognition using Robust Speech Recognition System (강인한 음성 인식 시스템을 사용한 감정 인식)

Kim, Weon-Goo
- Journal of the Korean Institute of Intelligent Systems / v.18 no.5 / pp.586-591 / 2008
This paper studies an emotion recognition system combined with a robust speech recognition system in order to improve emotion recognition performance. For this purpose, the effect of emotional variation on the speech recognition system and robust feature parameters for speech recognition were studied using a speech database containing various emotions. Final emotion recognition is performed using the input utterance and the emotional model chosen according to the speech recognition result. In the experiments, the robust speech recognition system is an HMM-based speaker-independent word recognizer using RASTA mel-cepstral coefficients and their derivatives, with cepstral mean subtraction (CMS) for signal bias removal. Experimental results showed that the emotion recognizer combined with the speech recognition system performed better than the emotion recognizer alone.
https://doi.org/10.5391/JKIIS.2008.18.5.586
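Cepstral mean subtraction, named in the abstract as the signal bias removal step, subtracts the per-utterance average of each cepstral coefficient from every frame. A minimal sketch follows, assuming the utterance is already a list of per-frame coefficient lists; the function name is an assumption and the paper's exact variant is not described.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance.

    frames: non-empty list of equal-length lists of cepstral coefficients.
    """
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[k] for f in frames) / n for k in range(dims)]
    return [[f[k] - means[k] for k in range(dims)] for f in frames]
```

A fixed channel or microphone response adds a constant offset in the cepstral domain, so removing the mean cancels it.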

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook;Lee, Yeon-Woo;Lee, Seong-Ro
- Phonetics and Speech Sciences / v.2 no.3 / pp.141-148 / 2010
This paper proposes a voice activity detection (VAD) method for dual-channel noisy speech recognition that employs spatial cues. In the proposed method, a probability model for speech presence/absence is constructed using spatial cues obtained from the dual-channel input signal, and speech activity intervals are detected through this probability model. In particular, the spatial cues are composed of interaural time differences and interaural level differences of the dual-channel speech signals, and the probability model for speech presence/absence is based on a Gaussian kernel density. To evaluate the performance of the proposed VAD method, speech recognition is performed on speech segments that include only the speech intervals detected by the method. Its performance is compared with that of several methods: an SNR-based method, a direction of arrival (DOA)-based method, and a phase vector-based method. The speech recognition experiments show that the proposed method outperforms the conventional methods, providing relative word error rate reductions of 11.68%, 41.92%, and 10.15% compared with the SNR-based, DOA-based, and phase vector-based methods, respectively.
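The two spatial cues named in this abstract can be estimated from one frame of a two-channel signal in a simple way: the level difference from the energy ratio in decibels, and the time difference from the lag that maximizes the cross-correlation. The sketch below illustrates those textbook estimators and is not the authors' procedure; the function names, the sign convention, and the brute-force lag search are assumptions.

```python
import math


def ild_db(left, right):
    """Interaural level difference in dB (positive when left is louder).

    Assumes both channels have non-zero energy in this frame.
    """
    e_left = sum(x * x for x in left)
    e_right = sum(x * x for x in right)
    return 10.0 * math.log10(e_left / e_right)


def itd_samples(left, right, max_lag):
    """Lag in samples maximizing cross-correlation (positive: right lags left).

    Assumes both channels have the same number of samples.
    """
    n = len(left)
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = sum(left[i] * right[i + lag] for i in range(n) if 0 <= i + lag < n)
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    return best_lag
```

Dividing the lag by the sampling rate gives the time difference in seconds. Cue pairs collected per frame could then be scored by a density model such as the Gaussian kernel density the abstract mentions.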

Search Result 225, Processing Time 0.022 seconds
