Search | Korea Science

Automatic Vowel Onset Point Detection Based on Auditory Frequency Response (청각 주파수 응답에 기반한 자동 모음 개시 지점 탐지)

Zang, Xian;Kim, Hag-Tae;Chong, Kil-To
- Journal of the Korea Academia-Industrial cooperation Society / v.13 no.1 / pp.333-342 / 2012
This paper presents a vowel onset point (VOP) detection method based on the human auditory system. The method maps the "perceptual" frequency scale, i.e. the Mel scale, onto linear acoustic frequency, and then builds a bank of triangular Mel-weighted filters to simulate the band-pass filtering of the human ear. This nonlinear critical-band filter bank greatly reduces the data dimensionality and suppresses the effect of harmonics, making the formants more prominent in the nonlinearly spaced Mel spectrum. The summed energy of the Mel spectrum peaks is extracted as the feature for each frame, and the instant at which the energy amplitude starts rising sharply is detected as the VOP by convolving with a Gabor window. On a single-word database containing 12 vowels articulated with different kinds of consonants, the experiments showed an average detection rate of 72.73%, higher than other vowel detection methods based on short-time energy and zero-crossing rate.
https://doi.org/10.5762/KAIS.2012.13.1.333
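
The Mel-weighted filter bank step described in the abstract above can be illustrated with a minimal Python sketch. It assumes the commonly used conversion mel = 2595 log10(1 + f/700), which the abstract does not specify. The function names (`hz_to_mel`, `mel_to_hz`, `mel_filter_bank`, `apply_bank`) and all parameters are hypothetical, and the Gabor-window onset detection is not reproduced here.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # Assumed conversion formula; the abstract does not state which one is used.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_bank(n_filters: int, n_bins: int, sample_rate: float) -> list:
    """Build n_filters triangular weight vectors over n_bins linear-frequency
    bins that span 0 Hz to the Nyquist frequency."""
    nyquist = sample_rate / 2.0
    max_mel = hz_to_mel(nyquist)
    # n_filters + 2 points equally spaced on the Mel scale, mapped back to Hz.
    edges_hz = [mel_to_hz(max_mel * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    bin_hz = [nyquist * k / (n_bins - 1) for k in range(n_bins)]
    bank = []
    for m in range(1, n_filters + 1):
        lo, center, hi = edges_hz[m - 1], edges_hz[m], edges_hz[m + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= center:
                weights.append((f - lo) / (center - lo))  # rising slope
            elif center < f < hi:
                weights.append((hi - f) / (hi - center))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def apply_bank(bank: list, power_spectrum: list) -> list:
    # One weighted sum per filter: n_bins values are reduced to n_filters.
    return [sum(w * p for w, p in zip(weights, power_spectrum))
            for weights in bank]
```

Applying `apply_bank` to a 129-bin power spectrum with an 8-filter bank yields 8 band energies per frame, which shows the kind of dimensionality reduction the abstract refers to.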

Lip Reading Method Using CNN for Utterance Period Detection (발화구간 검출을 위해 학습된 CNN 기반 입 모양 인식 방법)

Kim, Yong-Ki;Lim, Jong Gwan;Kim, Mi-Hye
- Journal of Digital Convergence / v.14 no.8 / pp.233-243 / 2016
Because speech recognition degrades in noisy environments, Audio Visual Speech Recognition (AVSR) systems, which combine speech and visual information, have been proposed since the mid-1990s, and lip reading has played a significant role in them. This study aims to improve the recognition rate of uttered words using lip shape detection alone, for an efficient AVSR system. After preprocessing for lip region detection, Convolutional Neural Network (CNN) techniques are applied for utterance period detection and lip shape feature vector extraction, and Hidden Markov Models (HMMs) are then used for recognition. Utterance period detection achieves a 91% success rate, outperforming general threshold methods. In lip reading recognition, the user-dependent experiment records 88.5% and the user-independent experiment 80.2%, both improvements over previous studies.
https://doi.org/10.14400/JDC.2016.14.8.233

Automatic Recognition of Symbol Objects in P&IDs using Artificial Intelligence (인공지능 기반 플랜트 도면 내 심볼 객체 자동화 검출)

Shin, Ho-Jin;Jeon, Eun-Mi;Kwon, Do-kyung;Kwon, Jun-Seok;Lee, Chul-Jin
- Plant Journal / v.17 no.3 / pp.37-41 / 2021
A P&ID (Piping and Instrument Diagram) is a key drawing in the engineering industry because it contains information about the units and instrumentation of the plant. Until now, simple repetitive tasks such as listing the symbols in P&ID drawings have been done manually, consuming a great deal of time and manpower. Deep learning models based on CNNs (Convolutional Neural Networks) are currently being studied for detecting objects in drawings, but a detection time of about 30 minutes and an accuracy of about 90% are not sufficient for use in the real world. In this study, symbols in a drawing are detected using 1-stage object detection algorithms, which handle region proposal and detection together. Specifically, the training data are built with an image labeling tool, and the results of recognizing symbols in drawings with the trained deep learning model are presented.

Functional MR Imaging of Language System : Comparative Study between Visual and Auditory Instructions in Word Generation Task (언어 중추 영역에 대한 기능적 자기공명영상: 시각적, 청각적 지시 과제에 관한 비교)

구은회;권대철;김동성;송인찬
- Journal of Biomedical Engineering Research / v.24 no.4 / pp.241-246 / 2003
This study evaluates the usefulness of functional MR imaging (MRI) for determining language dominance and assesses differences between visually and auditorily instructed word generation tasks in terms of activation task and activated area. Functional maps of the language areas were obtained during word generation tasks with visual and auditory instructions in 6 healthy right-handed volunteers, who were examined on a 1.5T scanner using the EPI BOLD technique, with three pulse sequences acquired in true axial planes. Both tasks consisted of 96 phases, including 6 activation and rest periods. Postprocessing was done with the MRDx program using the cross-correlation method. The two tasks were compared in terms of brain activation area and language lateralization index. The detection rates of the Broca, Wernicke, prefrontal lobe, Supplementary Motor Area (SMA), and premotor cortex areas and the differences in language lateralization between the two word generation tasks were evaluated. The lateralization index was obtained from the activated pixels in the left and right language areas of the brain and compared between the visual and auditory instruction tasks. Both language generation tasks showed high detection rates for the Broca and Wernicke areas. The visual instruction task showed no activation in the auditory area, and the auditory instruction task showed no activation in the visual area. There was a statistically significant difference between the language generation tasks. This indicates that functional MR imaging of the language areas is useful with both visual and auditory instruction tasks.

Auto-Analysis of Traffic Flow through Semantic Modeling of Moving Objects (움직임 객체의 의미적 모델링을 통한 차량 흐름 자동 분석)

Choi, Chang;Cho, Mi-Young;Choi, Jun-Ho;Choi, Dong-Jin;Kim, Pan-Koo
- The Journal of The Korea Institute of Intelligent Transport Systems / v.8 no.6 / pp.36-45 / 2009
Recently, there has been interest in automatic traffic flow analysis and accident detection using various low-level information from road video. This paper studies an algorithm for automatic traffic flow analysis and its application to traffic accident detection in traffic management systems. To this end, spatio-temporal relation models are built using topological and directional relations, and the proposed models are then matched with the directional motion verbs from Levin's verbs of inherently directed motion. Finally, synonyms and antonyms are added using WordNet. To measure the similarity between the proposed models and the trajectory of a moving object in the video, the objects are extracted and their trajectories are compared with the proposed models. Because each proposed model has different features, the generated rules are applied to the similarity measurement by TSR (Tangent Space Representation). The results of this research can be extended to automatic vehicle accident detection using CCTV.

Modified File Title Normalization Techniques for Copyright Protection (저작권 보호를 위한 변형된 파일 제목 정규화 기법)

Hwang, Chan Woong;Ha, Ji Hee;Lee, Tea Jin
- Convergence Security Journal / v.19 no.4 / pp.133-142 / 2019
Torrent sites, P2P sites, and web hard services are widely used because content can be downloaded easily, for free or at low prices, so domestic torrent, P2P, and web hard services are highly sensitive to copyright issues, and protection techniques have been researched and applied. Among these, filtering techniques based on title and string comparison, which block file titles or combinations of keywords, are easily bypassed by changing the title or its spacing. To detect and block illegal works for copyright protection, a technique for normalizing modified file titles is therefore essential. In this paper, we compare the detection rates of searches for illegal works before and after normalizing their modified file titles. The detection rate was a poor 77.72% before normalization and 90.23% after normalization. In the future, better handling of meaningless terms, such as commonly appended dates and quality labels, is expected to yield better results.
https://doi.org/10.33778/kcsa.2019.19.4.133
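
The idea of normalizing modified file titles before comparison can be sketched in Python as follows. This is only an illustration under stated assumptions: the abstract does not give the paper's actual normalization rules, so the steps chosen here (Unicode NFKC folding, case folding, dropping every character that is not a letter or digit) and the names `normalize_title` and `matches_blocklist` are hypothetical.

```python
import unicodedata


def normalize_title(title: str) -> str:
    """Reduce a file title to a canonical form (assumed rules, not the paper's)."""
    # NFKC folds full-width and other compatibility characters to plain forms.
    text = unicodedata.normalize("NFKC", title).casefold()
    # Keep only letters and digits; str.isalnum is Unicode-aware, so Hangul
    # survives while spaces, dots, underscores, and symbols are removed.
    return "".join(ch for ch in text if ch.isalnum())


def matches_blocklist(title: str, blocked_titles: list) -> bool:
    """Return True if any normalized blocked title occurs in the normalized title."""
    normalized = normalize_title(title)
    for blocked in blocked_titles:
        key = normalize_title(blocked)
        if key and key in normalized:  # skip entries that normalize to nothing
            return True
    return False
```

For example, `matches_blocklist("Sam-ple.Mo vie.2019.1080p", ["Sample Movie"])` returns `True`, although a plain string comparison of the raw titles would miss the match.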

Spam Image Detection Model based on Deep Learning for Improving Spam Filter

Seong-Guk Nam;Dong-Gun Lee;Yeong-Seok Seo
- Journal of Information Processing Systems / v.19 no.3 / pp.289-301 / 2023
Due to the development and dissemination of modern technology, anyone can easily communicate using services such as social network services (SNS) through a personal computer (PC) or smartphone. These technologies have brought many benefits, but also harmful effects, one of which is spam. Spam refers to unwanted or rejected information received by unspecified users. Continuous exposure to such information inconveniences service users, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers have been creating more malicious spam by distorting images of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation of image spam circulated on social media is not yet serious. However, in mail systems, spammers have applied various modifications to spam images to neutralize OCR, and the same situation can therefore arise with spam images on social media. Spammers have been shown to interfere with OCR reading through geometric transformations such as image distortion, noise addition, and blurring. Various techniques have been studied to filter image spam, but methods of evading image spam identification using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model to improve on existing OCR-based spam image detection performance and compensate for its vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts the text-related features, whether the image contains spam words, and the word embedding vector from the input image. Then, the convolutional neural network-based image model extracts image obfuscation and image feature vectors from the input image. The final spam image classifier uses the extracted features to determine whether the input is a spam image. In an evaluation of the F1-score, the proposed model performed about 14 points higher than OCR-based spam image detection.
https://doi.org/10.3745/JIPS.04.0274

Real-time Printed Text Detection System using Deep Learning Model (딥러닝 모델을 활용한 실시간 인쇄물 문자 탐지 시스템)

Ye-Jun Choi;Song-Won Kim;Mi-Kyeong Moon
- The Journal of the Korea institute of electronic communication sciences / v.19 no.3 / pp.523-530 / 2024
Online media such as web pages and digital documents let users search for specific words or phrases in real time. In printed materials such as books and reference books, finding specific words or phrases in real time is often difficult. This paper describes the development of a deep learning model for detecting text and a real-time character detection system that uses OCR to recognize text. The study proposes detecting text with the EAST model, recognizing the detected text with EasyOCR, and marking the recognized text with a bounding box when it matches the specific word or phrase the user is searching for. With this system, users are expected to be able to find specific words or phrases in printed materials such as books and reference books in real time, and to find the information they need easily and quickly.
https://doi.org/10.13067/JKIECS.2024.19.3.523

A New Endpoint Detection Method Based on Chaotic System Features for Digital Isolated Word Recognition System (음성인식을 위한 혼돈시스템 특성기반의 종단탐색 기법)

Zang, Xian;Chong, Kil-To
- Journal of the Institute of Electronics Engineers of Korea SC / v.46 no.5 / pp.8-14 / 2009
In speech recognition research, pinpointing the endpoints of a speech utterance even in the presence of background noise is of great importance. Noise present during recording introduces disturbances, which complicates matters, since the goal is to obtain the stationary parameters corresponding to each speech section. One major cause of error in the automatic recognition of isolated words is the inaccurate detection of the beginning and end boundaries of the test and reference templates, hence the need for an effective method of removing the unnecessary regions of a speech signal. The conventional methods for speech endpoint detection are based on two linear time-domain measurements: the short-time energy and the short-time zero-crossing rate. They perform well for clean speech, but their precision is not guaranteed when noise is present, since the high energy and zero-crossing rate of the noise are mistaken for part of the uttered speech. This paper proposes a novel approach to finding a clear threshold between noise and speech based on Lyapunov Exponents (LEs). The proposed method adopts nonlinear features to analyze the chaotic characteristics of the speech signal instead of depending on the unreliable energy factor. Its advantage over the conventional methods is that it detects the endpoints through the nonlinearity of the speech signal, which we believe is an important characteristic that the conventional methods have neglected. The proposed method extracts its features from the time-domain waveform of the speech signal alone, which illustrates its low complexity. Simulations showed that the proposed method performs effectively in a noisy environment, with an average recognition rate of up to 92.85% for unspecified speakers.
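
The two conventional time-domain measurements named in the abstract above, short-time energy and short-time zero-crossing rate, can be sketched in Python as follows. This illustrates the baseline approach only, not the Lyapunov-exponent method the paper proposes. The frame length, hop size, fixed energy threshold, and the function names are assumptions made for illustration.

```python
def split_frames(signal: list, frame_len: int, hop: int) -> list:
    # Overlapping frames; trailing samples that do not fill a frame are dropped.
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame: list) -> float:
    # Sum of squared samples within the frame.
    return sum(x * x for x in frame)


def zero_crossing_rate(frame: list) -> float:
    # Fraction of adjacent sample pairs whose signs differ.
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def detect_endpoints(signal: list, frame_len: int, hop: int,
                     energy_threshold: float):
    """Return (start, end) sample indices spanning all frames whose energy
    exceeds the threshold, or None if no frame does."""
    active = [i for i, frame in enumerate(split_frames(signal, frame_len, hop))
              if short_time_energy(frame) > energy_threshold]
    if not active:
        return None
    return active[0] * hop, active[-1] * hop + frame_len
```

A fixed energy threshold of this kind treats any sufficiently loud frame as speech, which is consistent with the abstract's point that high-energy noise can be mistaken for part of the utterance.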

The Detection and Correction of Context Dependent Errors of The Predicate using Noun Classes of Selectional Restrictions (선택 제약 명사의 의미 범주 정보를 이용한 용언의 문맥 의존 오류 검사 및 교정)

So, Gil-Ja;Kwon, Hyuk-Chul
- Journal of the Korea Institute of Information and Communication Engineering / v.18 no.1 / pp.25-31 / 2014
Korean grammar checkers typically detect context-dependent errors by employing heuristic rules; these rules are formulated by language experts and consist of lexical items. Unfortunately, such grammar checkers show low recall, that is, they detect only a small proportion of the errors in a document. To resolve this shortcoming, a new method of generalizing error-decision rules is proposed that utilizes the existing KorLex thesaurus, the Korean version of Princeton WordNet. The method extracts noun classes from KorLex and generalizes error-decision rules from them using the Tree Cut Model and information-theory-based MDL (minimum description length).
https://doi.org/10.6109/jkiice.2014.18.1.25
