• Title/Summary/Keyword: 영상 특징추출 (image feature extraction)

Search Results: 2,333 (processing time: 0.032 seconds)

Audio-Visual Integration based Multi-modal Speech Recognition System (오디오-비디오 정보 융합을 통한 멀티 모달 음성 인식 시스템)

  • Lee, Sahng-Woon;Lee, Yeon-Chul;Hong, Hun-Sop;Yun, Bo-Hyun;Han, Mun-Sung
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.707-710 / 2002
  • This paper proposes a multi-modal speech recognition system based on the fusion of audio and video information. By combining acoustic features with visual features, the system recognizes human speech efficiently in noisy environments. Mel-frequency cepstrum coefficients (MFCC) are used as the acoustic features, and feature vectors obtained by principal component analysis are used as the visual features. To improve the recognition rate of the visual channel itself, the face region is first located using a skin-color model and facial shape information, and the lip region is then detected with a robust lip-region extraction method. Audio-visual fusion is performed as early fusion using a modified time-delay neural network. Experiments show that fusing the audio and video information improves performance by roughly 5%-20% over using the audio information alone.

  • PDF
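
A minimal sketch of the two feature streams and the early (feature-level) fusion this abstract describes, assuming librosa and scikit-learn are available; the paper's modified time-delay neural network recognizer is not reproduced here:

```python
import numpy as np
import librosa
from sklearn.decomposition import PCA

def audio_features(wav_path, n_mfcc=13):
    # Acoustic stream: MFCCs per frame, shape (frames, n_mfcc).
    y, sr = librosa.load(wav_path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def visual_features(lip_frames, n_components=16):
    # Visual stream: PCA over flattened lip-region crops,
    # lip_frames has shape (frames, h, w); needs frames >= n_components.
    flat = lip_frames.reshape(len(lip_frames), -1).astype(np.float32)
    return PCA(n_components=n_components).fit_transform(flat)

def early_fusion(aud, vis):
    # Align stream lengths by truncation, then concatenate per frame;
    # the fused vectors would feed the recognizer.
    n = min(len(aud), len(vis))
    return np.hstack([aud[:n], vis[:n]])
```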

A Feature-Based Retrieval Technique for Image Database (특징기반 영상 데이터베이스 검색 기법)

  • Kim, Bong-Gi;Oh, Hae-Seok
    • The Transactions of the Korea Information Processing Society / v.5 no.11 / pp.2776-2785 / 1998
  • An image retrieval system based on image content is a key issue in building and managing large multimedia databases such as art galleries and museums, trademark and copyright collections, and picture archiving and communication systems. Interest in content-based image retrieval has therefore greatly increased over the last few years. This paper proposes a feature-based image retrieval technique which uses a compound feature vector representing both the color and the shape of an image. The color part of the feature vector is obtained from the algebraic moments of each pixel, based on the property of regional color distribution. The shape part is obtained using the Improved Moment Invariant (IMI), which reduces the amount of computation and increases retrieval efficiency. In the preprocessing phase for extracting the shape feature, we transform the color image into a gray image; since we use a modified DCT algorithm, the method is easy to implement and can extract contours in real time. In experiments comparing our method with previous methods on a database of 150 automobile images, our method showed better retrieval effectiveness.

  • PDF
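
The compound color-and-shape vector can be illustrated as below; this is a sketch assuming OpenCV, with per-channel color moments and Hu moment invariants standing in for the paper's algebraic moments and Improved Moment Invariant (IMI):

```python
import cv2
import numpy as np

def compound_feature(image_bgr):
    # Color part: per-channel mean and variance as simple color moments.
    color = np.concatenate([image_bgr.mean(axis=(0, 1)),
                            image_bgr.var(axis=(0, 1))])
    # Shape part: convert to gray (as in the paper's preprocessing), then
    # compute the seven Hu moment invariants from the image moments.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hu = cv2.HuMoments(cv2.moments(gray)).flatten()
    return np.concatenate([color, hu])

def retrieve(query_vec, db_vecs, top_k=5):
    # Rank database images by Euclidean distance over compound vectors.
    d = np.linalg.norm(db_vecs - query_vec, axis=1)
    return np.argsort(d)[:top_k]
```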

A Feature Point Extraction and Identification Technique for Immersive Contents Using Deep Learning (딥 러닝을 이용한 실감형 콘텐츠 특징점 추출 및 식별 방법)

  • Park, Byeongchan;Jang, Seyoung;Yoo, Injae;Lee, Jaechung;Kim, Seok-Yoon;Kim, Youngmo
    • Journal of IKEEE / v.24 no.2 / pp.529-535 / 2020
  • As a core technology of the 4th industrial revolution, immersive 360-degree video content is drawing attention. The worldwide market for immersive 360-degree video content is projected to grow from $6.7 billion in 2018 to approximately $70 billion in 2020. However, most immersive 360-degree video content is distributed through illegal distribution networks such as Webhard and torrent sites, and the damage caused by illegal reproduction is increasing. The existing 2D video industry uses copyright-filtering technology to prevent such illegal distribution. The technical difficulty with immersive 360-degree videos is that they require ultra-high-quality pictures and contain images captured by two or more cameras merged into one image, which creates distortion regions. There are also technical limitations such as the increase in feature point data due to the ultra-high definition and the resulting processing-speed requirements. These considerations make it difficult to apply the existing 2D filtering technology to 360-degree videos. To solve this problem, this paper proposes a feature point extraction and identification technique that selects object identification areas excluding regions with severe distortion, recognizes objects in those areas using deep learning, and extracts feature points using the identified object information. Compared with the previously proposed method of extracting feature points from the stitching area of immersive content, the proposed technique shows a clear performance gain.
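
The core idea, extracting feature points only inside detected object regions that avoid the heavily distorted bands, can be sketched as follows. OpenCV's ORB stands in for the paper's feature extractor, and `detect_objects` is a hypothetical wrapper around any deep-learning detector returning (x, y, w, h) boxes:

```python
import cv2
import numpy as np

def extract_in_object_regions(frame, detect_objects, margin=0.1):
    h, w = frame.shape[:2]
    # Exclude the top and bottom bands, where projection/stitching
    # distortion is typically worst (illustrative choice).
    y0, y1 = int(h * margin), int(h * (1 - margin))
    orb = cv2.ORB_create(nfeatures=500)
    descriptors = []
    for (bx, by, bw, bh) in detect_objects(frame):
        if by < y0 or by + bh > y1:
            continue  # skip boxes overlapping the distorted bands
        roi = cv2.cvtColor(frame[by:by+bh, bx:bx+bw], cv2.COLOR_BGR2GRAY)
        _, des = orb.detectAndCompute(roi, None)
        if des is not None:
            descriptors.append(des)
    return np.vstack(descriptors) if descriptors else None
```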

A Study on Recognition of Clustered Cells in Uterine Cervical Pap-Smear Image (군집을 이루는 자궁 경부암 세포 인식에 관한 연구)

  • 최예찬;김선아;김호영;김백섭
    • Proceedings of the Korean Information Science Society Conference / 2000.04b / pp.511-513 / 2000
  • The Pap smear test is known as the most effective method for diagnosing cervical cancer, but it exhibits a high false negative rate (15-50%). This large error rate stems mainly from the sheer volume of cells to be examined, so an automated system is urgently needed. This paper proposes a system that recognizes the clustered cancer cells characteristic of cervical cancer. The system consists of two stages. In the first stage, clustered cells are located at low magnification (100x) using simple image processing and a minimum spanning tree. In the second stage, the image is examined at high magnification (400x); various features are extracted from the clustered cells and recognized with the k-nearest-neighbor (KNN) method. Fifty images (640x480, RGB true color; 25 at 100x and 25 at 400x) were used in the experiments. Processing one image took about 3 seconds (2.984 s), less than region growing (20 s) or split-and-merge (58 s). At 100x, classifying into two groups (normal and abnormal) achieved a high recognition rate of 96%, but when the abnormal class was divided into five groups the rate fell to 45%, due to errors in the segmentation stage and inaccuracies in the training data. At 400x the corresponding rates were 92% and 30%, which we attribute to errors of the watershed method used in the segmentation stage.

  • PDF
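
A compact sketch of the two-stage pipeline, assuming SciPy and scikit-learn, and assuming cell nuclei have already been segmented to centroid coordinates; the edge threshold is an illustrative placeholder:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.spatial.distance import pdist, squareform
from sklearn.neighbors import KNeighborsClassifier

def cluster_cells(centroids, max_edge=40.0):
    # Stage 1 (low magnification): build an MST over cell centroids and
    # cut edges longer than max_edge pixels; the remaining connected
    # components are the cell clusters.
    mst = minimum_spanning_tree(squareform(pdist(centroids))).toarray()
    mst[mst > max_edge] = 0
    _, labels = connected_components(mst, directed=False)
    return labels

def classify_clusters(train_feats, train_labels, test_feats, k=5):
    # Stage 2 (high magnification): KNN over features extracted per cluster.
    knn = KNeighborsClassifier(n_neighbors=k).fit(train_feats, train_labels)
    return knn.predict(test_feats)
```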

Design and Implementation of Video Clip Service System in Augmented Reality Using the SURF Algorithm (SURF 알고리즘을 이용한 증강현실 동영상 서비스 시스템의 설계 및 구현)

  • Jeon, Young-Joon;Shin, Hong-Seob;Kim, Jin-Il
    • Journal of the Institute of Convergence Signal Processing / v.16 no.1 / pp.22-28 / 2015
  • In this paper, we design and implement a service system that shows, in augmented reality, video clips linked to static images extracted from newspapers, magazines, photo albums, and the like. First, the system uses the SURF algorithm to extract features from the original photos printed in the media and stores them together with the linked video clips. Then, when a photo is taken with the camera of a mobile device such as a smartphone, the system extracts features in real time, searches for the linked video clip matching the original image, and shows it on the smartphone in augmented reality. The proposed system was deployed on Android smartphones, and the test results verify that it operates not only on normal photos but also on partially damaged photos.
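
The register-then-match step can be sketched as below, assuming an opencv-contrib-python build with SURF enabled (SURF is patented and absent from default OpenCV builds); the `db` structure and the 0.7 ratio threshold are illustrative:

```python
import cv2

surf = cv2.xfeatures2d.SURF_create(hessianThreshold=400)
bf = cv2.BFMatcher(cv2.NORM_L2)

def describe(image_path):
    # Extract SURF descriptors from a grayscale photo.
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, des = surf.detectAndCompute(img, None)
    return des

def best_match(query_des, db):
    # db: list of (video_url, descriptors) registered from printed photos.
    # Lowe's ratio test counts good matches; the best-scoring entry wins.
    best, best_score = None, 0
    for video_url, des in db:
        matches = bf.knnMatch(query_des, des, k=2)
        good = sum(1 for pair in matches if len(pair) == 2
                   and pair[0].distance < 0.7 * pair[1].distance)
        if good > best_score:
            best, best_score = video_url, good
    return best  # video clip to overlay in AR
```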

Fingerprint Image Quality Analysis for Knowledge-based Image Enhancement (지식기반 영상개선을 위한 지문영상의 품질분석)

  • 윤은경;조성배
    • Journal of KIISE: Software and Applications / v.31 no.7 / pp.911-921 / 2004
  • Accurate minutiae extraction from input fingerprint images is one of the critical modules in a robust automatic fingerprint identification system. However, the performance of minutiae extraction depends heavily on the quality of the input fingerprint images. If the preprocessing in the image enhancement step is adapted to the characteristics of the fingerprint image, the system becomes more robust. In this paper, we propose a knowledge-based preprocessing method which extracts five features from the fingerprint images (the mean and variance of gray values, block directional difference, orientation change level, and ridge-valley thickness ratio), analyzes image quality with Ward's clustering algorithm, and enhances the images with respect to their oily/neutral/dry characteristics. Experimental results using NIST DB 4 and the Inha University DB show that the clustering algorithm distinguishes the image quality characteristics well. In addition, the performance of the proposed method is assessed using a quality index and the block directional difference; the results indicate that the proposed method improves both.
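
An illustrative version of the quality-analysis step, assuming scikit-learn; the block statistics below are simple stand-ins for the paper's five features, whose exact formulas are not reproduced:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def quality_features(gray, block=16):
    # Collect non-overlapping blocks and summarize them into a 5-vector
    # loosely analogous to the paper's mean/variance/directional features.
    h, w = gray.shape
    blocks = [gray[y:y+block, x:x+block]
              for y in range(0, h - block, block)
              for x in range(0, w - block, block)]
    means = np.array([b.mean() for b in blocks])
    return np.array([gray.mean(), gray.var(), means.std(),
                     np.median(means), means.max() - means.min()])

def cluster_quality(feature_matrix, n_clusters=3):
    # Ward's hierarchical clustering into three groups, matching the
    # oily / neutral / dry characterization.
    return AgglomerativeClustering(n_clusters=n_clusters,
                                   linkage="ward").fit_predict(feature_matrix)
```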

Delineating the Prostate Boundary on TRUS Image Using Predicting the Texture Features and its Boundary Distribution (TRUS 영상에서 질감 특징 예측과 경계 분포를 이용한 전립선 경계 분할)

  • Park, Sunhwa;Kim, Hoyong;Seo, Yeong Geon
    • Journal of Digital Contents Society / v.17 no.6 / pp.603-611 / 2016
  • Doctors generally delineate the prostate boundary manually by inspecting the image, but this not only takes considerable time but also yields different boundaries depending on the doctor. Automatic delineation methods are needed to reduce this effort, but detecting the boundary is hard because of the many uncertain textures and speckle noise. Methods based on SVM, SIFT, Gabor texture filters, snake-like contours, and average-shape models have been studied, along with work on 2D and 3D images and on CT and MRI. However, no method superior to human experts has been developed, and further studies are needed. To this end, this paper proposes a method that delineates the boundary on the prostate image by predicting its texture features and their average distribution. As a result, we obtained boundaries similar to those drawn by human experts.
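
As a rough sketch of per-patch texture measurement on an ultrasound image, gray-level co-occurrence features (assuming scikit-image >= 0.19) could score candidate boundary points; the paper's own texture-prediction model and boundary-distribution prior are not reproduced:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def patch_texture(gray_uint8, y, x, size=17):
    # Gray-level co-occurrence matrix over a small patch centered near
    # a candidate boundary point.
    patch = gray_uint8[y:y+size, x:x+size]
    glcm = graycomatrix(patch, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    # Contrast and homogeneity roughly separate prostate tissue from
    # surrounding speckle; a classifier over such features can score
    # candidate boundary points.
    return np.array([graycoprops(glcm, "contrast").mean(),
                     graycoprops(glcm, "homogeneity").mean()])
```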

A New Temporal Filtering Method for Improved Automatic Lipreading (향상된 자동 독순을 위한 새로운 시간영역 필터링 기법)

  • Lee, Jong-Seok;Park, Cheol-Hoon
    • The KIPS Transactions: Part B / v.15B no.2 / pp.123-130 / 2008
  • Automatic lipreading recognizes speech by observing the movement of a speaker's lips. It has recently received attention as a way of compensating for the performance degradation of acoustic speech recognition in acoustically noisy environments. One of the important issues in automatic lipreading is defining and extracting salient features from the recorded images. In this paper, we propose a feature extraction method that uses a new filtering technique to obtain improved recognition performance. The proposed method applies a band-pass filter to the temporal trajectory of each pixel in the images containing the lip region, eliminating frequency components that are too slow or too fast compared to the relevant speech information; features are then extracted by principal component analysis. Speaker-independent recognition experiments show that the proposed method improves performance in both clean and visually noisy conditions.
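
A minimal sketch of the proposed filtering, assuming SciPy and scikit-learn; the cut-off frequencies here are illustrative placeholders, not the paper's tuned values:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA

def bandpass_pixels(video, fps, low=1.0, high=10.0):
    # video: (frames, h, w) grayscale lip-region sequence; each pixel's
    # temporal trajectory is band-pass filtered along the time axis.
    # (The sequence must be longer than filtfilt's default pad length.)
    b, a = butter(4, [low, high], btype="bandpass", fs=fps)
    flat = video.reshape(len(video), -1).astype(np.float64)
    return filtfilt(b, a, flat, axis=0)

def lip_features(video, fps, n_components=20):
    # PCA over the filtered frames yields the per-frame feature vectors.
    filtered = bandpass_pixels(video, fps)
    return PCA(n_components=n_components).fit_transform(filtered)
```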

Atrous Residual U-Net for Semantic Segmentation in Street Scenes based on Deep Learning (딥러닝 기반 거리 영상의 Semantic Segmentation을 위한 Atrous Residual U-Net)

  • Shin, SeokYong;Lee, SangHun;Han, HyunHo
    • Journal of Convergence for Information Technology / v.11 no.10 / pp.45-52 / 2021
  • In this paper, we propose an Atrous Residual U-Net (AR-UNet) to improve the segmentation accuracy of U-Net-based semantic segmentation. U-Net is mainly used in fields such as medical image analysis, autonomous vehicles, and remote sensing. The conventional U-Net extracts too few features because of the small number of convolution layers in its encoder; since the extracted features are essential for classifying object categories, this lowers segmentation accuracy. To address this, the AR-UNet uses residual learning and ASPP in the encoder. Residual learning improves feature extraction ability and is effective in preventing feature loss and the vanishing gradient problem caused by repeated convolutions, while ASPP enables additional feature extraction without reducing the resolution of the feature map. Experiments on the Cityscapes dataset verified the effectiveness of the AR-UNet: it produced better segmentation results than the conventional U-Net, and can thus contribute to the many applications where accuracy is important.
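
The two encoder ingredients named above can be condensed into a PyTorch sketch; the full AR-UNet wiring (encoder-decoder skips, channel schedule) is omitted and the dilation rates are illustrative:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    # Two convolutions with an identity shortcut, mitigating feature loss
    # and vanishing gradients from stacked convolutions.
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch),
            nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch))

    def forward(self, x):
        return torch.relu(self.body(x) + x)

class ASPP(nn.Module):
    # Parallel atrous (dilated) convolutions enlarge the receptive field
    # without reducing the feature map resolution.
    def __init__(self, ch, rates=(1, 6, 12)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])
        self.project = nn.Conv2d(ch * len(rates), ch, 1)

    def forward(self, x):
        return self.project(torch.cat([b(x) for b in self.branches], dim=1))
```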

Effective Face Detection Using Principal Component Analysis and Support Vector Machine (주성분 분석과 서포트 벡터 머신을 이용한 효과적인 얼굴 검출 시스템)

  • Kang, Byoung-Doo;Kwon, Oh-Hwa;Seong, Chi-Young;Jeon, Jae-Deok;Eom, Jae-Sung;Kim, Jong-Ho;Lee, Jae-Won;Kim, Sang-Kyoon
    • Journal of Korea Multimedia Society / v.9 no.11 / pp.1435-1444 / 2006
  • We present an effective, real-time face detection method based on Principal Component Analysis (PCA) and Support Vector Machines (SVMs). We extract simple Haar-like features from training images consisting of face and non-face images, reinterpret the features with PCA, and select useful ones from the large set of extracted features. With the selected features, we construct a face detector using an SVM suited to binary classification. The detector is not significantly affected by the size of the training data set, achieving a 90.1% detection rate with a small quantity of training data, and it processes 8 frames per second on 320x240-pixel images, an acceptable speed for a real-time system.

  • PDF
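
An illustrative pipeline for such a detector, assuming scikit-image and scikit-learn; the window size, feature type, and PCA dimension are placeholder choices, not the paper's:

```python
import numpy as np
from skimage.transform import integral_image
from skimage.feature import haar_like_feature
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def haar_features(windows):
    # windows: (n, 24, 24) grayscale face/non-face patches; Haar-like
    # features are computed over each patch's integral image.
    feats = [haar_like_feature(integral_image(w), 0, 0, 24, 24,
                               feature_type="type-2-x")
             for w in windows]
    return np.array(feats)

def train_detector(face_windows, nonface_windows, n_components=50):
    # PCA reinterprets/compresses the large Haar feature set, then a
    # binary SVM separates face from non-face windows.
    X = haar_features(np.concatenate([face_windows, nonface_windows]))
    y = np.array([1] * len(face_windows) + [0] * len(nonface_windows))
    model = make_pipeline(PCA(n_components=n_components), SVC(kernel="rbf"))
    return model.fit(X, y)
```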