Search | Korea Science

Light weight architecture for acoustic scene classification (음향 장면 분류를 위한 경량화 모형 연구)

Lim, Soyoung;Kwak, Il-Youp
- The Korean Journal of Applied Statistics / v.34 no.6 / pp.979-993 / 2021
Acoustic scene classification (ASC) categorizes an audio file based on the environment in which it was recorded. This has long been studied in the Detection and Classification of Acoustic Scenes and Events (DCASE) challenges. In this study, we address a constraint that ASC faces in real-world applications: the model must have low complexity. We compared several models that apply lightweight techniques. First, a base CNN model was proposed that uses log mel-spectrogram, delta, and delta-delta features. Second, depthwise separable convolutions and linear bottleneck inverted residual blocks were applied to the convolutional layers, and quantization was applied to the models to develop a low-complexity model. The low-complexity model performed similarly to or slightly worse than the base model, but the model size was reduced significantly, from 503 KB to 42.76 KB.
https://doi.org/10.5351/KJAS.2021.34.6.979
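
The size reduction described in this abstract can be illustrated with a rough weight count. The sketch below is not the authors' model: the layer shape (32 input channels, 64 output channels, 3 x 3 kernel) and the helper names are assumptions chosen only to show why a depthwise separable convolution combined with 8-bit weights is so much smaller than a standard convolution with 32-bit weights.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def size_kb(num_params, bits_per_weight):
    """Storage needed for the weights, in KB (1 KB = 1024 bytes)."""
    return num_params * bits_per_weight / 8 / 1024


# Hypothetical layer shape, not taken from the paper.
c_in, c_out, k = 32, 64, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)

print("standard conv, float32:", std, "weights,", size_kb(std, 32), "KB")
print("separable conv, int8:  ", sep, "weights,", size_kb(sep, 8), "KB")
```

For this assumed layer the weight count drops from 18432 to 2336 before quantization is even applied; the figures for the paper's full models are the ones quoted in the abstract.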

Content-Based Image Retrieval System using Feature Extraction of Image Objects (영상 객체의 특징 추출을 이용한 내용 기반 영상 검색 시스템)

Jung Seh-Hwan;Seo Kwang-Kyu
- Journal of Korean Society of Industrial and Systems Engineering / v.27 no.3 / pp.59-65 / 2004
This paper explores an image segmentation and representation method that uses vector quantization (VQ) on color and texture for a content-based image retrieval system. The basic idea is to transform the raw pixel data into a small set of image regions that are coherent in color and texture space. These schemes are used for object-based image retrieval. The features used for image retrieval are three color features from the HSV color model and five texture features from gray-level co-occurrence matrices. Once feature extraction has been performed on the image, each pixel is represented by an 8-dimensional feature vector. A VQ algorithm is used to cluster the pixel data into groups. A representative feature table based on the dominant groups is obtained and used to retrieve similar images according to the objects within the image. The proposed method can retrieve similar images even when the objects are translated, scaled, and rotated.
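
The clustering step in this abstract can be sketched with a generic codebook-training loop. The abstract does not say which VQ algorithm was used, so the k-means style update below is an assumption, as are the function names and the toy 2-dimensional vectors that stand in for the 8-dimensional color and texture features.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook, v):
    """Index of the codeword closest to vector v."""
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i], v))


def train_codebook(vectors, codebook, iterations=10):
    """Alternate between assigning vectors to codewords and re-centering them."""
    codebook = [list(c) for c in codebook]
    for _ in range(iterations):
        groups = [[] for _ in codebook]
        for v in vectors:
            groups[nearest(codebook, v)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old codeword if no vector was assigned to it
                codebook[i] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


# Toy per-pixel features (hypothetical values, 2-D instead of 8-D).
pixels = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (1.0, 1.0), (0.8, 1.0), (1.0, 0.8)]
codebook = train_codebook(pixels, [pixels[0], pixels[3]])
labels = [nearest(codebook, p) for p in pixels]
print(labels)
```

Each pixel ends up labeled with the index of its group; a table of the dominant groups could then serve as the compact image description the abstract mentions.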

Real-time Implementation of an Identifier for Nonstationary Time-varying Signals and Systems

Kim, Jong-Weon;Kim, Sung-Hwan
- The Journal of the Acoustical Society of Korea / v.15 no.3E / pp.13-18 / 1996
A real-time identifier for nonstationary time-varying signals and systems was implemented using a low-cost DSP (digital signal processing) chip. The identifier comprises I/O units, a central processing unit, a control unit, and supporting software. To estimate the system accurately and to reduce quantization error during arithmetic operations, the firmware was programmed with 64-bit extended-precision arithmetic. The performance of the identifier was verified by comparison with simulation results. The implemented real-time identifier has negligible quantization errors, and its real-time processing capability corresponds to 0.6 kHz for the nonstationary AR (autoregressive) model with n=4 and m=1.

A Study on the Optimal Mahalanobis Distance for Speech Recognition

Lee, Chang-Young
- Speech Sciences / v.13 no.4 / pp.177-186 / 2006
In an effort to enhance the quality of feature vector classification and thereby reduce the recognition error rate of speaker-independent speech recognition, we employ the Mahalanobis distance to calculate the similarity measure between feature vectors. The metric matrix of the Mahalanobis distance is assumed to be diagonal in order to reduce the cost in memory and computation time. We propose that the diagonal elements be given in terms of the variations of the feature vector components. Geometrically, this prescription tends to redistribute the data set into the shape of a hypersphere in the feature vector space. The idea is applied to speech recognition by a hidden Markov model with fuzzy vector quantization. The results show that recognition is improved by an appropriate choice of the relevant adjustable parameter. The Viterbi score difference between the two winners in the recognition test shows that its general behavior is in accord with that of the recognition error rate.
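
A diagonal metric of the kind described in this abstract can be sketched as a per-component weighting of the squared difference. The abstract does not give the exact form of the diagonal elements or of the adjustable parameter, so the weighting `1 / variance ** alpha` and the name `alpha` below are assumptions made for illustration only.

```python
def component_variances(vectors):
    """Population variance of each feature component across the data set."""
    n = len(vectors)
    columns = list(zip(*vectors))
    means = [sum(col) / n for col in columns]
    return [sum((x - m) ** 2 for x in col) / n for col, m in zip(columns, means)]


def diagonal_mahalanobis_sq(a, b, variances, alpha=1.0):
    """Squared distance with a diagonal metric.

    alpha is a hypothetical adjustable exponent: alpha=0 gives the plain
    squared Euclidean distance, alpha=1 divides each component by its variance.
    """
    return sum((x - y) ** 2 / (v ** alpha) for x, y, v in zip(a, b, variances))


# Toy feature vectors whose second component has a much larger spread.
data = [(0.0, 0.0), (2.0, 20.0), (4.0, 40.0)]
var = component_variances(data)

print(diagonal_mahalanobis_sq(data[0], data[2], var, alpha=0.0))
print(diagonal_mahalanobis_sq(data[0], data[2], var, alpha=1.0))
```

With `alpha=1.0` both components contribute equally to the distance even though their raw scales differ by a factor of ten, which matches the geometric picture of reshaping the data toward a hypersphere.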

Sensitivity Property of Generalized CMAC Neural Network

Kim, Dong-Hyawn;Lee, In-Won
- Computational Structural Engineering : An International Journal / v.3 no.1 / pp.39-47 / 2003
Generalized CMAC (GCMAC) is a type of neural network known to learn quickly. The network may be useful in structural engineering applications such as the identification and control of structures. However, the derivatives of a trained GCMAC are relatively poor in accuracy: if GCMAC is differentiated directly, the accuracy of the derivative is not satisfactory. This is due to the quantization of the input space and the shape of the basis function used. To improve the accuracy, a new algorithm is proposed. Using the periodicity of the output predicted by GCMAC, the derivative can be improved to the extent of having almost no error. Numerical examples are considered to show the accuracy of the proposed algorithm.

Equal Bit Rate Control for Low Bit-Rate Coder by Using Frame Statistics (확률 분포를 고려한 저 전송률 비디오 부호기의 균등 비트 할당 기법 연구)

한성욱;서동완;최윤식
- Proceedings of the IEEK Conference / 2002.06d / pp.29-32 / 2002
In typical block-based video coding, the objective of rate control (RC) is to select the quantization parameters so that the encoder produces bits at the rate of the channel and the overall distortion is minimized. To reduce the huge amount of computation required for offline RC, there have been significant efforts to speed up the process in video encoders. Those efforts have focused mainly on the modes for bit rate and distortion in types of coders, in terms of the quantization parameters. Because previous work on model-based online RC relies on the statistics of the previous frame, it allocates bits unequally, without regard to the statistics of the current frame. In this thesis, an equal bit allocation scheme that uses current-frame statistics is proposed.

An Adaptive Algorithm for the Quantization Step Size Control of MPEG-2

Cho, Nam-Ik
- Journal of Electrical Engineering and information Science / v.2 no.6 / pp.138-145 / 1997
This paper proposes an adaptive algorithm for the quantization step size control of MPEG-2 that uses information obtained from the previously encoded picture. Before the DCT coefficients are quantized, the properties of the reconstruction error of each macroblock (MB) are predicted from the previous frame. To predict the error of the current MB, a block of MB size in the previous frame is chosen by means of the motion vector. Since the original and reconstructed images of the previous frame are available in the encoder, the reconstruction error of this block can be calculated. This error is taken as the expected error of the current MB if it is quantized with the same step size and bit rate. The error of the MB is compared with the average over all MBs: if it is larger than the average, a smaller step size is assigned to this MB, and vice versa. As a result, the error distribution of the MBs is more concentrated around the average, giving lower variance and improved image quality. Especially for low bit-rate applications, the proposed algorithm gives much smaller error variance and higher PSNR than TM5 (test model 5).
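
The comparison against the average error can be sketched as follows. The abstract states only the direction of the adjustment (larger predicted error, smaller step), so the proportional rule, the function name, and the clamping range of 1 to 31 are assumptions, not the published algorithm.

```python
def adapt_step_sizes(predicted_errors, base_step, min_step=1, max_step=31):
    """Assign a step size to each macroblock from its predicted error.

    Hypothetical rule: scale the base step by average_error / predicted_error,
    so blocks with above-average error get a smaller step and vice versa.
    """
    average = sum(predicted_errors) / len(predicted_errors)
    steps = []
    for err in predicted_errors:
        if err <= 0:
            scaled = max_step  # no predicted error: use the coarsest step
        else:
            scaled = round(base_step * average / err)
        steps.append(max(min_step, min(max_step, scaled)))
    return steps


# Predicted reconstruction errors for four macroblocks (made-up values).
errors = [4.0, 8.0, 16.0, 4.0]
steps = adapt_step_sizes(errors, base_step=8)
print(steps)
```

The block whose predicted error is twice the average receives half the base step, and the two low-error blocks receive a coarser one, which pulls the per-block errors toward the mean.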

Vector Quantization of Image Signal using Learning Count Control Neural Networks (학습 횟수 조절 신경 회로망을 이용한 영상 신호의 벡터 양자화)

유대현;남기곤;윤태훈;김재창
- Journal of the Korean Institute of Telematics and Electronics C / v.34C no.1 / pp.42-50 / 1997
Vector quantization has been shown to be useful for compressing data in a wide range of applications such as image processing, speech processing, and weather satellites. For the vector quantization of images with neural networks, this paper proposes an efficient neural network learning algorithm, called the learning count control algorithm, based on the frequency sensitive learning algorithm. As a result of training with this algorithm, more codewords can be assigned to the regions to which the human visual system is sensitive, and the quality of the reconstructed image can be improved. We use a human visual system model that is a cascade of a nonlinear intensity mapping function and a modulation transfer function with a bandpass characteristic.

The Optimal Thresholding Technique for an Efficient Quadtree Segmentation (효율적인 Quadtree 분할을 위한 최적의 임계값 설정 기술)

Lee, Hang-Chan
- The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.8 / pp.1031-1036 / 1999
A hierarchical vector quantization scheme is implemented, and an optimal thresholding technique for quadtree segmentation is proposed for high-quality, low-bit-rate image compression. A mathematical model is constructed under the assumption that the standard deviations of the sub-blocks are larger than or equal to the standard deviation of the upper-level block generated by merging the sub-blocks. This thresholding technique, based on the mathematical model, yields an improvement of about 1 dB in PSNR over most of the bit-rate range compared with the quadtree coder that uses MSE for quadtree segmentation.
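
Quadtree segmentation driven by block standard deviation can be sketched as a short recursive function. The paper derives its threshold from a mathematical model that the abstract does not reproduce, so the fixed `threshold` value, the function name, and the toy image below are assumptions for illustration.

```python
from statistics import pstdev


def quadtree(image, x, y, size, threshold, min_size=1):
    """Return leaf blocks (x, y, size), splitting while a block is not uniform."""
    pixels = [image[r][c] for r in range(y, y + size) for c in range(x, x + size)]
    if size <= min_size or pstdev(pixels) <= threshold:
        return [(x, y, size)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += quadtree(image, x + dx, y + dy, half, threshold, min_size)
    return leaves


# 4 x 4 toy image: busy top-left corner, flat everywhere else.
image = [
    [0, 255, 10, 10],
    [255, 0, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
leaves = quadtree(image, 0, 0, 4, threshold=5)
print(leaves)
```

Only the detailed corner is split down to single pixels, while the three flat quadrants stay as 2 x 2 blocks. A coder would then spend more bits on the small blocks and fewer on the large ones.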

A Wavelet Approach to Broadcast Video Traffic Modeling (Wavelet 변환을 이용한 영상 트래픽 모델링)

정수환;배명진;박성준
- The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.72-77 / 1999
In this paper, we propose a wavelet VQ approach to modeling VBR broadcast video traffic. The proposed method decomposes video traffic into two parts via wavelet transformation and models each part separately. The first part, which is modeled by an AR(1) process, captures the long-term trend of the traffic; the second part, classified via vector quantization, addresses the short-term behavior of the traffic. Compared with other VBR video models, our model has three advantages. First, it allows the long- and short-term behavior of the video traffic to be modeled separately; second, it preserves the periodic coding structure in the traffic data; and third, it provides a unified approach to frame- and slice-level traffic modeling. We demonstrate the validity of our model by statistical measurements and network performance simulation.
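
The two-part decomposition in this abstract can be sketched with a single level of a Haar transform followed by a simple AR(1) fit on the trend. The abstract names neither the wavelet nor the estimation method, so the Haar averaging, the lag-1 estimator, and the frame sizes below are assumptions; the VQ classification of the detail part is omitted.

```python
def haar_step(signal):
    """One level of an (unnormalized) Haar transform: pairwise trend and detail."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    trend = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return trend, detail


def fit_ar1(series):
    """Estimate phi in x[t] - mean = phi * (x[t-1] - mean) + noise."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    num = sum(cur * prev for cur, prev in zip(centered[1:], centered[:-1]))
    den = sum(prev * prev for prev in centered[:-1])
    return num / den


# Made-up frame sizes for a short video trace.
frame_sizes = [10, 12, 14, 12, 18, 20, 22, 20]
trend, detail = haar_step(frame_sizes)
phi = fit_ar1(trend)

print("trend :", trend)
print("detail:", detail)
print("phi   :", phi)
```

The trend carries the slow drift that an AR(1) process can follow, and the detail carries the frame-to-frame pattern; each original pair can be recovered as `trend + detail` and `trend - detail`.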

Search Result 227, Processing Time 0.025 seconds
