• Title/Summary/Keyword: second quantization

A Common Bitmap Block Truncation Coding for Color Images Based on Binary Ant Colony Optimization

  • Li, Zhihong;Jin, Qiang;Chang, Chin-Chen;Liu, Li;Wang, Anhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.5 / pp.2326-2345 / 2016
  • For the compression of color images, a common bitmap is usually generated to replace the three individual bitmaps that result from block truncation coding (BTC) of the R, G, and B channels. However, the common bitmaps generated by some traditional schemes are not optimal because they do not minimize the distortion of the entire color image. In this paper, we propose a near-optimized common bitmap scheme for BTC using Binary Ant Colony Optimization (BACO), producing a BACO-BTC scheme. First, the color image is compressed by the BTC algorithm to obtain three individual bitmaps and three pairs of quantization values for the R, G, and B channels. Second, a near-optimized common bitmap is generated that minimizes the distortion of the entire color image, based on the idea of BACO. Finally, the color image is reconstructed from the corresponding quantization values according to the common bitmap. The experimental results confirmed that the proposed scheme yields reconstructed images with better visual quality and has lower computational complexity than the referenced schemes.
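
The BTC steps named in the abstract above can be illustrated with a minimal sketch. It is not the BACO-BTC scheme: the common bitmap here comes from a naive baseline (thresholding the per-pixel average of R, G, and B), and the two quantization levels are taken as the means of each pixel group, as in the absolute-moment variant of BTC. All function names are hypothetical.

```python
def split_levels(block, bitmap):
    """Return (low, high) quantization levels for one channel block under a bitmap."""
    high = [p for p, bit in zip(block, bitmap) if bit]
    low = [p for p, bit in zip(block, bitmap) if not bit]
    mean = sum(block) / len(block)
    lo = round(sum(low) / len(low)) if low else round(mean)
    hi = round(sum(high) / len(high)) if high else round(mean)
    return lo, hi


def btc_block(block):
    """Encode one channel block (flat list of pixels): bitmap plus a pair of levels."""
    mean = sum(block) / len(block)
    bitmap = [1 if p >= mean else 0 for p in block]
    return bitmap, split_levels(block, bitmap)


def naive_common_bitmap(r, g, b):
    """Assumed baseline: one bitmap from the averaged channels, levels per channel."""
    gray = [(x + y + z) / 3 for x, y, z in zip(r, g, b)]
    mean = sum(gray) / len(gray)
    bitmap = [1 if v >= mean else 0 for v in gray]
    return bitmap, [split_levels(ch, bitmap) for ch in (r, g, b)]


def reconstruct(bitmap, levels):
    """Rebuild a channel block from the bitmap and its (low, high) levels."""
    lo, hi = levels
    return [hi if bit else lo for bit in bitmap]
```

An optimizer such as BACO would search over candidate common bitmaps to reduce the total distortion across the three channels instead of fixing the bitmap by a single threshold.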

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho;Yoon, Jae-Sam;Oh, Yoo-Rhee;Kim, Hong-Kook
    • MALSORI / no.54 / pp.27-43 / 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that use the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCCs is developing an efficient MFCC quantization scheme at a low bit rate. First, we explore the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to that of 8 kbps G.729, while speech recognition performance with the proposed coder is better than with G.729.
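
As a rough illustration of predictive quantization that exploits interframe correlation, the sketch below predicts each value from the previously reconstructed one and quantizes only the residual. It is a scalar toy under stated assumptions, not the coder from the paper: the predictor coefficient `rho`, the uniform step size `step`, and the function names are all hypothetical, and the safety-net scheme is not modeled.

```python
def quantize(value, step):
    """Uniform scalar quantizer: snap a value to the nearest multiple of step."""
    return round(value / step) * step


def predictive_quantize(frames, rho=0.5, step=0.5):
    """Quantize one coefficient track with first-order prediction.

    Each frame is predicted as rho times the previous reconstructed value;
    only the prediction residual is quantized.
    """
    prev = 0.0
    reconstructed = []
    for x in frames:
        pred = rho * prev
        prev = pred + quantize(x - pred, step)
        reconstructed.append(prev)
    return reconstructed
```

Because the decoder tracks the same reconstructed values, a channel error propagates through the prediction, which is the kind of fragility a safety-net design is meant to limit.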

A Query-by-Speech Scheme for Photo Albuming (음성 질의 기반 디지털 사진 검색 기법)

  • Kim Tae-Sung;Suh Young-Joo;Lee Yong-Ju;Kim Hoi-Rin
    • MALSORI / no.57 / pp.99-112 / 2006
  • In this paper, we introduce two methods for retrieving photos by means of speech documents. We compare the pattern of a speech query with those of the speech documents recorded in digital cameras, measure the similarities, and retrieve the photos corresponding to the speech documents with high similarity scores. In the first approach, a phoneme recognition scheme is used as the pre-processor for pattern matching; in the second, vector quantization (VQ) and dynamic time warping (DTW) are applied to match the speech query with the documents in the signal domain itself. Experimental results show that the performance of the first approach depends heavily on that of phoneme recognition, while its processing time is short. The second method provides a great improvement in performance. Its processing time is longer than that of the first method because of DTW, but it can be reduced by using approximate methods.
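
The VQ and DTW matching described above can be sketched as follows. This is a minimal scalar version under stated assumptions: real systems would use multi-dimensional feature frames and a trained codebook, and the function names and the absolute-difference cost are illustrative choices, not details from the paper.

```python
def vq_encode(frames, codebook):
    """Map each scalar frame to the index of its nearest codeword."""
    return [min(range(len(codebook)), key=lambda k: abs(codebook[k] - f))
            for f in frames]


def dtw_distance(a, b):
    """Classic dynamic time warping distance with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # d[i][j] is the best cumulative cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

The full table costs time proportional to the product of the two sequence lengths, which is consistent with the longer processing time the abstract attributes to DTW; restricting the warping path to a band around the diagonal is one common approximation.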

An Embedding/Extracting Method of Audio Watermark Information for High Quality Stereo Music (고품질 스테레오 음악을 위한 오디오 워터마크 정보 삽입/추출 기술)

  • Bae, Kyungyul
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.21-35 / 2018
  • Since the introduction of MP3 players, CD recordings have gradually been vanishing, and the music consumption environment is shifting to mobile devices. The introduction of smart devices has increased the utilization of music through the playback, mass storage, and search functions integrated into smartphones and tablets. When MP3 players were first supplied, the bit rate of compressed music content was generally 128 Kbps. However, as demand for high-quality music increased, content at 384 Kbps appeared. Recently, music content in the FLAC (Free Lossless Audio Codec) format, which uses lossless compression, has become popular. The download services of many music sites in Korea are divided into unlimited downloads with technical protection and limited downloads without technical protection. Digital Rights Management (DRM) technology is used as the technical protection measure for unlimited downloads, but such content can be used only on authenticated devices that have DRM installed. Even music purchased by the user cannot be used on other devices. Conversely, for music that is limited in quantity but not technically protected, there is no way to take action against anyone who distributes it, and for high-quality music such as FLAC the loss is greater. In this paper, the author proposes an audio watermarking technology for copyright protection of high-quality stereo music. Two kinds of information, "Copyright" and "Copy_free", are generated by using the turbo code. Each watermark consists of 9 bytes (72 bits). When turbo code is applied for error correction, the amount of information to be inserted increases to 222 bits. The 222-bit watermark is expanded to 1024 bits to be robust against additional errors and is finally inserted into the stereo music as the watermark.
Turbo code can recover the raw data even when part of the code is damaged by an attack on the watermarked content, provided that less than 15% of it is damaged. Because the watermark is expanded to 1024 bits, the probability of recovering the 222 bits from partially damaged content increases, which makes the watermark itself more resistant to attack. The proposed algorithm uses quantization in the DCT domain so that the watermark can be detected efficiently and the SNR is improved when stereo music is converted into mono. As a result, the average SNR exceeded 40 dB, an improvement of more than 10 dB over traditional quantization methods. This is a very significant result because a 10 dB gain corresponds to a tenfold improvement in the signal-to-noise power ratio. In addition, a sample shorter than 1 second is sufficient for extracting the watermark, and the watermark can be completely extracted from music samples of less than one second in all cases of MP3 compression at a bit rate of 128 Kbps. By contrast, the conventional quantization method largely fails to extract the watermark even from 10-second samples, so the proposed method needs only about one tenth of the sample length. In this study, since the length of the watermark embedded into the music is 72 bits, it provides sufficient capacity to embed the information needed for music. This is enough bits to identify the music distributed all over the world: $2^{72}$ distinct values, more than $4 \times 10^{21}$, can be identified, so the watermark can serve as an identifier and can be used for copyright protection in high-quality music services. The proposed algorithm can be used not only for high-quality audio but also for the development of watermarking algorithms for multimedia such as UHD (Ultra High Definition) TV and high-resolution images. In addition, with the development of digital devices, users in the music industry are demanding high-quality music, and artificial intelligence assistants are arriving along with high-quality music and streaming services.
The results of this study can be used to protect the rights of copyright holders in these industries.
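
To make "quantization-based watermarking" concrete, here is a minimal sketch of a generic parity-based quantization scheme applied to a single transform coefficient. It is an assumption-laden illustration, not the author's algorithm: the abstract does not specify the embedding rule, and the function names and the step size `step` are hypothetical.

```python
def embed_bit(coeff, bit, step):
    """Embed one bit by snapping coeff to a multiple of step with matching parity."""
    scaled = coeff / step
    q = round(scaled)
    if q % 2 != bit:
        # Move to the neighbouring quantization index on the side nearer to coeff.
        q += 1 if scaled >= q else -1
    return q * step


def extract_bit(coeff, step):
    """Recover the bit as the parity of the nearest quantization index."""
    return round(coeff / step) % 2
```

With this rule a bit survives any perturbation of the coefficient smaller than half a step, so a larger `step` trades audio fidelity for robustness.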

The Efficient Feature Extraction of Handwritten Numerals in GLVQ Clustering Network (GLVQ클러스터링을 위한 필기체 숫자의 효율적인 특징 추출 방법)

  • Jeon, Jong-Won;Min, Jun-Yeong
    • The Transactions of the Korea Information Processing Society / v.2 no.6 / pp.995-1001 / 1995
  • A typical pattern recognition system consists of pre-processing, feature extraction (algorithm), and classification or recognition. In classification, when widely varying patterns exist in the same category, we need clustering to organize the similar patterns. There are two approaches to clustering. The first is the statistical approach, which includes the k-means and ISODATA algorithms. The second is the neural network approach, represented by T. Kohonen's LVQ (Learning Vector Quantization). Nikhil R. Pal et al. proposed GLVQ (Generalized LVQ, 1993). This paper suggests efficient feature extraction methods for handwritten numerals in the GLVQ clustering network. We use handwritten numeral data from 21 writers (i.e., 200 patterns) and compare the proportion of misclassified patterns for each feature extraction method. As a result, when we use the projection combination method, the classification rate is 98.5%.
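
For orientation, one update step of Kohonen's basic LVQ1 rule can be sketched as below. This is deliberately not GLVQ, whose update rule is not given in the abstract; the function name, the learning rate `lr`, and the squared Euclidean distance are assumptions made for the illustration.

```python
def lvq1_step(prototypes, labels, x, y, lr=0.1):
    """Apply one LVQ1 update in place and return the index of the winning prototype.

    The nearest prototype moves toward sample x when its label equals y,
    and away from x otherwise.
    """
    dists = [sum((p_i - x_i) ** 2 for p_i, x_i in zip(p, x)) for p in prototypes]
    k = dists.index(min(dists))
    sign = 1.0 if labels[k] == y else -1.0
    prototypes[k] = [p_i + sign * lr * (x_i - p_i)
                     for p_i, x_i in zip(prototypes[k], x)]
    return k
```

Only the winning prototype changes in LVQ1; generalized variants differ mainly in how the update is spread over the other prototypes.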

Wideband Multi-bit Continuous-Time $\Sigma\Delta$ Modulator with Adaptive Quantization Level (적응성 양자화 레벨을 가지는 광대역 다중-비트 연속시간 $\Sigma\Delta$ 모듈레이터)

  • Lee, Hee-Bum;Shin, Woo-Yeol;Lee, Hyun-Joong;Kim, Suh-Wan
    • Journal of the Institute of Electronics Engineers of Korea SD / v.44 no.11 / pp.1-8 / 2007
  • A wideband continuous-time sigma-delta modulator for wireless applications is implemented in 130 nm CMOS. The SNR for small input signals is improved by a proposed adaptive quantizer that effectively scales the quantization level. The modulator comprises a second-order loop filter for low power consumption, and a 4-bit quantizer and DAC for low jitter sensitivity and high linearity. The designed circuit achieves a peak SNR of 51.36 dB with a 10 MHz signal bandwidth and a 320 MHz sampling frequency while dissipating 30 mW.
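
The oversampling ratio implied by these figures follows from the usual definition. The symbols $f_s$ (sampling frequency) and $f_B$ (signal bandwidth) are notation introduced here, not taken from the paper:

$$\mathrm{OSR} = \frac{f_s}{2 f_B} = \frac{320\ \mathrm{MHz}}{2 \times 10\ \mathrm{MHz}} = 16$$

A low ratio such as this is typical of wideband designs, which is one reason a multi-bit quantizer is attractive here.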

Wavelet-Based Image Compression Using the Properties of Subbands (대역의 특성을 이용한 웨이블렛 기반 영상 압축 부호화)

  • 박성완;강의성;문동영;고성제
    • Journal of Broadcast Engineering / v.1 no.2 / pp.118-132 / 1996
  • This paper proposes a wavelet transform-based image compression method using the energy distribution. The proposed method involves two steps. First, we use a wavelet transform for the subband decomposition. The original image is decomposed into one low-resolution subimage and three high-frequency subimages. The high-frequency subimages contain horizontal, vertical, and diagonal edges, respectively. The wavelet transform is further applied to these high-frequency subimages. The resulting transformed subimages have different energy distributions corresponding to the different orientations of the high-pass filter. Second, for a higher compression ratio and computational efficiency, we discard some subimages with small energy. The remaining subimages are encoded using either DPCM or quantization followed by entropy coding. Experimental results show that the proposed coding scheme achieves a better peak signal-to-noise ratio (PSNR) and a higher compression ratio than the conventional image coding method that uses the wavelet transform followed by straightforward vector quantization.
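
The decomposition and energy-based selection can be sketched with a one-level 2-D Haar transform. This is a simplified illustration under stated assumptions: the paper's wavelet filters and its second decomposition of the high-frequency subimages are not reproduced, and the band names, the averaging normalization, and the threshold rule are hypothetical.

```python
def haar2d(img):
    """One-level 2-D Haar transform of an image with even height and width."""
    bands = {"approx": [], "col_diff": [], "row_diff": [], "diag": []}
    for r in range(0, len(img), 2):
        rows = {name: [] for name in bands}
        for c in range(0, len(img[0]), 2):
            a, b = img[r][c], img[r][c + 1]
            d, e = img[r + 1][c], img[r + 1][c + 1]
            rows["approx"].append((a + b + d + e) / 4)    # low-resolution subimage
            rows["col_diff"].append((a - b + d - e) / 4)  # change across columns
            rows["row_diff"].append((a + b - d - e) / 4)  # change across rows
            rows["diag"].append((a - b - d + e) / 4)      # diagonal detail
        for name in bands:
            bands[name].append(rows[name])
    return bands


def energy(band):
    """Sum of squared coefficients in one subband."""
    return sum(v * v for row in band for v in row)


def keep_bands(bands, threshold):
    """Always keep the approximation; drop detail bands with energy below threshold."""
    return {name: band for name, band in bands.items()
            if name == "approx" or energy(band) >= threshold}
```

The surviving bands would then go to DPCM or to quantization and entropy coding, as the abstract describes.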

Feed-forward Learning Algorithm by Generalized Clustering Network (Generalized Clustering Network를 이용한 전방향 학습 알고리즘)

  • Min, Jun-Yeong;Jo, Hyeong-Gi
    • The Transactions of the Korea Information Processing Society / v.2 no.5 / pp.619-625 / 1995
  • This paper constructs a composite feed-forward learning algorithm as a replacement for backpropagation learning. The algorithm first organizes the pattern vectors into clusters using the Generalized Learning Vector Quantization (GLVQ) clustering algorithm (Nikhil R. Pal et al., 1993), then regroups the pattern vectors belonging to different clusters, and finally recognizes the regrouped pattern vectors with a single-layer perceptron. Because this is a feed-forward learning algorithm, training takes less time than with the backpropagation algorithm and the recognition rate is increased. We use 250 ASCII code bit patterns normalized to 16$\times$8. In the experiments, when the 250 patterns are divided into 10 clusters, the average number of iterations per cluster is 94.7 and the recognition rate is 100%.

A Study on New Hierarchical Motion Compensation Pyramid Coding (새로운 계층적 이동 보상 피라미드 부호화 방식 연구)

  • 전준현
    • Journal of Broadcast Engineering / v.8 no.2 / pp.181-197 / 2003
  • A motion compensation (MC) technique using sub-band coding with a hierarchical structure is efficient for estimating real motion. Among hierarchical pyramid methods, the low-band MC pyramid method is popular; in it, the upper layer estimates the global motion and the next lower layer estimates the local motion. The low-band MC pyramid scheme has two problems. First, because the quantization errors at the lower layers accumulate during coding and quantization, it is impossible to find the exact motion vector (MV). Second, because of the top-down search in the hierarchical structure, an MV mismatch in the upper layer causes serious MV errors in the lower layer. We therefore propose a new hierarchical MC pyramid method based on edge classification. In this paper, we show that the proposed pass-band motion compensation pyramid technique performs better than the low-band motion compensation pyramid. In addition, for pyramid motion estimation, we propose an initial MV estimation scheme based on edge-pattern classification. As a result, we find that the PSNR is increased.

Light weight architecture for acoustic scene classification (음향 장면 분류를 위한 경량화 모형 연구)

  • Lim, Soyoung;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.6 / pp.979-993 / 2021
  • Acoustic scene classification (ASC) categorizes an audio file based on the environment in which it was recorded. This has long been studied in the Detection and Classification of Acoustic Scenes and Events (DCASE). In this study, we considered the requirement that an ASC model used in real-world applications should have low complexity. We compared several models that apply light-weight techniques. First, a base CNN model was proposed using log mel-spectrogram, delta, and delta-delta features. Second, depthwise separable convolution and linear bottleneck inverted residual blocks were applied to the convolutional layers, and quantization was applied to the models to develop a low-complexity model. The performance of the low-complexity model was similar or slightly inferior to that of the base model, but the model size was significantly reduced from 503 KB to 42.76 KB.
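
A back-of-the-envelope sketch shows why depthwise separable convolution and quantization shrink a model. The layer shape used in the usage note is invented for illustration and is not from the paper; biases are ignored, and the function names are hypothetical.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution plus a 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out


def weights_size_kb(n_params, bits):
    """Storage for n_params weights at the given bit width, in KB."""
    return n_params * bits / 8 / 1024
```

For an assumed 3 x 3 layer with 32 input and 64 output channels, the weight count drops from 18432 to 2336, and storing weights in 8 bits instead of 32 cuts storage by a further factor of four.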