• Title/Summary/Keyword: dictionary-based

Search Result 552, Processing Time 0.029 seconds

Overlapping Sound Event Detection Using NMF with K-SVD Based Dictionary Learning (K-SVD 기반 사전 훈련과 비음수 행렬 분해 기법을 이용한 중첩음향이벤트 검출)

  • Choi, Hyeonsik;Keum, Minseok;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.34 no.3
    • /
    • pp.234-239
    • /
    • 2015
  • Non-Negative Matrix Factorization (NMF) is a method for updating dictionary and gain in alternating manner. Due to ease of implementation and intuitive interpretation, NMF is widely used to detect and separate overlapping sound events. However, NMF that utilizes non-negativity constraints generates parts-based representation and this distinct property leads to a dictionary containing fragmented acoustic events. As a result, the presence of shared basis results in performance degradation in both separation and detection tasks of overlapping sound events. In this paper, we propose a new method that utilizes K-Singular Value Decomposition (K-SVD) based dictionary to address and mitigate the part-based representation issue during the dictionary learning step. Subsequently, we calculate the gain using NMF in sound event detection step. We evaluate and confirm that overlapping sound event detection performance of the proposed method is better than the conventional method that utilizes NMF based dictionary.

Anomaly Detection via Pattern Dictionary Method and Atypicality in Application (패턴사전과 비정형성을 통한 이상치 탐지방법 적용)

  • Sehong Oh;Jongsung Park;Youngsam Yoon
    • Journal of Sensor Science and Technology
    • /
    • v.32 no.6
    • /
    • pp.481-486
    • /
    • 2023
  • Anomaly detection holds paramount significance across diverse fields, encompassing fraud detection, risk mitigation, and sensor evaluation tests. Its pertinence extends notably to the military, particularly within the Warrior Platform, a comprehensive combat equipment system with wearable sensors. Hence, we propose a data-compression-based anomaly detection approach tailored to unlabeled time series and sequence data. This method entailed the construction of two distinctive features, typicality and atypicality, to discern anomalies effectively. The typicality of a test sequence was determined by evaluating the compression efficacy achieved through the pattern dictionary. This dictionary was established based on the frequency of all patterns identified in a training sequence generated for each sensor within Warrior Platform. The resulting typicality served as an anomaly score, facilitating the identification of anomalous data using a predetermined threshold. To improve the performance of the pattern dictionary method, we leveraged atypicality to discern sequences that could undergo compression independently without relying on the pattern dictionary. Consequently, our refined approach integrated both typicality and atypicality, augmenting the effectiveness of the pattern dictionary method. Our proposed method exhibited heightened capability in detecting a spectrum of unpredictable anomalies, fortifying the stability of wearable sensors prevalent in military equipment, including the Army TIGER 4.0 system.

Automatic Construction of Korean Unknown Word Dictionary using Occurrence Frequency in Web Documents (웹문서에서의 출현빈도를 이용한 한국어 미등록어 사전 자동 구축)

  • Park, So-Young
    • Journal of the Korea Society of Computer and Information
    • /
    • v.13 no.3
    • /
    • pp.27-33
    • /
    • 2008
  • In this paper, we propose a method of automatically constructing a dictionary by extracting unknown words from given eojeols in order to improve the performance of a Korean morphological analyzer. The proposed method is composed of a dictionary construction phase based on full text analysis and a dictionary construction phase based on web document frequency. The first phase recognizes unknown words from strings repeatedly occurred in a given full text while the second phase recognizes unknown words based on frequency of retrieving each string, once occurred in the text, from web documents. Experimental results show that the proposed method improves 32.39% recall by utilizing web document frequency compared with a previous method.

  • PDF

Double 𝑙1 regularization for moving force identification using response spectrum-based weighted dictionary

  • Yuandong Lei;Bohao Xu;Ling Yu
    • Structural Engineering and Mechanics
    • /
    • v.91 no.2
    • /
    • pp.227-238
    • /
    • 2024
  • Sparse regularization methods have proven effective in addressing the ill-posed equations encountered in moving force identification (MFI). However, the complexity of vehicle loads is often ignored in existing studies aiming at enhancing MFI accuracy. To tackle this issue, a double 𝑙1 regularization method is proposed for MFI based on a response spectrum-based weighted dictionary in this study. Firstly, the relationship between vehicle-induced responses and moving vehicle loads (MVL) is established. The structural responses are then expanded in the frequency domain to obtain the prior knowledge related to MVL and to further construct a response spectrum-based weighted dictionary for MFI with a higher accuracy. Secondly, with the utilization of this weighted dictionary, a double 𝑙1 regularization framework is presented for identifying the static and dynamic components of MVL by the alternating direction method of multipliers (ADMM) method successively. To assess the performance of the proposed method, two different types of MVL, such as composed of trigonometric functions and driven from a 1/4 bridge-vehicle model, are adopted to conduct numerical simulations. Furthermore, a series of MFI experimental verifications are carried out in laboratory. The results shows that the proposed method's higher accuracy and strong robustness to noises compared with other traditional regularization methods.

Optimizing Multiple Pronunciation Dictionary Based on a Confusability Measure for Non-native Speech Recognition (타언어권 화자 음성 인식을 위한 혼잡도에 기반한 다중발음사전의 최적화 기법)

  • Kim, Min-A;Oh, Yoo-Rhee;Kim, Hong-Kook;Lee, Yeon-Woo;Cho, Sung-Eui;Lee, Seong-Ro
    • MALSORI
    • /
    • no.65
    • /
    • pp.93-103
    • /
    • 2008
  • In this paper, we propose a method for optimizing a multiple pronunciation dictionary used for modeling pronunciation variations of non-native speech. The proposed method removes some confusable pronunciation variants in the dictionary, resulting in a reduced dictionary size and less decoding time for automatic speech recognition (ASR). To this end, a confusability measure is first defined based on the Levenshtein distance between two different pronunciation variants. Then, the number of phonemes for each pronunciation variant is incorporated into the confusability measure to compensate for ASR errors due to words of a shorter length. We investigate the effect of the proposed method on ASR performance, where Korean is selected as the target language and Korean utterances spoken by Chinese native speakers are considered as non-native speech. It is shown from the experiments that an ASR system using the multiple pronunciation dictionary optimized by the proposed method can provide a relative average word error rate reduction of 6.25%, with 11.67% less ASR decoding time, as compared with that using a multiple pronunciation dictionary without the optimization.

  • PDF

A Mobile Dictionary based on a Prefetching Method (선인출 기반의 모바일 사전)

  • Hong, Soon-Jung;Moon, Yang-Sae;Kim, Hea-Suk;Kim, Jin-Ho;Chung, Young-Jun
    • Journal of KIISE:Software and Applications
    • /
    • v.35 no.3
    • /
    • pp.197-206
    • /
    • 2008
  • In the mobile Internet environment, frequent communications between a mobile device and a content server are required for searching or downloading learning materials. In this paper, we propose an efficient prefetching technique to reduce the network cost and to improve the communication efficiency in the mobile dictionary. Our prefetching-based approach can be explained as follows. First, we propose an overall framework for the prefetching-based mobile dictionary. Second, we present a systematic way of determining the amount of prefetching data for each of packet-based and flat-rate billing cases. Third, by focusing on the English-Korean mobile dictionary for middle or high school students, we propose an intuitive method of determining the words to be prefetched in advance. Fourth, based on these determination methods, we propose an efficient prefetching algorithm. Fifth, through experiments, we show the superiority of our prefetching-based method. From this approach, we can summarize major contributions as follows. First, to our best knowledge, this is the first attempt to exploit prefetching techniques in mobile applications. Second, we propose a systematic way of applying prefetching techniques to a mobile dictionary. Third, using prefetching techniques we improve the overall performance of a network-based mobile dictionary. Experimental results show that, compared with the traditional on-demand approach, our prefetching based approach improves the average performance by $9.8%{\sim}33.2%$. These results indicate that our framework can be widely used not only in the mobile dictionary but also in other mobile Internet applications that require the prefetching technique.

Ternary Decomposition and Dictionary Extension for Khmer Word Segmentation

  • Sung, Thaileang;Hwang, Insoo
    • Journal of Information Technology Applications and Management
    • /
    • v.23 no.2
    • /
    • pp.11-28
    • /
    • 2016
  • In this paper, we proposed a dictionary extension and a ternary decomposition technique to improve the effectiveness of Khmer word segmentation. Most word segmentation approaches depend on a dictionary. However, the dictionary being used is not fully reliable and cannot cover all the words of the Khmer language. This causes an issue of unknown words or out-of-vocabulary words. Our approach is to extend the original dictionary to be more reliable with new words. In addition, we use ternary decomposition for the segmentation process. In this research, we also introduced the invisible space of the Khmer Unicode (char\u200B) in order to segment our training corpus. With our segmentation algorithm, based on ternary decomposition and invisible space, we can extract new words from our training text and then input the new words into the dictionary. We used an extended wordlist and a segmentation algorithm regardless of the invisible space to test an unannotated text. Our results remarkably outperformed other approaches. We have achieved 88.8%, 91.8% and 90.6% rates of precision, recall and F-measurement.

KNE: An Automatic Dictionary Expansion Method Using Use-cases for Morphological Analysis

  • Nam, Chung-Hyeon;Jang, Kyung-Sik
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.3
    • /
    • pp.191-197
    • /
    • 2019
  • Morphological analysis is used for searching sentences and understanding context. As most morpheme analysis methods are based on predefined dictionaries, the problem of a target word not being registered in the given morpheme dictionary, the so-called unregistered word problem, can be a major cause of reduced performance. The current practical solution of such unregistered word problem is to add them by hand-write into the given dictionary. This method is a limitation that restricts the scalability and expandability of dictionaries. In order to overcome this limitation, we propose a novel method to automatically expand a dictionary by means of use-case analysis, which checks the validity of the unregistered word by exploring the use-cases through web crawling. The results show that the proposed method is a feasible one in terms of the accuracy of the validation process, the expandability of the dictionary and, after registration, the fast extraction time of morphemes.

Dictionary Learning based Superresolution on 4D Light Field Images (4차원 Light Field 영상에서 Dictionary Learning 기반 초해상도 알고리즘)

  • Lee, Seung-Jae;Park, In Kyu
    • Journal of Broadcast Engineering
    • /
    • v.20 no.5
    • /
    • pp.676-686
    • /
    • 2015
  • A 4D light field image is represented in traditional 2D spatial domain and additional 2D angular domain. The 4D light field has a resolution limitation both in spatial and angular domains since 4D signals are captured by 2D CMOS sensor with limited resolution. In this paper, we propose a dictionary learning-based superresolution algorithm in 4D light field domain to overcome the resolution limitation. The proposed algorithm performs dictionary learning using a large number of extracted 4D light field patches. Then, a high resolution light field image is reconstructed from a low resolution input using the learned dictionary. In this paper, we reconstruct a 4D light field image to have double resolution both in spatial and angular domains. Experimental result shows that the proposed method outperforms the traditional method for the test images captured by a commercial light field camera, i.e. Lytro.

The Construction of a Domain-Specific Sentiment Dictionary Using Graph-based Semi-supervised Learning Method (그래프 기반 준지도 학습 방법을 이용한 특정분야 감성사전 구축)

  • Kim, Jung-Ho;Oh, Yean-Ju;Chae, Soo-Hoan
    • Science of Emotion and Sensibility
    • /
    • v.18 no.1
    • /
    • pp.103-110
    • /
    • 2015
  • Sentiment lexicon is an essential element for expressing sentiment on a text or recognizing sentiment from a text. We propose a graph-based semi-supervised learning method to construct a sentiment dictionary as sentiment lexicon set. In particular, we focus on the construction of domain-specific sentiment dictionary. The proposed method makes up a graph according to lexicons and proximity among lexicons, and sentiments of some lexicons which already know their sentiment values are propagated throughout all of the lexicons on the graph. There are two typical types of the sentiment lexicon, sentiment words and sentiment phrase, and we construct a sentiment dictionary by creating each graph of them and infer sentiment of all sentiment lexicons. In order to verify our proposed method, we constructed a sentiment dictionary specific to the movie domain, and conducted sentiment classification experiments with it. As a result, it have been shown that the classification performance using the sentiment dictionary is better than the other using typical general-purpose sentiment dictionary.