• Title/Summary/Keyword: data weighting

Language Model Adaptation for Conversational Speech Recognition (대화체 연속음성 인식을 위한 언어모델 적응)

  • Park Young-Hee;Chung Minhwa
    • Proceedings of the KSPS conference / 2003.05a / pp.83-86 / 2003
  • This paper presents a style-based language model adaptation for Korean conversational speech recognition. Compared with written text corpora, Korean conversational speech exhibits distinctive characteristics of content and style, such as filled pauses, word omission, and contraction. We report two approaches to style-based language model adaptation. Both focus on improving the estimation of domain-dependent n-gram models by relevance weighting of out-of-domain text data, where style is represented by n-gram based tf*idf similarity. In addition to relevance weighting, we use disfluencies as predictors of the neighboring words. The best result reduces the word error rate by 6.5% absolute and shows that n-gram based relevance weighting captures style differences well and that disfluencies are good predictors.

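
The relevance weighting described in the abstract above can be illustrated with a minimal sketch. The abstract states only that style is represented by n-gram based tf*idf similarity; the bigram order, the smoothed idf formula, the use of cosine similarity, and all function names below are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Build one sparse tf*idf vector (dict) per document over its n-grams."""
    counts = [Counter(ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(gram for c in counts for gram in c)
    num_docs = len(docs)
    vectors = []
    for c in counts:
        # Smoothed idf (an assumption) keeps every weight strictly positive.
        vectors.append({
            gram: tf * (math.log((1 + num_docs) / (1 + doc_freq[gram])) + 1.0)
            for gram, tf in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def relevance_weights(in_domain, out_docs, n=2):
    """Weight each out-of-domain document by its similarity to in-domain text."""
    vectors = tfidf_vectors([in_domain] + out_docs, n)
    return [cosine(vectors[0], v) for v in vectors[1:]]


in_domain = "uh i think uh we should go".split()
out_docs = ["uh i think so".split(),
            "the committee approved the budget".split()]
weights = relevance_weights(in_domain, out_docs)
```

In such a scheme the weights would scale the n-gram counts contributed by each out-of-domain document before the adapted model is estimated; how the paper actually combines them is not stated in the abstract.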

Optimization of Fuzzy Systems by Means of GA and Weighting Factor (유전자 알고리즘과 하중값을 이용한 퍼지 시스템의 최적화)

  • Park, Byoung-Jun;Oh, Sung-Kwun;Ahn, Tae-Chon;Kim, Hyun-Ki
    • The Transactions of the Korean Institute of Electrical Engineers A / v.48 no.6 / pp.789-799 / 1999
  • This paper proposes a method for optimizing fuzzy inference systems used to model nonlinear systems. Because a fuzzy model is primarily built from expert experience, it needs to be identified and optimized by definite, systematic methods. The proposed rule-based fuzzy model carries out structure and parameter identification using the HCM (Hard C-Means) clustering method, genetic algorithms, and a fuzzy inference method. Two types of inference are considered: simplified inference and linear inference. Nonlinear systems are expressed through structure identification, that is, the choice of input variables and the division of the fuzzy input subspaces, and through parameter identification of the fuzzy model. Genetic algorithms are used to identify the premise parameters, and the standard least squares method with Gaussian elimination is used to identify the optimal consequence parameters. In addition, a performance index with a weighting factor is proposed to balance the model's performance on the training and testing data sets, which improves the approximation and predictive performance of the fuzzy system. Time series data from a gas furnace and a sewage treatment process are used to evaluate the proposed model.

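
One way a performance index with a weighting factor could balance training and testing error is a convex combination of the two. The abstract above does not give the formula, so the form below, the name `theta`, and the use of mean squared error are assumptions chosen only to make the idea concrete.

```python
def mean_squared_error(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_performance_index(train_error, test_error, theta=0.5):
    """Combine training and testing error with a weighting factor theta.

    theta = 1.0 scores the model on training error only (approximation);
    theta = 0.0 scores it on testing error only (prediction).
    This convex combination is an assumed form, not the paper's formula.
    """
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    return theta * train_error + (1.0 - theta) * test_error


train_error = mean_squared_error([1.0, 2.0, 3.0], [1.1, 1.9, 3.2])
test_error = mean_squared_error([4.0, 5.0], [4.5, 4.0])
fitness = weighted_performance_index(train_error, test_error, theta=0.5)
```

A genetic algorithm would then minimize `fitness` rather than the training error alone, which discourages premise parameters that fit the training set at the expense of the test set.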

Feature Weighting in Projected Clustering for High Dimensional Data (고차원 데이타에 대한 투영 클러스터링에서 특성 가중치 부여)

  • Park, Jong-Soo
    • Journal of KIISE:Databases / v.32 no.3 / pp.228-242 / 2005
  • Projected clustering seeks to find clusters in different subspaces of a high-dimensional dataset. We propose an algorithm that discovers near-optimal projected clusters without user-specified parameters such as the number of output clusters and the average cardinality of the subspaces of projected clusters. The objective function of the algorithm computes the projected energy, the quality, and the number of outliers at each step of clustering. To minimize the projected energy and maximize the quality of the clustering, we first find the best subspace of each cluster based on the density of input points, by comparing standard deviations over the full set of dimensions. A weighting factor for each dimension of the subspace is used to get rid of probable error in measuring projected distances. Our extensive experiments show that the algorithm discovers projected clusters accurately and scales to large data sets.

A Data-Mining-based Methodology for Military Occupational Specialty Assignment (데이터 마이닝 기반의 군사특기 분류 방법론 연구)

  • 민규식;정지원;최인찬
    • Journal of the military operations research society of Korea / v.30 no.1 / pp.1-14 / 2004
  • In this paper, we propose a new data-mining-based methodology for military occupational specialty assignment. The proposed methodology consists of two phases, feature selection and man-power assignment. In the first phase, the k-means partitioning algorithm and the optimal variable weighting algorithm are used to determine attribute weights. We address limitations of the optimal variable weighting algorithm and suggest a quadratic programming model that can handle categorical variables and non-contributory trivial variables. In the second phase, we present an integer programming model to deal with a man-power assignment problem. In the model, constraints on demand-supply requirements and training capacity are considered. Moreover, the attribute weights obtained in the first phase for each specialty are used to measure dissimilarity. Results of a computational experiment using real-world data are provided along with some analysis.

A Method for Short Text Classification using SNS Feature Information based on Markov Logic Networks (SNS 특징정보를 활용한 마르코프 논리 네트워크 기반의 단문 텍스트 분류 방법)

  • Lee, Eunji;Kim, Pankoo
    • Journal of Korea Multimedia Society / v.20 no.7 / pp.1065-1072 / 2017
  • As smart devices and social network services (SNSs) become increasingly pervasive, individuals produce large amounts of data in real time. Accordingly, studies on unstructured data analysis are actively being conducted to solve the resultant problem of information overload and to facilitate effective data processing. Many such studies aim at filtering inappropriate information. In this paper, a feature-weighting method that considers SNS-message features is proposed for the classification of short text messages generated on SNSs, using Markov logic networks for category inference. The performance of the proposed method is verified through a comparison with existing frequency-based classification methods.

A Data Hiding Method of Binary Images Using Pixel-value Weighting (이진 이미지에 대한 픽셀값 가중치를 이용한 자료 은닉 기법 연구)

  • Jung, Ki-Hyun
    • Journal of the Korea Institute of Military Science and Technology / v.11 no.4 / pp.68-75 / 2008
  • This paper proposes a new data hiding method for binary images using the weighting value of pixel-value differencing. The binary cover image is partitioned into non-overlapping sub-blocks, and the most suitable position for embedding a secret bit is found in each sub-block. The proposed method calculates a weighted value for each sub-block to select the pixel to be changed, which improves the image quality of the stego-image. The experimental results show that the proposed method achieves good visual quality and high capacity.

Kalman-Filter Estimation and Prediction for a Spatial Time Series Model (공간시계열 모형의 칼만필터 추정과 예측)

  • Lee, Sung-Duck;Han, Eun-Hee;Kim, Duck-Ki
    • Communications for Statistical Applications and Methods / v.18 no.1 / pp.79-87 / 2011
  • Because the ARIMA model, although popular for analyzing spatial time series, does not reflect spatial processes, we use a spatial time series model instead, and apply it to grid data and to data on chicken pox, a highly contagious disease. The spatial time series model contains a weighting matrix because it accounts for spatial location as well as variation over time. The weighting matrix reflects the idea that geographically more contiguous regions have higher spatial dependence. In this study, the weighting matrix is assumed to give all neighboring areas the same influence. Using such an equal-weight matrix, we examine the suitability of the STARMA model, the spatial time series model, and the STBL model, compare their predictive power for statistical inference, and present the results. Furthermore, we try to show the superiority of the Kalman-Filter method through parameter estimation and the prediction process.
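
A weighting matrix that gives every neighboring area the same weight can be sketched as a row-normalized contiguity matrix. The abstract above does not say how neighbors are defined; the rook-style grid contiguity (cells sharing an edge), the row normalization, and the function names below are assumptions for illustration.

```python
def grid_adjacency(rows, cols):
    """0/1 adjacency matrix for a rows x cols grid; cells sharing an edge are neighbors."""
    n = rows * cols
    adjacency = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    adjacency[i][rr * cols + cc] = 1
    return adjacency


def equal_weight_matrix(adjacency):
    """Give each neighbor of region i the same weight, so that each row sums to 1."""
    n = len(adjacency)
    weights = []
    for i, row in enumerate(adjacency):
        neighbors = [j for j, a in enumerate(row) if a and j != i]
        w_row = [0.0] * n
        for j in neighbors:
            w_row[j] = 1.0 / len(neighbors)
        weights.append(w_row)  # a region with no neighbors keeps an all-zero row
    return weights


def spatial_lag(weights, values):
    """Weighted average of the neighbors' values for every region."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


W = equal_weight_matrix(grid_adjacency(2, 2))
lagged = spatial_lag(W, [1.0, 2.0, 3.0, 4.0])
```

In a STARMA-type model the spatially lagged series `spatial_lag(W, z_t)` enters the autoregressive and moving-average terms alongside the time-lagged values of each region.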

Highlight based Lyrics Search Considering the Characteristics of Query (사용자 질의어 특징을 반영한 하이라이트 기반 노래 가사 검색)

  • Kim, Kweon Yang
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.4 / pp.301-307 / 2016
  • This paper proposes a lyric search method that takes the characteristics of user queries into account. Based on the observation that queries for lyric search are derived from the highlight parts of a song, the method uses hierarchical agglomerative clustering to find the highlight and applies a Gaussian weighting so that the neighborhood of the highlight is considered as well as the highlight itself. With the mean of the Gaussian set at the highlight, the weighting function assigns higher weights near the highlight and lower weights far from it. An index of lyrics is then constructed with the Gaussian weighting. Experimental results on a data set obtained from 5 real users show that the proposed method is effective.
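
The Gaussian weighting described above, with its mean placed at the highlight, can be sketched as follows. The abstract does not specify the unit of position or the spread; treating each lyric line as one position, the `sigma` parameter, and the way weights are accumulated per term are assumptions for illustration.

```python
import math
from collections import defaultdict


def gaussian_weights(num_lines, highlight_idx, sigma=2.0):
    """Weight per lyric line: 1.0 at the highlight, decaying with distance from it."""
    return [
        math.exp(-((i - highlight_idx) ** 2) / (2.0 * sigma ** 2))
        for i in range(num_lines)
    ]


def weighted_term_index(lines, highlight_idx, sigma=2.0):
    """Accumulate, for each term, the Gaussian weights of the lines it occurs in."""
    weights = gaussian_weights(len(lines), highlight_idx, sigma)
    index = defaultdict(float)
    for line, weight in zip(lines, weights):
        for term in line:
            index[term] += weight
    return dict(index)


lyrics = [["hello", "world"], ["love", "you"], ["love", "me"]]
index = weighted_term_index(lyrics, highlight_idx=1, sigma=1.0)
```

With this index, a query term that appears near the highlight contributes more to a song's score than the same term far from it, which matches the observation that users tend to query with highlight lyrics.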

Hybrid Method using Frame Selection and Weighting Model Rank to improve Performance of Real-time Text-Independent Speaker Recognition System based on GMM (GMM 기반 실시간 문맥독립화자식별시스템의 성능향상을 위한 프레임선택 및 가중치를 이용한 Hybrid 방법)

  • 김민정;석수영;김광수;정호열;정현열
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.512-522 / 2002
  • In this paper, we propose a hybrid method that combines frame selection with a weighting model rank method, based on the GMM (Gaussian mixture model), for a real-time text-independent speaker recognition system. In the system, maximum likelihood estimation is used for GMM parameter optimization, and maximum likelihood is used as the basic recognition criterion. The proposed hybrid method has two steps. First, likelihood scores are calculated at the frame level from the speaker models and the test data, and the difference between the largest and second-largest likelihood values is computed; a frame is selected if this difference is larger than a threshold. Second, for each selected frame, a weighting value is used instead of the calculated likelihood when computing the total score. Cepstrum coefficients and regressive coefficients are used as feature parameters. The database for testing and training consists of data collected at different times, and the data for the experiments are selected randomly. In the experiments, we applied each method to the baseline system and tested it. In the speaker recognition experiments, the proposed hybrid method achieves on average 4% higher recognition accuracy than the frame selection method and 1% higher than the W method, which indicates its effectiveness.

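
The two steps of the hybrid method described above can be sketched as follows. The abstract does not give the weighting values or the threshold; the `rank_weights` tuple, the use of log-likelihoods, and the function name are hypothetical choices made for illustration, and the per-frame scores are assumed to come from GMMs trained elsewhere.

```python
def identify_speaker(frame_scores, threshold, rank_weights=(3.0, 2.0, 1.0)):
    """Pick a speaker from per-frame model scores.

    frame_scores: list of frames, each a list of log-likelihoods (one per speaker model).
    threshold:    minimum gap between the best and second-best score for a frame to count.
    rank_weights: hypothetical weights given to the 1st, 2nd, ... ranked model per frame.
    """
    if not frame_scores:
        raise ValueError("at least one frame is required")
    num_speakers = len(frame_scores[0])
    totals = [0.0] * num_speakers
    for scores in frame_scores:
        ranked = sorted(range(num_speakers), key=lambda s: scores[s], reverse=True)
        # Step 1: frame selection. Skip frames where the top two models are too close.
        if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] <= threshold:
            continue
        # Step 2: add a rank-based weight instead of the raw likelihood.
        for rank, speaker in enumerate(ranked[:len(rank_weights)]):
            totals[speaker] += rank_weights[rank]
    best = max(range(num_speakers), key=lambda s: totals[s])
    return best, totals


frames = [
    [-1.0, -5.0, -6.0],
    [-2.0, -2.1, -9.0],   # ambiguous frame: gap of 0.1 is below the threshold
    [-7.0, -1.0, -3.0],
    [-1.5, -8.0, -4.0],
]
speaker, totals = identify_speaker(frames, threshold=0.5)
```

Replacing raw likelihoods with rank weights limits the influence of any single frame with an extreme score, while the selection step discards frames that do not discriminate between the top candidates.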

Multimodal Media Content Classification using Keyword Weighting for Recommendation (추천을 위한 키워드 가중치를 이용한 멀티모달 미디어 콘텐츠 분류)

  • Kang, Ji-Soo;Baek, Ji-Won;Chung, Kyungyong
    • Journal of Convergence for Information Technology / v.9 no.5 / pp.1-6 / 2019
  • As the mobile market expands, a variety of platforms are available to provide multimodal media content. Because multimodal media content contains heterogeneous data, users need much time and effort to select the content they prefer. Therefore, in this paper we propose multimodal media content classification using keyword weighting for recommendation. The proposed method applies keyword weighting to the text data of multimodal media content to extract the keywords that best represent the content. Based on the extracted data, genre classes with subclasses are generated and the appropriate multimodal media content is classified into them. In addition, the user's preferences are evaluated for personalized recommendation, and multimodal content is recommended based on the result of the user's content preference analysis. The performance evaluation verifies the superiority of the recommendation results in terms of accuracy and satisfaction. The recommendation accuracy is 74.62% and the satisfaction rate is 69.1%, because recommendations take into account the user's favorite keywords as well as the genre.