• Title/Summary/Keyword: Data Filtering Techniques (데이터 필터 기법)


How to improve the accuracy of recommendation systems: Combining ratings and review texts sentiment scores (평점과 리뷰 텍스트 감성분석을 결합한 추천시스템 향상 방안 연구)

  • Hyun, Jiyeon;Ryu, Sangyi;Lee, Sang-Yong Tom
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.219-239 / 2019
  • As providing customized services to individuals becomes more important, research on personalized recommendation systems is constantly being carried out. Collaborative filtering is one of the most popular approaches in academia and industry. However, it has the limitation that recommendations are mostly based on quantitative information such as users' ratings, which lowers accuracy. To solve this problem, many studies have attempted to improve the performance of recommendation systems by using information beyond the quantitative ratings. Good examples are the uses of sentiment analysis on customer review text data. Nevertheless, existing research has not directly combined the results of sentiment analysis with quantitative rating scores in the recommendation system. Therefore, this study aims to reflect the sentiments shown in reviews in the rating scores. In other words, we propose a new algorithm that converts a user's own review into quantitative information and reflects it directly in the recommendation system. To do this, we needed to quantify users' reviews, which are originally qualitative information. In this study, sentiment scores were calculated through the sentiment analysis techniques of text mining. Movie reviews were used as the data. Based on these data, a domain-specific sentiment dictionary was constructed for the movie reviews. Regression analysis was used to construct the sentiment dictionary; positive/negative dictionaries were built using Lasso regression, Ridge regression, and ElasticNet. Based on the constructed sentiment dictionaries, accuracy was verified through a confusion matrix. The accuracy of the Lasso-based dictionary was 70%, that of the Ridge-based dictionary was 79%, and that of the ElasticNet (${\alpha}=0.3$) dictionary was 83%. Therefore, in this study, the sentiment score of each review is calculated based on the ElasticNet dictionary and combined with the rating to create a new rating. We show that collaborative filtering that reflects the sentiment scores of user reviews is superior to the traditional method that only considers the existing ratings. To demonstrate this, the proposed algorithm was applied to memory-based user-based collaborative filtering (UBCF), item-based collaborative filtering (IBCF), and the model-based matrix factorization methods SVD and SVD++. For each algorithm, the mean absolute error (MAE) and root mean square error (RMSE) were calculated to compare the system using the sentiment-combined ratings with the system using only the original ratings. In terms of MAE, the improvement was 0.059 for UBCF, 0.0862 for IBCF, 0.1012 for SVD, and 0.188 for SVD++; in terms of RMSE, it was 0.0431 for UBCF, 0.0882 for IBCF, 0.1103 for SVD, and 0.1756 for SVD++. As a result, the prediction performance of the ratings that reflect the sentiment scores proposed in this paper is superior to that of the conventional ratings. In other words, collaborative filtering that reflects the sentiment scores of user reviews shows superior accuracy compared with conventional collaborative filtering that only considers the quantitative ratings. We then performed a paired t-test to validate that the proposed model is a better approach and concluded that it is. In this study, to overcome the limitation of previous research that judges a user's sentiment only by the quantitative rating, the review was numerically quantified and the user's opinion was considered in a more refined way in the recommendation system to improve accuracy. The findings of this study have managerial implications for recommendation system developers, who are expected to consider both quantitative and qualitative information. The way the combined system is constructed in this paper might be directly used by such developers.
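
The core combination step described above can be sketched in a few lines; the weighted blend below, the weight value, and the toy numbers are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def combine_rating_and_sentiment(rating, sentiment, alpha=0.5):
    """Blend an explicit rating with a review sentiment score.

    `sentiment` is assumed to be already scaled to the rating range;
    the weight `alpha` is illustrative, not the paper's value.
    """
    return alpha * rating + (1.0 - alpha) * sentiment

def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

# Toy example: original ratings, sentiment scores mapped to the 1-5 scale,
# and predictions from a hypothetical collaborative-filtering model.
ratings    = np.array([4.0, 2.0, 5.0, 3.0])
sentiments = np.array([4.5, 1.5, 4.0, 3.5])
combined   = combine_rating_and_sentiment(ratings, sentiments)

predictions = np.array([3.8, 2.4, 4.6, 3.1])
print("MAE :", mae(combined, predictions))
print("RMSE:", rmse(combined, predictions))
```

In practice, the sentiment score would come from the ElasticNet-based dictionary mentioned in the abstract and be rescaled to the rating range before blending.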

An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

  • Lee, Dongwon;Jung, Yeojin;Jung, Jaekwon;Park, Dohyung
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.17-35 / 2015
  • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period of time. Developing precise forecasting models is considered important because corporations can make strategic decisions about new markets based on the future demand estimated by the models. Many studies have developed market growth curve models, such as the Bass, Logistic, and Gompertz models, which estimate future demand when a market is in its early stage. Among them, the Bass model, which explains demand through two types of adopters, innovators and imitators, has been widely used in forecasting. Such models require sufficient demand observations to ensure reliable results. At the beginning of a new market, however, observations are not sufficient for the models to precisely estimate the market's future demand. For this reason, as an alternative, demand inferred from the most adjacent markets is often used as a reference in such cases. Reference markets can be those whose products are developed with the same categorical technologies. A market's demand can be expected to follow a pattern similar to that of a reference market when the adoption pattern of a product is determined mainly by the technology related to the product. However, such processes may not always ensure satisfactory results because the similarity between markets depends on intuition and/or experience. There are two major drawbacks that human experts cannot effectively handle in this approach. One is the abundance of candidate reference markets to consider, and the other is the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. Mostly, markets in the same category of an industrial hierarchy can be reference markets because they are usually based on similar technologies. However, markets can be classified into different categories even if they are based on the same generic technologies; therefore, markets in other categories also need to be considered as potential candidates. Next, even domain experts cannot consistently calculate the similarity between markets with their own qualitative standards. This inconsistency means adjacent reference markets may be missed, which may lead to imprecise estimation of future demand. Even when no reference markets are missed, the new market's parameters can hardly be estimated from them without quantitative standards. For this reason, this study proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, this study proposes the use of the Euclidean distance measure to calculate the similarity between markets. Based on their similarities, markets are grouped into clusters. Then, missing markets with the characteristics of the cluster are searched for. Potential candidate reference markets are extracted and recommended to users. After iterating these steps, definite reference markets are determined according to the user's selection among those candidates. Finally, the new market's parameters are estimated from the reference markets. For this procedure, two techniques are used in the model: one is the clustering technique of data mining, and the other is content-based filtering from recommender systems. The proposed system, implemented with those techniques, can determine the most adjacent markets based on whether a user accepts the candidate markets. Experiments were conducted with five ICT experts to validate the usefulness of the system. The experts were given a list of 16 ICT markets whose parameters were to be estimated. For each market, the experts first estimated the growth-curve parameters by intuition and then with the system. A comparison of the results shows that the parameters estimated with the system were closer to the actual parameters than those estimated by intuition alone.
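
The two techniques named in the abstract, Euclidean distance similarity and clustering, can be sketched as follows; the market feature vectors, the number of clusters, and the ranking function are placeholders, not the paper's actual expert-system design:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

# Hypothetical feature vectors describing markets (e.g., normalized indicators
# of the underlying technology and adoption environment).
market_names = ["market_A", "market_B", "market_C", "market_D", "market_E"]
features = np.array([
    [0.9, 0.1, 0.3],
    [0.8, 0.2, 0.4],
    [0.1, 0.9, 0.7],
    [0.2, 0.8, 0.6],
    [0.85, 0.15, 0.35],
])

# Group markets into clusters based on Euclidean similarity.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

def candidate_reference_markets(new_market_vec, k=3):
    """Rank existing markets by Euclidean distance to the new market."""
    dists = pairwise_distances([new_market_vec], features, metric="euclidean")[0]
    order = np.argsort(dists)[:k]
    return [(market_names[i], float(dists[i]), int(labels[i])) for i in order]

# Candidates are shown to the user, who accepts or rejects them iteratively.
print(candidate_reference_markets([0.88, 0.12, 0.33]))
```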

Contents Recommendation Search System using Personalized Profile on Semantic Web (시맨틱 웹에서 개인화 프로파일을 이용한 콘텐츠 추천 검색 시스템)

  • Song, Chang-Woo;Kim, Jong-Hun;Chung, Kyung-Yong;Ryu, Joong-Kyung;Lee, Jung-Hyun
    • The Journal of the Korea Contents Association / v.8 no.1 / pp.318-327 / 2008
  • With the advance of information technologies and the spread of Internet use, the volume of usable information is increasing explosively. A content recommendation system provides the services of filtering out information that users do not want and recommending useful information. Existing recommendation systems analyze the records and patterns of Web connections and the information demanded by users through data mining techniques, and they provide contents from the service provider's viewpoint. Because it is hard to express information on the users' side, such as preference and lifestyle, only limited services can be provided. Semantic Web technology can define meaningful relations among data so that information can be collected, processed, and applied according to purpose for all objects, including images and documents. The present study proposes a content recommendation search system that can dynamically update and reflect personalized profiles in a Semantic Web environment. A personalized profile is composed of a Collector that contains the characteristics of the profile, an Aggregator that collects profile data from the various Collectors, and a Resolver that interprets the Collectors according to their profile characteristics. The personalization module helps the content recommendation server synchronize regularly with the personalized profile. Choosing music as the recommended content, we conducted an experiment on whether the personalized profile delivers the content to the content recommendation server according to a service scenario and whether the server provides a recommendation list reflecting the user's preference and lifestyle.
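
A minimal sketch of the Collector/Aggregator/Resolver profile structure described above, with illustrative class and field names rather than the paper's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class Collector:
    """Holds one kind of profile data (e.g., preference or lifestyle)."""
    characteristic: str
    data: dict = field(default_factory=dict)

class Aggregator:
    """Collects profile data from several collectors."""
    def __init__(self):
        self.collectors = []

    def add(self, collector: Collector):
        self.collectors.append(collector)

class Resolver:
    """Interprets collectors according to their characteristic."""
    def resolve(self, aggregator: Aggregator, characteristic: str) -> dict:
        merged = {}
        for c in aggregator.collectors:
            if c.characteristic == characteristic:
                merged.update(c.data)
        return merged

# Usage: the recommendation server would periodically synchronize with
# the resolved profile (e.g., the user's music preferences).
profile = Aggregator()
profile.add(Collector("preference", {"genre": "jazz"}))
profile.add(Collector("lifestyle", {"listening_time": "evening"}))
print(Resolver().resolve(profile, "preference"))
```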

A license plate detection method based on contour extraction that adapts to environmental changes (주변 환경 변화에 적응하는 윤곽선 추출 기반의 자동차 번호판 검출 기법)

  • Pyo, Sung-Kook;Lee, Gang-seong;Park, Young-Soo;Lee, Sang-Hun
    • Journal of the Korea Convergence Society / v.9 no.9 / pp.31-39 / 2018
  • In this paper, we propose a license plate detection method based on contour extraction that adapts to environmental changes. The proposed method extracts contours using DoG (Difference of Gaussian) to remove unnecessary noise in the contour extraction process. Binarization was applied to the degraded contour images, and erosion and dilation operations were used to emphasize the contours of the character regions. Then, only contours whose width-to-height ratio matches that of plate characters were extracted, and the region where such contours were longest was estimated to correspond to the license plate characters. For the experiments, 130 images were used, covering license plates on the front of vehicles, plates viewed at oblique angles, and environments with various backgrounds. We also experimented with motorcycle images, which have a different license plate pattern. Experimental results showed a detection rate of 93% for oblique images; in environments with various backgrounds, the rate was 70% for motorcycle images but 98% for front vehicle images.
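
A rough sketch of the described pipeline using OpenCV; the DoG sigmas, threshold choice, morphology kernel, and aspect-ratio bounds are guesses for illustration, not the paper's tuned values:

```python
import cv2
import numpy as np

def find_plate_character_contours(gray):
    """Sketch of the pipeline described above: DoG edge emphasis,
    binarization, erosion/dilation, and aspect-ratio filtering.
    Threshold values and ratio bounds are illustrative guesses."""
    # Difference of Gaussians to suppress noise while keeping contours.
    g1 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), 1.0)
    g2 = cv2.GaussianBlur(gray.astype(np.float32), (0, 0), 2.0)
    dog = cv2.normalize(g1 - g2, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

    # Binarize, then emphasize character strokes with erosion/dilation.
    _, binary = cv2.threshold(dog, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    kernel = np.ones((3, 3), np.uint8)
    binary = cv2.dilate(cv2.erode(binary, kernel), kernel)

    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if h == 0:
            continue
        ratio = w / float(h)
        # Keep only contours whose width/height ratio looks like a plate character.
        if 0.2 <= ratio <= 1.0 and h > 10:
            candidates.append((x, y, w, h))
    return candidates

# Usage (assuming a grayscale test image on disk):
# gray = cv2.imread("car.jpg", cv2.IMREAD_GRAYSCALE)
# print(find_plate_character_contours(gray))
```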

Text Filtering using Iterative Boosting Algorithms (반복적 부스팅 학습을 이용한 문서 여과)

  • Hahn, Sang-Youn;Zang, Byoung-Tak
    • Journal of KIISE:Software and Applications / v.29 no.4 / pp.270-277 / 2002
  • Text filtering is the task of deciding whether a document is relevant to a specified topic. As the Internet and the Web become widespread and the number of documents delivered by e-mail grows explosively, the importance of text filtering increases as well. The aim of this paper is to improve the accuracy of text filtering systems by using machine learning techniques. We apply AdaBoost algorithms to the filtering task. An AdaBoost algorithm generates and combines a series of simple hypotheses. Each hypothesis decides the relevance of a document to a topic on the basis of whether or not the document includes a certain word. We begin with an existing AdaBoost algorithm that uses weak hypotheses with outputs of 1 or -1. Then we extend the algorithm to use weak hypotheses with real-valued outputs, which were proposed recently to improve error reduction rates and final filtering performance. Next, we attempt to achieve further improvement in AdaBoost's performance by first setting weights randomly according to the continuous Poisson distribution, executing AdaBoost, repeating these steps several times, and then combining all the learned hypotheses. This mitigates the overfitting problem that may occur when learning from a small amount of data. Experiments were performed on the real document collections used in TREC-8, a well-established text retrieval contest; this dataset includes Financial Times articles from 1992 to 1994. The experimental results show that AdaBoost with real-valued hypotheses outperforms AdaBoost with binary-valued hypotheses, and that AdaBoost iterated with random weights further improves filtering accuracy. Comparison results for all participants in the TREC-8 filtering task are also provided.
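
The iterated-boosting idea can be sketched with scikit-learn as below; the exponential draw stands in for the continuous Poisson weighting, and the toy documents and run count are placeholders rather than the paper's TREC-8 setup:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Toy relevant / non-relevant documents (placeholders for TREC-style data).
docs   = ["stock market rises", "market crash fears", "football final tonight",
          "new tax on markets", "concert tickets on sale", "bank rates climb"]
labels = np.array([1, 1, 0, 1, 0, 1])  # 1 = relevant to the finance topic

X = CountVectorizer(binary=True).fit_transform(docs).toarray()

rng = np.random.default_rng(0)
runs = []
for _ in range(5):
    # Random initial weights (an exponential draw standing in for the
    # continuous Poisson weighting), renormalized to sum to one.
    w = rng.exponential(1.0, size=len(labels))
    w /= w.sum()
    clf = AdaBoostClassifier(n_estimators=20, random_state=0)
    clf.fit(X, labels, sample_weight=w)
    runs.append(clf)

# Combine all learned ensembles by averaging their class probabilities.
avg_proba = np.mean([clf.predict_proba(X)[:, 1] for clf in runs], axis=0)
print((avg_proba > 0.5).astype(int))
```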

A Study of a Module of Wrist Direction Recognition using EMG Signals (근전도를 이용한 손목방향인식 모듈에 관한 연구)

  • Lee, C.H.;Kang, S.I.;Bae, S.H.;Kwon, J.W.;Lee, D.H.
    • Journal of rehabilitation welfare engineering & assistive technology / v.7 no.1 / pp.51-58 / 2013
  • As society ages, the rehabilitation, welfare, and sports industry markets are expanding fast. In particular, the field of vital-signal interfaces for controlling welfare instruments such as wheelchairs, rehabilitation devices such as artificial arms and legs, and general electronic devices is an emerging technology field. This technology can help not only the handicapped, the elderly, the weak, and rehabilitation patients but also the general public in various application fields. Commercial bio-signal measurement instruments and interface systems are complicated, expensive, and large-scale, so there are many limitations to using them easily in daily life. This paper proposes a portable wireless transmission interface system that uses EMG (electromyogram) signals and a control module to manipulate hardware systems. We designed a hardware module that receives the EMG signals generated by wrist movement, eliminates noise with filters, and amplifies the signals effectively. A TI TMS320F2808 DSP (Digital Signal Processor) chip was used to digitize the measured EMG signals and perform digital filtering. We also used PCA (Principal Component Analysis) and classified the signals into four motions: right, left, up, and down. The data were transmitted by a wireless module and displayed on a PC monitor. As a result, the developed system obtained a recognition success rate above 85% for the four motions. If the recognition rate is increased with further experiments, this system using EMG wrist-direction signals could be used to control various hardware systems.
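
A minimal sketch of the processing chain described above (band-pass filtering, feature extraction, PCA, and four-class wrist-direction classification), using synthetic stand-in data and assumed filter settings rather than the paper's DSP implementation:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

FS = 1000  # assumed sampling rate in Hz

def bandpass(sig, low=20.0, high=450.0, fs=FS, order=4):
    """Band-pass filter to remove motion artifacts and high-frequency noise."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, sig, axis=0)

def features(window):
    """Simple EMG features per channel: mean absolute value and RMS."""
    return np.concatenate([np.mean(np.abs(window), axis=0),
                           np.sqrt(np.mean(window ** 2, axis=0))])

# Synthetic stand-in data: 40 windows, 256 samples, 2 EMG channels,
# labeled with four wrist directions (right, left, up, down).
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2, 3], 10)
windows = rng.normal(0, 1, (40, 256, 2)) * (1 + labels[:, None, None] * 0.5)

X = np.array([features(bandpass(w)) for w in windows])
X = PCA(n_components=2).fit_transform(X)          # dimensionality reduction
clf = KNeighborsClassifier(3).fit(X[::2], labels[::2])
print("accuracy:", clf.score(X[1::2], labels[1::2]))
```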

Relative RPCs Bias-compensation for Satellite Stereo Images Processing (고해상도 입체 위성영상 처리를 위한 무기준점 기반 상호표정)

  • Oh, Jae Hong;Lee, Chang No
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.36 no.4 / pp.287-293 / 2018
  • Generating epipolar resampled images by reducing the y-parallax is a prerequisite for accurate and efficient processing of satellite stereo images. Minimizing the y-parallax requires accurate sensor modeling, which is carried out with ground control points. However, this approach is not feasible over inaccessible areas where control points cannot easily be acquired. In such cases, a relative orientation can be performed using only conjugate points, but its accuracy for satellite sensors needs to be studied because their geometry differs from that of well-known frame-type cameras. Therefore, we carried out bias compensation of the RPCs (Rational Polynomial Coefficients) without any ground control points to study its precision and its effect on the y-parallax in epipolar resampled images. The conjugate points were generated by stereo image matching with outlier removal. RPC compensation was performed with affine and polynomial models. We analyzed the reprojection error of the compensated RPCs and the y-parallax in the resampled images. Experimental results showed one-pixel-level y-parallax for Kompsat-3 stereo data.
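
The affine image-space bias compensation mentioned above can be sketched with a simple least-squares fit; the conjugate-point coordinates below are placeholders, and the affine model is only one of the two model types (affine, polynomial) that the abstract names:

```python
import numpy as np

def fit_affine_bias(projected, observed):
    """Estimate an affine bias model  observed ~ A @ [x, y, 1]  in image space.

    `projected` are RPC-projected image coordinates of conjugate points and
    `observed` are their matched image coordinates; returns the 2x3 affine matrix."""
    x, y = projected[:, 0], projected[:, 1]
    G = np.column_stack([x, y, np.ones_like(x)])
    # Solve the two rows of the affine transform independently.
    ax, *_ = np.linalg.lstsq(G, observed[:, 0], rcond=None)
    ay, *_ = np.linalg.lstsq(G, observed[:, 1], rcond=None)
    return np.vstack([ax, ay])

def apply_affine(A, pts):
    G = np.column_stack([pts, np.ones(len(pts))])
    return G @ A.T

# Placeholder conjugate points (pixels): RPC projections vs. matched positions.
proj = np.array([[100.0, 200.0], [400.0, 250.0], [300.0, 600.0], [700.0, 650.0]])
obs  = proj + np.array([2.0, -1.5])  # a constant bias, as a toy example

A = fit_affine_bias(proj, obs)
residual = obs - apply_affine(A, proj)
print("max reprojection error after compensation:", np.abs(residual).max())
```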

Study on the Emergency Assessment about Seismic Safety of Cable-supported Bridges using the Comparison of Displacement due to Earthquake with Disaster Management Criteria (변위 비교를 통한 케이블지지교량의 긴급 지진 안전성 평가 방법의 고찰)

  • Park, Sung-Woo;Lee, Seung Han
    • Journal of the Korea institute for structural maintenance and inspection / v.22 no.6 / pp.114-122 / 2018
  • This study presents an emergency assessment method for the seismic safety of cable-supported bridges using seismic acceleration sensors installed on their primary structural elements. The structural models of the bridges are updated iteratively so that their dynamic characteristics become similar to those of the real bridges, based on a comparison of their natural frequencies with those estimated from acceleration data measured at ordinary times by the seismic acceleration sensors. The displacement at the location of each seismic acceleration sensor is derived by seismic analysis using the design earthquake, and its peak value is determined in advance as the disaster management criterion. The displacement time history is calculated by double integration of the acceleration time history recorded at each seismic acceleration sensor and filtered by high-cut (low-pass) and low-cut (high-pass) filters. Finally, the seismic safety is evaluated by comparing the peak value of the calculated displacement time history with the predetermined disaster management criterion. The applicability of the proposed methodology is verified by performing the seismic safety assessment of 12 cable-supported bridges using the acceleration data recorded during the Gyeongju earthquake.
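
The displacement computation described above can be sketched as band-pass filtering followed by double integration; the sampling rate, corner frequencies, and disaster-management criterion below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.integrate import cumulative_trapezoid

FS = 100.0  # assumed sampling rate of the seismic acceleration sensor (Hz)

def acceleration_to_displacement(acc, fs=FS, low=0.1, high=20.0):
    """Band-pass the record (high-pass removes drift, low-pass removes noise),
    then double-integrate to displacement. Corner frequencies are assumptions."""
    b, a = butter(4, [low / (fs / 2), high / (fs / 2)], btype="band")
    acc = filtfilt(b, a, acc)
    vel = cumulative_trapezoid(acc, dx=1.0 / fs, initial=0)
    disp = cumulative_trapezoid(vel, dx=1.0 / fs, initial=0)
    return disp

# Toy record: a 1 Hz sinusoidal acceleration of 0.5 m/s^2 amplitude for 20 s.
t = np.arange(0, 20, 1.0 / FS)
acc = 0.5 * np.sin(2 * np.pi * 1.0 * t)

disp = acceleration_to_displacement(acc)
peak = np.abs(disp).max()
criterion = 0.05  # placeholder disaster-management displacement limit (m)
print("peak displacement:", peak, "-> safe" if peak < criterion else "-> alert")
```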

Speech extraction based on AuxIVA with weighted source variance and noise dependence for robust speech recognition (강인 음성 인식을 위한 가중화된 음원 분산 및 잡음 의존성을 활용한 보조함수 독립 벡터 분석 기반 음성 추출)

  • Shin, Ui-Hyeop;Park, Hyung-Min
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.326-334 / 2022
  • In this paper, we propose a speech enhancement algorithm as pre-processing for robust speech recognition in noisy environments. Auxiliary-function-based Independent Vector Analysis (AuxIVA) is performed with a weighted covariance matrix using time-varying variances scaled by target masks that represent the time-frequency contributions of the target speech. The mask estimates can be obtained using a Neural Network (NN) pre-trained for speech extraction, or from diffuseness using the Coherence-to-Diffuse power Ratio (CDR) to find the direct-sound component of the target speech. In addition, the outputs for omni-directional noise are closely chained by sharing the time-varying variances, similarly to independent subspace analysis or IVA. The AuxIVA-based speech extraction method is also formulated in the Independent Low-Rank Matrix Analysis (ILRMA) framework by extending the Non-negative Matrix Factorization (NMF) for the noise outputs to Non-negative Tensor Factorization (NTF), which maintains the inter-channel dependency of the noise output channels. Experimental results on the CHiME-4 dataset demonstrate the effectiveness of the presented algorithms.
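
One building block named above, the weighted covariance matrix formed from mask-derived time-varying variances, can be sketched as follows; the shapes and toy data are placeholders, and this is not the full AuxIVA/ILRMA algorithm:

```python
import numpy as np

def weighted_covariances(X, mask, eps=1e-6):
    """Weighted spatial covariance per frequency bin.

    X:    STFT mixture, shape (freq, time, channels)
    mask: target time-frequency mask in [0, 1], shape (freq, time)
    The time-varying source variance is approximated from the masked
    mixture power and used as the weight 1/variance, as in weighted
    covariance updates of auxiliary-function-based IVA."""
    F, T, M = X.shape
    R = np.zeros((F, M, M), dtype=complex)
    power = np.mean(np.abs(X) ** 2, axis=2)          # (F, T) mixture power
    var = mask * power + eps                          # masked time-varying variance
    for f in range(F):
        Xf = X[f]                                     # (T, M)
        w = 1.0 / var[f]                              # per-frame weights
        R[f] = (Xf.conj().T * w) @ Xf / T
    return R

# Toy data: 4 frequency bins, 50 frames, 3 microphones, random mask.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 50, 3)) + 1j * rng.normal(size=(4, 50, 3))
mask = rng.uniform(0.2, 1.0, size=(4, 50))
print(weighted_covariances(X, mask).shape)  # (4, 3, 3)
```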

Personalized Exhibition Booth Recommendation Methodology Using Sequential Association Rule (순차 연관 규칙을 이용한 개인화된 전시 부스 추천 방법)

  • Moon, Hyun-Sil;Jung, Min-Kyu;Kim, Jae-Kyeong;Kim, Hyea-Kyeong
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.195-211 / 2010
  • An exhibition is defined as a market event of a specific duration at which exhibitors present their main product range to business or private visitors, and it also plays a key role as an effective marketing channel. Especially, as visitors' opinions after the exhibition directly impact sales or the image of companies, exhibition organizers must consider the various needs of visitors. To meet these needs, ubiquitous technologies have been applied in some exhibitions. However, despite the development of ubiquitous technologies, their services cannot always reflect visitors' preferences, because they only generate information when visitors request it. As a result, they have reached their limit in meeting the needs of visitors, which might consequently lead to lost marketing opportunities. Recommendation systems are a suitable way to overcome these limitations. They can recommend booths that coincide with visitors' preferences, helping visitors who face difficult choices in the exhibition environment. One of the most successful and widely used technologies for building recommender systems is Collaborative Filtering. Traditional recommender systems, however, use only neighbors' evaluations or behaviors for a personalized prediction. Therefore, they cannot reflect visitors' dynamic preferences and also lack accuracy in the exhibition environment. Although there is much useful information for inferring visitors' preferences in a ubiquitous environment (e.g., visitors' current location, booth visit path, and so on), they use only limited information for recommendation. In this study, we propose a booth recommendation methodology using Sequential Association Rules, which consider the sequence of visiting. Recent studies of Sequential Association Rules use constraints to improve performance. However, since traditional Sequential Association Rule mining considers all rules for recommendation, it has a scalability problem when applied to a large-scale exhibition. To solve this problem, our methodology composes a confidence database before the recommendation process. To compose the confidence database, we first search for preceding rules whose frequency is above a threshold. Next, we compute the confidence of each preceding rule with respect to each booth that is not contained in the rule. The confidence database therefore holds two kinds of information: the preceding rules and their confidence for each booth. In the recommendation process, we simply generate the preceding rules of the target visitor based on the records of their visits and recommend booths according to the confidence database. Through these steps, we expect a reduction in the time spent on the recommendation process. To evaluate the proposed methodology, we use real booth visit records collected by RFID technology at an IT exhibition. The booth visit records also contain the visit sequence of each visitor. We compare the performance of the proposed methodology with a traditional Collaborative Filtering system. As a result, our proposed methodology generally shows higher performance than traditional Collaborative Filtering. We can also see some of its features in the experimental results. First, it shows the highest performance for one-booth recommendation. It detects preceding rules from a portion of visitors; therefore, if a visitor moved with a very different pattern from the other visitors, it cannot give him or her a correct recommendation even if we increase the number of recommendations. Trained on all visitors, it cannot correctly give recommendations to visitors who have a unique path. Second, the performance of general recommendation systems increases as time passes. However, our methodology shows higher performance with limited information, such as one or two time periods. Therefore, not only can it recommend even when there is little information in the target visitor's booth visit records, but it also uses only a small amount of information in the recommendation process. We expect that it can give real-time recommendations in the exhibition environment. Overall, since our methodology shows higher performance than traditional Collaborative Filtering systems, we expect that it could be applied to booth recommendation systems to satisfy visitors in the exhibition environment.
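
The confidence-database construction and recommendation steps described above can be sketched as follows; the visit sequences, rule length, and frequency threshold are toy values, not the RFID data or settings used in the paper:

```python
from collections import Counter, defaultdict

# Toy booth-visit sequences (stand-ins for RFID visit records).
visits = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["B", "C", "D"],
    ["A", "C", "D"],
]

MIN_FREQ = 2          # frequency threshold for preceding rules (illustrative)
RULE_LEN = 2          # length of the preceding rule (illustrative)

# 1) Find frequent "preceding rules": ordered prefixes of the visit sequences.
prefix_counts = Counter(tuple(seq[:RULE_LEN]) for seq in visits if len(seq) >= RULE_LEN)
preceding_rules = {p for p, c in prefix_counts.items() if c >= MIN_FREQ}

# 2) Confidence database: confidence of each booth not in the rule, per rule.
confidence = defaultdict(dict)
for rule in preceding_rules:
    followers = Counter()
    for seq in visits:
        if tuple(seq[:RULE_LEN]) == rule:
            followers.update(b for b in seq[RULE_LEN:] if b not in rule)
    total = prefix_counts[rule]
    for booth, count in followers.items():
        confidence[rule][booth] = count / total

def recommend(visited, k=1):
    """Recommend booths for a target visitor from the precomputed confidences."""
    rule = tuple(visited[:RULE_LEN])
    ranked = sorted(confidence.get(rule, {}).items(), key=lambda kv: -kv[1])
    return [booth for booth, _ in ranked[:k]]

print(recommend(["A", "B"]))   # e.g. ['C'] or ['D'] with this toy data
```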