• Title/Summary/Keyword: data similarity

Search Result 2,059, Processing Time 0.023 seconds

A Study on Forecasting Accuracy Improvement of Case Based Reasoning Approach Using Fuzzy Relation (퍼지 관계를 활용한 사례기반추론 예측 정확성 향상에 관한 연구)

  • Lee, In-Ho;Shin, Kyung-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.16 no.4
    • /
    • pp.67-84
    • /
    • 2010
  • In terms of business, forecasting is a work of what is expected to happen in the future to make managerial decisions and plans. Therefore, the accurate forecasting is very important for major managerial decision making and is the basis for making various strategies of business. But it is very difficult to make an unbiased and consistent estimate because of uncertainty and complexity in the future business environment. That is why we should use scientific forecasting model to support business decision making, and make an effort to minimize the model's forecasting error which is difference between observation and estimator. Nevertheless, minimizing the error is not an easy task. Case-based reasoning is a problem solving method that utilizes the past similar case to solve the current problem. To build the successful case-based reasoning models, retrieving the case not only the most similar case but also the most relevant case is very important. To retrieve the similar and relevant case from past cases, the measurement of similarities between cases is an important key factor. Especially, if the cases contain symbolic data, it is more difficult to measure the distances. The purpose of this study is to improve the forecasting accuracy of case-based reasoning approach using fuzzy relation and composition. Especially, two methods are adopted to measure the similarity between cases containing symbolic data. One is to deduct the similarity matrix following binary logic(the judgment of sameness between two symbolic data), the other is to deduct the similarity matrix following fuzzy relation and composition. This study is conducted in the following order; data gathering and preprocessing, model building and analysis, validation analysis, conclusion. First, in the progress of data gathering and preprocessing we collect data set including categorical dependent variables. Also, the data set gathered is cross-section data and independent variables of the data set include several qualitative variables expressed symbolic data. The research data consists of many financial ratios and the corresponding bond ratings of Korean companies. The ratings we employ in this study cover all bonds rated by one of the bond rating agencies in Korea. Our total sample includes 1,816 companies whose commercial papers have been rated in the period 1997~2000. Credit grades are defined as outputs and classified into 5 rating categories(A1, A2, A3, B, C) according to credit levels. Second, in the progress of model building and analysis we deduct the similarity matrix following binary logic and fuzzy composition to measure the similarity between cases containing symbolic data. In this process, the used types of fuzzy composition are max-min, max-product, max-average. And then, the analysis is carried out by case-based reasoning approach with the deducted similarity matrix. Third, in the progress of validation analysis we verify the validation of model through McNemar test based on hit ratio. Finally, we draw a conclusion from the study. As a result, the similarity measuring method using fuzzy relation and composition shows good forecasting performance compared to the similarity measuring method using binary logic for similarity measurement between two symbolic data. But the results of the analysis are not statistically significant in forecasting performance among the types of fuzzy composition. The contributions of this study are as follows. We propose another methodology that fuzzy relation and fuzzy composition could be applied for the similarity measurement between two symbolic data. That is the most important factor to build case-based reasoning model.

A Table Integration Technique Using Query Similarity Analysis

  • Choi, Go-Bong;Woo, Yong-Tae
    • Journal of the Korea Society of Computer and Information
    • /
    • v.24 no.3
    • /
    • pp.105-112
    • /
    • 2019
  • In this paper, we propose a technique to analyze similarity between SQL queries and to assist integrating similar tables. First, the table information was extracted from the SQL queries through the query structure analyzer, and the similarity between the tables was measured using the Jacquard index technique. Then, similar table clusters are generated through hierarchical cluster analysis method and the co-occurence probability of the table used in the query is calculated. The possibility of integrating similar tables is classified by using the possibility of co-occurence of similarity table and table, and classifying them into an integrable cluster, a cluster requiring expert review, and a cluster with low integration possibility. This technique analyzes the SQL query in practice and analyse the possibility of table integration independent of the existing business, so that the existing schema can be effectively reconstructed without interruption of work or additional cost.

Collaborative Filtering Algorithm Based on User-Item Attribute Preference

  • Ji, JiaQi;Chung, Yeongjee
    • Journal of information and communication convergence engineering
    • /
    • v.17 no.2
    • /
    • pp.135-141
    • /
    • 2019
  • Collaborative filtering algorithms often encounter data sparsity issues. To overcome this issue, auxiliary information of relevant items is analyzed and an item attribute matrix is derived. In this study, we combine the user-item attribute preference with the traditional similarity calculation method to develop an improved similarity calculation approach and use weights to control the importance of these two elements. A collaborative filtering algorithm based on user-item attribute preference is proposed. The experimental results show that the performance of the recommender system is the most optimal when the weight of traditional similarity is equal to that of user-item attribute preference similarity. Although the rating-matrix is sparse, better recommendation results can be obtained by adding a suitable proportion of user-item attribute preference similarity. Moreover, the mean absolute error of the proposed approach is less than that of two traditional collaborative filtering algorithms.

Comparison Analysis of Co-authorship Network and Citation Based Network for Author Research Similarity Exploration

  • Jeeyoung, Yoon;Min, Song
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.56 no.4
    • /
    • pp.269-284
    • /
    • 2022
  • Exploring research similarity of researchers offers insight on research communities and potential interactions among scholars. While co-authorship is a popular measure for studying research similarity of researchers, it cannot provide insight on authors who have not collaborated yet. In this work, we present novel approach to capture research similarity of authors using citation information. Extensive study is conducted on DATA & KNOWLEDGE ENGINEERING (DKE) publications to demonstrate and compare suggested approach with co-authorship based approach. Analysis result shows that proposed approach distinguishes author relationships that is not shown in co-authorship network.

Acceleration sensor, and embedded system using location-aware

  • He, Wei;Nayel, Mohamed
    • Journal of Convergence Society for SMB
    • /
    • v.3 no.1
    • /
    • pp.23-30
    • /
    • 2013
  • In this paper, fuzzy entropy and similarity measure to measure the uncertainty and similarity of data as real value were introduced. Design of fuzzy entropy and similarity measure were illustrated and proved. Obtained measures were applied to the calculating process and discussed. Extension of data quantification results such as decision making and fuzzy game theory were also discussed.

  • PDF

The assessment of sound quality of loudspeaker system by using factor analysis and muliti-dimensional scaling (인자분석과 다효원척를 이용한 스피이커의 음질평가)

  • 황영수;김영일;차일환
    • The Journal of the Acoustical Society of Korea
    • /
    • v.3 no.1
    • /
    • pp.16-24
    • /
    • 1984
  • The objective data and subjective data correlated in order to rate sound quality of loudspeaker system and these data were analyzed by the Factor Analysis and Multi-Dimensioinal Scaling. The dimensions yielded Factor Analysis were interpreted as "Contrast", "Metallic", "Rich", "Present" and their relation to physical variables were explored by studying the positions of loudspeaker systems in the respective dimension. When the subjective similarity degree of loudspeaker systems was compared with the objective similarity degree of loudspeaker systems by Multi-Dimensional Scaling, the similarity degree of sound pressure response in the listening room closely coincided with the subjective similarity degree regardless of sound source. This result implies the necessity of measurements taken not only in an anechoic room but also in a listening room in order to rate sound quality of loudspeaker systems.

  • PDF

Video Classification System Based on Similarity Representation Among Sequential Data (순차 데이터간의 유사도 표현에 의한 동영상 분류)

  • Lee, Hosuk;Yang, Jihoon
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.7 no.1
    • /
    • pp.1-8
    • /
    • 2018
  • It is not easy to learn simple expressions of moving picture data since it contains noise and a lot of information in addition to time-based information. In this study, we propose a similarity representation method and a deep learning method between sequential data which can express such video data abstractly and simpler. This is to learn and obtain a function that allow them to have maximum information when interpreting the degree of similarity between image data vectors constituting a moving picture. Through the actual data, it is confirmed that the proposed method shows better classification performance than the existing moving image classification methods.

Clustering Validity of Social Network Subgroup Using Attribute Similarity (속성유사도에 따른 사회연결망 서브그룹의 군집유효성)

  • Yoon, Han-Seong
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.1
    • /
    • pp.75-84
    • /
    • 2021
  • For analyzing big data, the social network is increasingly being utilized through relational data, which means the connection characteristics between entities such as people and objects. When the relational data does not exist directly, a social network can be configured by calculating relational data such as attribute similarity from attribute data of entities and using it as links. In this paper, the composition method of the social network using the attribute similarity between entities as a connection relationship, and the clustering method using subgroups for the configured social network are suggested, and the clustering effectiveness of the clustering results is evaluated. The analysis results can vary depending on the type and characteristics of the data to be analyzed, the type of attribute similarity selected, and the criterion value. In addition, the clustering effectiveness may not be consistent depending on the its evaluation method. Therefore, selections and experiments are necessary for better analysis results. Since the analysis results may be different depending on the type and characteristics of the analysis target, options for clustering, etc., there is a limitation. In addition, for performance evaluation of clustering, a study is needed to compare the method of this paper with the conventional method such as k-means.

On the Effect of Significance of Correlation Coefficient for Recommender System

  • Lee, Hee-Choon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.17 no.4
    • /
    • pp.1129-1139
    • /
    • 2006
  • Pearson's correlation coefficient and vector similarity are generally applied to The users' similarity weight of user based recommender system. This study is needed to find that the correlation coefficient of similarity weight is effected by the number of pair response and significance probability. From the classified correlation coefficient by the significance probability test on the correlation coefficient and pair of response, the change of MAE is studied by comparing the predicted precision of the two. The results are experimentally related with the change of MAE from the significant correlation coefficient and the number of pair response.

  • PDF

On the clustering of huge categorical data

  • Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society
    • /
    • v.21 no.6
    • /
    • pp.1353-1359
    • /
    • 2010
  • Basic objective in cluster analysis is to discover natural groupings of items. In general, clustering is conducted based on some similarity (or dissimilarity) matrix or the original input data. Various measures of similarities between objects are developed. In this paper, we consider a clustering of huge categorical real data set which shows the aspects of time-location-activity of Korean people. Some useful similarity measure for the data set, are developed and adopted for the categorical variables. Hierarchical and nonhierarchical clustering method are applied for the considered data set which is huge and consists of many categorical variables.