• Title/Summary/Keyword: Similarity Weight

Determining Absolute Interpolation Weights for Neighborhood-Based Collaborative Filtering

  • Kim, Hyoung-Do
    • Management Science and Financial Engineering / v.16 no.2 / pp.53-65 / 2010
  • Despite the overall success of neighbor-based collaborative filtering (CF) methods, fundamental questions remain about neighbor selection and the prediction mechanism, including arbitrary similarity measures, over-fitted interpolation weights, and the lack of trust consideration between neighbors. This paper proposes a simple method to compute absolute interpolation weights based on similarity values. To supplement the method, two further schemes are devised: high-quality neighbor selection and trust metrics based on co-ratings. The former requires that at least one neighbor's similarity exceed a pre-specified level that is higher than the minimum level. The latter gives higher trust to neighbors that have more co-ratings. Experimental results show that the proposed method outperforms pure IBCF by about 8%. Furthermore, it can easily be combined with other predictors to achieve better prediction quality.
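
The ideas in this abstract (similarity-derived weights, a minimum similarity level, and trust that grows with the number of co-ratings) can be illustrated with a generic item-based predictor. This is only a sketch under assumptions: cosine similarity, the `min_sim` threshold, the `shrink` constant, and the normalized weighted average are illustrative choices, not the paper's absolute interpolation weights.

```python
from math import sqrt

def cosine_on_coratings(a, b):
    """Cosine similarity of two items over the users who rated both.

    a and b map user id -> rating. Returns (similarity, co-rating count).
    """
    common = set(a) & set(b)
    if not common:
        return 0.0, 0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    if norm_a == 0 or norm_b == 0:
        return 0.0, len(common)
    return dot / (norm_a * norm_b), len(common)

def predict(user, target, item_ratings, min_sim=0.1, shrink=5):
    """Predict the user's rating of target from items the user already rated.

    min_sim and shrink are hypothetical tuning parameters.
    """
    num = den = 0.0
    for item, ratings in item_ratings.items():
        if item == target or user not in ratings:
            continue
        sim, n = cosine_on_coratings(item_ratings[target], ratings)
        if sim < min_sim:
            continue  # neighbor is not similar enough
        trust = n / (n + shrink)  # more co-ratings -> trust closer to 1
        weight = sim * trust
        num += weight * ratings[user]
        den += weight
    return num / den if den else None  # None when no usable neighbor exists
```

Calling `predict("u3", "A", item_ratings)` returns a similarity- and trust-weighted average of the ratings that `u3` gave to the neighbors of item `A`, or `None` when no neighbor passes the threshold.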

Development of Multi-Attribute Decision Making System for Conceptual Design of Light-Weight Rolling Stock (철도차량 경량화 개념설계를 위한 다속성 의사결정 시스템 설계)

  • Kim, Hee-Wook;Kim, Jong-Woon;Shin, Sung-Ryoung;Jeong, Hyeon-Seung
    • Proceedings of the KSR Conference / 2011.10a / pp.2973-2978 / 2011
  • In this paper, a system is developed to support multi-attribute decision making for the light-weight design of rolling stock. Conceptual design of light-weight rolling stock does not only mean reducing weight; attributes such as safety, environment, and technology must also be considered. Technical attributes and the needs of customers, manufacturers, management companies, and passengers should therefore be reflected, and qualitative evaluation methods are required. AHP (Analytical Hierarchy Process) and QFD (Quality Function Deployment) are used to determine the weights of the technical attributes and customer needs. Finally, alternatives for light-weight rolling stock, each composed of equipment alternatives, are evaluated by TOPSIS (Technique for Order Preference by Similarity to Ideal Solution). This series of processes is implemented as software, which can suggest a near-optimal alternative for light-weight rolling stock.
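
The TOPSIS step named in this abstract can be sketched in its standard textbook form. The function name, the vector normalization, and the example attributes are assumptions for illustration; the abstract does not describe the system's actual implementation.

```python
from math import sqrt

def topsis(matrix, weights, benefit):
    """Score alternatives by relative closeness to the ideal solution.

    matrix[i][j] is the value of alternative i on attribute j,
    weights[j] is the attribute weight (for example, obtained from AHP),
    benefit[j] is True when larger values are better.
    Assumes every attribute column has at least one non-zero value.
    """
    n_attr = len(weights)
    # Vector-normalize each column, then apply the attribute weight.
    norms = [sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_attr)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_attr)] for row in matrix]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    worst = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    scores = []
    for row in v:
        d_best = sqrt(sum((x - i) ** 2 for x, i in zip(row, ideal)))
        d_worst = sqrt(sum((x - w) ** 2 for x, w in zip(row, worst)))
        total = d_best + d_worst
        # All alternatives identical -> no preference, report 0.5.
        scores.append(d_worst / total if total else 0.5)
    return scores
```

For example, with attributes (mass, safety), `topsis([[1, 9], [5, 1]], [0.5, 0.5], [False, True])` gives the first alternative the highest closeness because it is both lighter and safer.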

Algorithms for Indexing and Integrating MPEG-7 Visual Descriptors (MPEG-7 시각 정보 기술자의 인덱싱 및 결합 알고리즘)

  • Song, Chi-Ill;Nang, Jong-Ho
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.1-10 / 2007
  • This paper proposes a new indexing mechanism for MPEG-7 visual descriptors, especially the Dominant Color and Contour Shape descriptors, that guarantees an efficient similarity search for multimedia databases whose visual metadata are represented with MPEG-7. Since the similarity metric used in the Dominant Color descriptor is based on a Gaussian mixture model, the descriptor itself can be transformed into a color histogram in which the distribution of the color values follows a Gaussian distribution. The transformed Dominant Color descriptor (i.e., the color histogram) is then indexed by the proposed indexing mechanism. For indexing the Contour Shape descriptor, a two-pass algorithm is used. In the first pass, since the similarity of two shapes can be roughly measured with the global parameters of the Contour Shape descriptor, such as eccentricity and circularity, dissimilar image objects are excluded using these global parameters. In the second pass, the similarities between the query and the remaining image objects are measured with the peak parameters of the Contour Shape descriptor. This two-pass approach reduces the computational resources needed to measure the similarity of image objects using the Contour Shape descriptor. This paper also proposes two schemes for integrating visual descriptors for efficient retrieval from a multimedia database. One uses the weight of a descriptor as a yardstick to determine the number of similar image objects selected with respect to that descriptor, and the other uses the weight as the degree of importance of the descriptor in the global similarity measurement. Experimental results show that the proposed indexing and integration schemes produce a remarkable speed-up compared to the exact similarity search, although some accuracy is lost because of the approximated computation in indexing. The proposed schemes could be used to build a multimedia database represented in MPEG-7 that guarantees efficient retrieval.
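
The two integration schemes described in this abstract can be illustrated roughly as follows. This is a sketch under assumptions: the function names, the dictionary layout, and the rule "quota = weight times k, at least 1" are hypothetical, and per-descriptor distances are assumed to be precomputed.

```python
def quota_candidates(distances, weights, k):
    """First scheme: the weight sets how many nearest objects each descriptor selects.

    distances maps descriptor name -> {object id: distance to the query}.
    """
    picked = set()
    for name, d in distances.items():
        quota = max(1, round(weights[name] * k))
        picked.update(sorted(d, key=d.get)[:quota])
    return picked

def weighted_fusion(distances, weights):
    """Second scheme: the weight is the descriptor's importance in a global distance."""
    ids = set.intersection(*(set(d) for d in distances.values()))
    return {
        i: sum(weights[name] * distances[name][i] for name in distances)
        for i in ids
    }
```

With `weights = {"color": 0.75, "shape": 0.25}`, the first function lets the color descriptor nominate most of the candidates, while the second lets color dominate the combined distance used for ranking.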

A Korean Text Summarization System Using Aggregate Similarity (도합유사도를 이용한 한국어 문서요약 시스템)

  • 김재훈;김준홍
    • Korean Journal of Cognitive Science / v.12 no.1_2 / pp.35-42 / 2001
  • In this paper, a document is represented as a weighted graph called a text relationship map. In the graph, a node represents a vector of nouns in a sentence, edges connect every node to all other nodes, and the weight on an edge is the similarity between its two nodes. The similarity is based on the word overlap between the corresponding nodes. The importance of a node, called its aggregate similarity in this paper, is defined as the sum of the weights on the links connecting it to other nodes on the map. We present a Korean text summarization system using the aggregate similarity. To evaluate our system, we used two test collections: one (PAPER-InCon) consists of 100 papers in the field of computer science; the other (NEWS) is composed of 105 newspaper articles and was built by KOROlC. At a compression rate of 20%, we achieved recall of 46.6% (PAPER-InCon) and 30.5% (NEWS) and precision of 76.9% (PAPER-InCon) and 42.3% (NEWS).
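
The aggregate-similarity ranking described in this abstract can be sketched as below. Noun extraction is assumed to have been done beforehand, and the Dice-style overlap is an assumed stand-in, since the abstract does not give the exact word-overlap formula.

```python
def summarize(sentences, ratio=0.2):
    """Return indices of the sentences with the highest aggregate similarity.

    sentences is a list of noun lists, one list per sentence.
    ratio is the compression rate (fraction of sentences to keep).
    """
    bags = [set(s) for s in sentences]
    n = len(bags)

    def overlap(a, b):
        # Dice-style word overlap between two noun sets (an assumption).
        if not a or not b:
            return 0.0
        return 2 * len(a & b) / (len(a) + len(b))

    # Aggregate similarity: sum of edge weights from a node to all other nodes.
    aggregate = [
        sum(overlap(bags[i], bags[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    k = max(1, round(n * ratio))
    top = sorted(range(n), key=lambda i: aggregate[i], reverse=True)[:k]
    return sorted(top)  # keep the original sentence order
```

The function returns sentence indices rather than text so that the caller can map them back to the original sentences.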

A Study on the CBR Pattern using Similarity and the Euclidean Calculation Pattern (유사도와 유클리디안 계산패턴을 이용한 CBR 패턴연구)

  • Yun, Jong-Chan;Kim, Hak-Chul;Kim, Jong-Jin;Youn, Sung-Dae
    • Journal of the Korea Institute of Information and Communication Engineering / v.14 no.4 / pp.875-885 / 2010
  • CBR (Case-Based Reasoning) is a technique for inferring relationships between existing data and case data, and the methods most frequently used for it are similarity calculation and Euclidean distance. However, because those methods compare all existing and case data, they have the drawback that data search and filtering take a long time. Various studies have been conducted to solve this problem. This paper proposes SE (Speed Euclidean-distance) calculation, which exploits the patterns discovered in the existing process of computing similarity and Euclidean distance. Because SE calculation applies the patterns and weights found while new cases are input, it enables fast data extraction and short operation time, improves computing speed under temporal or spatial restrictions, and eliminates unnecessary computation. The experiment shows that the proposed method improves performance and processing rate across various computing environments more than the existing method that extracts data using similarity or Euclidean distance.

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

  • Zhou, Han;Guo, Xuchao;Liu, Chengqi;Tang, Zhan;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.3991-4010 / 2021
  • The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user's tendency to ask questions regarding input problems. The measurement is the basis of the Agricultural Knowledge Question and Answering (Q&A) system, information retrieval, and other tasks. However, the corpus and measurement methods available in this field have some deficiencies. In addition, error propagation may occur when general sentence-embedding methods ignore word boundary features and local context information. These factors make the task challenging. To solve these problems and tackle the Question Similarity Measurement task, a corpus on Chinese crop diseases and insect pests (CCDIP), which contains 13 categories, was established. Then, taking the CCDIP as the research object, this study proposes a Chinese agricultural text similarity matching model, namely, the AgrCQS. This model is based on mixed information extraction. Specifically, the hybrid embedding layer enriches character information and improves the model's ability to recognize word boundaries. Multi-scale local information is extracted by a multi-core convolutional neural network based on multi-weight (MM-CNN). The self-attention mechanism enhances the model's ability to fuse global information. In this research, the performance of the AgrCQS is verified on the CCDIP and on three benchmark datasets, namely, AFQMC, LCQMC, and BQ. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than those of baseline systems without using any external knowledge. Additionally, the proposed modules can be extracted separately and applied to other models, thus providing a reference for related research.

Design and Implementation of a Similarity based Plant Disease Image Retrieval using Combined Descriptors and Inverse Proportion of Image Volumes (Descriptor 조합 및 동일 병명 이미지 수량 역비율 가중치를 적용한 유사도 기반 작물 질병 검색 기술 설계 및 구현)

  • Lim, Hye Jin;Jeong, Da Woon;Yoo, Seong Joon;Gu, Yeong Hyeon;Park, Jong Han
    • The Journal of Korean Institute of Next Generation Computing / v.14 no.6 / pp.30-43 / 2018
  • Many studies have addressed image retrieval using colors, shapes, and textures, which are characteristic features of images, and research on crop disease images is also progressing. In this paper, to help identify diseases occurring in crops grown in the field, we propose a similarity-based crop disease search system using disease images of horticultural crops. The proposed system improves similarity retrieval performance over existing systems by using a combination of descriptors rather than a single descriptor, and applies a weight-based calculation method to provide users with highly readable similarity search results. A total of 13 descriptors were used in combination. We retrieved diseases of six crops using combined descriptors, and for each crop the combination with the highest average accuracy was selected as that crop's combined descriptor. The retrieved results were expressed as percentages using two calculation methods, one based on the ratio of disease names and one based on weights. The method based on the ratio of disease names has the problem that the number of images used for the query image and the similarity search takes priority in the output. To solve this problem, we used a calculation method based on weights. We applied test images of each disease name to both calculation methods to measure the classification performance of the retrieval results, and compared the average retrieval performance of the two methods for each crop. For red pepper and apple, the method based on the ratio of disease names performed about 11.89% better on average than the method based on weights. For chrysanthemum, strawberry, pear, and grape, the method based on weights performed about 20.34% better on average than the method based on the ratio of disease names. In addition, the UI/UX of the proposed system was configured for convenience based on feedback from actual users. Each system screen has a title and a description at the top and is laid out so that users can conveniently view disease information. Disease information retrieved with the calculation methods above is displayed with the images and names of similar diseases. The system is implemented for use in web browsers on both PC and mobile devices.
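
The contrast between the two percentage calculations can be illustrated with a small sketch. It assumes, following the paper's title, that the weight of a disease name is inversely proportional to the number of images of that disease in the database; the function names and data layout are hypothetical, and the paper's exact formula is not given in the abstract.

```python
from collections import Counter

def ratio_scores(retrieved_labels):
    """Share of each disease name among the retrieved images."""
    counts = Counter(retrieved_labels)
    total = len(retrieved_labels)
    return {label: c / total for label, c in counts.items()}

def inverse_volume_scores(retrieved_labels, db_counts):
    """Shares after weighting each hit by 1 / (images of that disease in the database).

    db_counts maps disease name -> number of database images (assumed known).
    """
    counts = Counter(retrieved_labels)
    raw = {label: c / db_counts[label] for label, c in counts.items()}
    norm = sum(raw.values())
    return {label: r / norm for label, r in raw.items()}
```

With six "rust" hits and four "mildew" hits, but 600 rust images against 100 mildew images in the database, the plain ratio favors rust (60%), whereas the inverse-volume weighting favors mildew (80%), because a mildew hit is far less likely to occur by chance.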

Improvement of Pattern Recognition Capacity of the Fuzzy ART with the Variable Learning (가변 학습을 적용한 퍼지 ART 신경망의 패턴 인식 능력 향상)

  • Lee, Chang Joo;Son, Byounghee;Hong, Hee Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.38B no.12 / pp.954-961 / 2013
  • In this paper, we propose a new learning method that uses a variable learning rate to improve pattern recognition in the FCSR (Fast Commit Slow Recode) learning method of Fuzzy ART. Traditional learning methods use a fixed learning rate when updating the weight vector (representative pattern): the weight vector is updated at a fixed rate regardless of the degree of similarity between the input pattern and the representative pattern of the category. As a result, the updated weight vector is strongly influenced by input patterns that lie on the boundary of the category. In noisy environments, this creates unnecessary categories and reduces pattern recognition capacity. In the proposed method, the lower the similarity between the representative pattern and the input pattern, the less the input pattern contributes to updating the weight vector. This suppresses unnecessary category proliferation and improves the pattern recognition capacity of Fuzzy ART in noisy environments.
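
The difference between a fixed and a similarity-dependent learning rate can be sketched with the standard Fuzzy ART update w' = β(I ∧ w) + (1 − β)w. Scaling β by the match value |I ∧ w| / |I| is an assumed way to make the rate variable; the abstract does not state the paper's exact rule.

```python
def fuzzy_and(a, b):
    """Component-wise minimum, the fuzzy AND used by Fuzzy ART."""
    return [min(x, y) for x, y in zip(a, b)]

def match(pattern, weight):
    """Fuzzy ART match value |I ^ w| / |I|, in [0, 1]."""
    total = sum(pattern)
    return sum(fuzzy_and(pattern, weight)) / total if total else 0.0

def update_weight(pattern, weight, beta=0.5, variable=True):
    """Slow-recode update of a committed category's weight vector.

    With variable=True the learning rate shrinks with the similarity,
    so boundary (low-similarity) inputs move the weight vector less.
    """
    rate = beta * match(pattern, weight) if variable else beta
    overlap = fuzzy_and(pattern, weight)
    return [rate * o + (1 - rate) * w for o, w in zip(overlap, weight)]
```

For an input that matches the category poorly, the variable-rate update leaves the weight vector closer to its previous value than the fixed-rate update does.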

Performance Improvement of Image Retrieval System by Presenting Query based on Human Perception (인간의 인지도에 근거한 질의를 통한 영상 검색의 성능 향상)

  • 유헌우;장동식;오근태
    • Journal of KIISE:Computing Practices and Letters / v.9 no.2 / pp.158-165 / 2003
  • Image similarity is often decided by computing the distance between two feature vectors. Unfortunately, the feature vector cannot always reflect the notion of similarity in human perception. Therefore, most current image retrieval systems use weights that measure the importance of each feature. In this paper, new initial weight selection and update rules are proposed for image retrieval. To this end, database images are first divided into groups based on human perception, inner and outer queries are performed, and optimal feature weights for each database image are then computed by searching the groups to which the retrieved result images belong. Experimental results on 2000 images demonstrate the performance of the proposed algorithm.

An Empirical Study on Improvement model for Measuring of Project Similarity (과제 유사도 측정 개선모형에 관한 실증적 연구)

  • Jung, Ok-Nam;Rhew, Sung-Yul;Kim, Jong-Bae
    • Journal of Digital Contents Society / v.12 no.4 / pp.457-465 / 2011
  • The annual R&D investment in Korea increased by an average of 12.2 percent during the last 5 years. Preventing duplicate projects has therefore become an important factor in promoting the efficiency of R&D investment and the originality of R&D projects. In measuring the similarity of projects, the measurement model used to estimate the accuracy of the similarity is crucial. In this paper, we propose an advanced measurement model for checking the similarity of R&D projects in order to promote the efficiency of R&D investment. The proposed model is made up of the following steps: model measurement, sampling, and analysis. During the sampling step, we append the abstracts of R&D reports to the search engine based on document vectors. During the analysis step, we measure the similarity of projects using a research title network, which consists of compound keywords and item weights. The proposed method improved the accuracy of measuring project similarity by an average of 0.19 over the existing search engine and by 9.25 over simple keyword search on R&D projects. When searching with the appended conditions and high sampling, it improved the accuracy of measuring the similarity of R&D projects.