• Title/Summary/Keyword: 로치오

Search Result 7, Processing Time 0.02 seconds

A Study on the Automatic Descriptor Assignment for Scientific Journal Articles Using Rocchio Algorithm (로치오 알고리즘을 이용한 학술지 논문의 디스크립터 자동부여에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management / v.23 no.3 s.61 / pp.69-89 / 2006
  • Several performance factors that have been applied to automatic indexing with controlled vocabulary and to text categorization based on the Rocchio algorithm were examined, and simple methods for improving their performance were tried. The results of the methods using the Rocchio algorithm were also compared with those of other learning-based methods under the same conditions. As a result, while retaining their strengths of implementation ease and computational efficiency, the methods based on the Rocchio algorithm showed results equivalent to or better than those of other learning-based methods (SVM, VPT, NB). In particular, for semi-automatic indexing (computer-aided indexing), the methods using the Rocchio algorithm at a high recall level could be used preferentially.
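
For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of a Rocchio-style profile classifier, not the paper's implementation. It assumes documents are already term-weight dictionaries, builds one prototype per descriptor as `beta * mean(positive docs) - gamma * mean(negative docs)`, and ranks descriptors by cosine similarity. The function names and the values `beta=16.0`, `gamma=4.0` are assumptions chosen for illustration; the abstract does not state the settings used.

```python
import math
from collections import defaultdict


def rocchio_prototypes(docs, labels, beta=16.0, gamma=4.0):
    """Build one prototype vector per category.

    docs   : list of {term: weight} dictionaries
    labels : list of sets of category names, parallel to docs
    beta, gamma : assumed weights for positive and negative examples
    """
    categories = set().union(*labels)
    prototypes = {}
    for cat in categories:
        pos = [d for d, ls in zip(docs, labels) if cat in ls]
        neg = [d for d, ls in zip(docs, labels) if cat not in ls]
        proto = defaultdict(float)
        for d in pos:  # add the weighted mean of positive examples
            for term, w in d.items():
                proto[term] += beta * w / len(pos)
        for d in neg:  # subtract the weighted mean of negative examples
            for term, w in d.items():
                proto[term] -= gamma * w / len(neg)
        # Keep only terms with positive weight, a common simplification.
        prototypes[cat] = {t: w for t, w in proto.items() if w > 0}
    return prototypes


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_descriptors(doc, prototypes):
    """Return category names sorted by decreasing similarity to doc."""
    return sorted(prototypes, key=lambda c: cosine(doc, prototypes[c]),
                  reverse=True)
```

In a semi-automatic indexing setting such as the one the abstract describes, the top of the ranked list would be shown to an indexer as candidate descriptors; how many candidates to show is a separate choice that trades precision for recall.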

A Study on the Performance Improvement of Rocchio Classifier with Term Weighting Methods (용어 가중치부여 기법을 이용한 로치오 분류기의 성능 향상에 관한 연구)

  • Kim, Pan-Jun
    • Journal of the Korean Society for Information Management / v.25 no.1 / pp.211-233 / 2008
  • This study examines various weighting methods for improving the performance of automatic classification based on the Rocchio algorithm on two collections (LISA, Reuters-21578). First, three weighting factors were identified (document factor, document set factor, and category factor), and the performance of the single weighting schemes based on each factor was investigated. Second, the performance of weighting methods that combine the single schemes was examined. Among the single schemes, category-factor-based schemes showed the best performance, document-set-factor-based schemes the second best, and document-factor-based schemes the worst. Among the combined schemes, those that combine the document set factor with the category factor (idf*cat) performed better than those that combine the document factor with the category factor (tf*cat or ltf*cat), and also better than the common schemes that combine the document factor with the document set factor (tfidf or ltfidf). However, comparing single and combined schemes by collection, the category-factor-based schemes (cat only) performed best on LISA, whereas the combined schemes (idf*cat) performed best on Reuters-21578. Therefore, practical application of these weighting methods to automatic classification requires careful consideration of the categories in the collection.
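
The scheme names in this abstract (tf, ltf, idf, cat, and products such as idf*cat) can be read as factors multiplied together. The sketch below illustrates that composition only. The tf, ltf, and idf formulas are the usual textbook forms; the abstract does not define the category factor, so `cat_factor` here is a purely hypothetical stand-in (the share of a term's documents that belong to the category), and all function and parameter names are assumptions.

```python
import math


def tf(count):
    """Document factor: raw term frequency in the document."""
    return float(count)


def ltf(count):
    """Document factor: log-scaled term frequency."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def idf(n_docs, doc_freq):
    """Document set factor: inverse document frequency."""
    return math.log(n_docs / doc_freq)


def cat_factor(df_in_cat, doc_freq):
    """Category factor. HYPOTHETICAL stand-in, not the paper's definition:
    the fraction of documents containing the term that are in the category."""
    return df_in_cat / doc_freq


def weight(scheme, count, n_docs, doc_freq, df_in_cat):
    """Multiply the factors named in scheme, e.g. "tf*idf" or "idf*cat"."""
    factors = {
        "tf": tf(count),
        "ltf": ltf(count),
        "idf": idf(n_docs, doc_freq),
        "cat": cat_factor(df_in_cat, doc_freq),
    }
    result = 1.0
    for name in scheme.split("*"):
        result *= factors[name]
    return result


# Example call: a term occurring twice in a document, found in 10 of 100
# documents overall, 5 of which belong to the category of interest.
example = weight("idf*cat", count=2, n_docs=100, doc_freq=10, df_in_cat=5)
```

Writing the schemes as products of named factors makes it easy to compare a single scheme such as `"cat"` against combinations such as `"ltf*cat"` on the same term statistics.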

Ranking by Inductive Inference in Collaborative Filtering Systems (협력적 여과 시스템에서 귀납 추리를 이용한 순위 결정)

  • Ko, Su-Jeong
    • Journal of KIISE: Software and Applications / v.37 no.9 / pp.659-668 / 2010
  • Collaborative filtering systems must capture the behavior of a new user and need new information about that user in order to recommend items of interest. To acquire this information, collaborative filtering systems learn users' behavior from previous data and derive new information from the results. In this paper, we propose an inductive inference method that obtains new information about users and ranks items using that information. The proposed method clusters users into groups by learning from them with NMF, an inductive machine learning method, and selects the features of each group using the chi-square statistic. It then classifies a new user into a group using a Bayesian probability model, one of the inductive inference methods, based on the new user's rating values and the group features. Finally, the method decides the ranks of items by applying the Rocchio algorithm to items with missing values.

An Analytical Study on Performance Factors of Automatic Classification based on Machine Learning (기계학습에 기초한 자동분류의 성능 요소에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.33 no.2 / pp.33-59 / 2016
  • This study examined the factors affecting the performance of automatic classification of domestic conference papers based on machine learning techniques. In particular, with respect to the performance of automatically assigning class labels to the papers in the Proceedings of the Conference of Korean Society for Information Management using the Rocchio algorithm, I investigated the characteristics of the key factors (classifier formation methods, training set size, weighting schemes, label assigning methods) through a variety of experiments. The results show that it is more effective to apply suitable parameters ($\beta$, $\lambda$) and training set size (more than 5 years) according to the classification environment and the properties of the document set. When performance is equivalent, the simpler methods (single weighting schemes) are more efficient. Also, because the classification of domestic papers is a multi-label classification that assigns more than one label to an article, it is necessary to develop an optimal classification model based on the characteristics of the key factors with this environment in mind.

Automatic Indexing with Controlled Vocabulary Using a Descriptor Profile (디스크립터 프로파일을 사용한 통제어휘 자동색인)

  • Kim Pan-Jun
    • Proceedings of the Korean Society for Information Management Conference / 2006.08a / pp.153-160 / 2006
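  • This study reviewed the performance and characteristics of the profile method as an automatic indexing method that can efficiently support indexing experts in subject indexing with a controlled vocabulary. After examining the main factors that affect the performance of automatic indexing, the performance of the profile-based method was compared with that of other methods (NB, SVM, VPT) under the same conditions. The experiments showed that it is difficult to generalize the assessment, made in some evaluations, that the method using profiles based on the Rocchio algorithm performs worse than other methods. In addition, for semi-automatic indexing, which supports the work of indexing experts by generating a list of candidate descriptors, the profile-based method could be considered first, since it is on par with SVM and VPT on the F$_1$ measure while achieving relatively high recall.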

An Analytical Study on Automatic Classification of Domestic Journal articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of automatic classification based on machine learning for domestic journal articles in the field of LIS. In particular, with respect to the performance of automatically assigning class labels to the articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, label assigning methods) through a variety of experiments. The results show that it is effective to apply each element appropriately according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be regarded as a multi-label classification that assigns more than one category to a given article. Therefore, I proposed an optimal classification model that uses a simple and fast classification algorithm and a small training set, taking this environment into account.

Relationships Between Cadmium, Copper, Mercury, Zinc Levels and Metallothionein in the Liver and Kidney Cortex of Korean (한국인 간장 및 신장피질에 함유된 카드뮴, 구리, 수은, 아연 함량과 메탈로치오나인과의 관계)

  • Lee Sang Ki; Yoo Young Chan; Yun Yeo Pyo; Yang Ja Youl; Oh Seung Min; Chung Kyu Hyuck
    • Environmental Analysis Health and Toxicology / v.19 no.4 / pp.383-388 / 2004
  • To elucidate the relationships between cadmium, copper, mercury, and zinc levels and metallothionein in the liver and kidney cortex of Koreans, the levels of Cd, Zn, Hg, Cu, and metallothionein (MT) were determined in the kidney cortex and liver of 50 subjects who died between January and November 2001 in Seoul and Gyeonggi Province, Korea. The mean age of the population studied was 36.3 ± 12.3 years. The tissues were digested with a microwave digestion system, and the elements were determined by inductively coupled plasma atomic emission spectrometry. MT was determined by the Cd-hemoglobin affinity assay. The determined levels (mean ± SD) in the renal cortex were 33.9 ± 18.9 µg Cd/g wet weight, 47.5 ± 12.6 µg Zn/g wet weight, 2.5 ± 0.57 µg Cu/g wet weight, 0.26 ± 0.31 µg Hg/g wet weight, and 4.0 ± 3.1 mg MT/g wet weight. In the liver they were 2.5 ± 1.9 µg Cd/g wet weight, 46.9 ± 15.0 µg Zn/g wet weight, 6.2 ± 2.5 µg Cu/g wet weight, 0.10 ± 0.15 µg Hg/g wet weight, and 0.92 ± 0.57 mg MT/g wet weight. Positive relationships were observed in the kidney cortex between Cd and MT and between the sum of the four divalent metals and MT. No correlation was found between Cu and MT, Hg and MT, or Zn and MT in either organ.