• Title/Summary/Keyword: Tag Collection

Curation Service Implementation using Machine Learning Algorithm (기계학습 알고리즘을 이용한 Curation 서비스 구현)

  • Lee, Hyung Ho;Lee, Hak Jae;Kim, Tae Su;Kim, Mi Hyun
    • Smart Media Journal / v.9 no.4 / pp.118-125 / 2020
  • This paper studies how to automatically recommend and provide the information services that users want on the websites of local governments and public institutions, which hold vast amounts of information. We define a data collection method based on the SiiRU CMS, which collects and preprocesses the data, and provide curation services (contents and menus) to users through a machine learning based collaborative filtering algorithm. The study uses about 1 million records collected in 2019. By offering a tag cloud service or recommended menus that users can view conveniently, the analyzed data can surface important information that is otherwise hard to reach. We also provide the environment configuration needed to deploy this service at local governments and public institutions.
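
The abstract does not say which collaborative filtering variant the authors used. As a rough illustration only, the sketch below assumes a user-based variant with cosine similarity over per-user menu visit counts. The names `visits`, `cosine`, and `recommend`, and the sample menu labels, are hypothetical and do not come from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    common = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(visits, user, top_n=3):
    """Rank menus the user has not visited by similarity-weighted visit counts."""
    target = visits[user]
    scores = defaultdict(float)
    for other, counts in visits.items():
        if other == user:
            continue
        sim = cosine(target, counts)
        if sim <= 0:
            continue  # users with no overlap contribute nothing
        for menu, count in counts.items():
            if menu not in target:
                scores[menu] += sim * count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [menu for menu, _ in ranked[:top_n]]


# Hypothetical visit log: user -> {menu: visit count}
visits = {
    "u1": {"tax": 3, "welfare": 1},
    "u2": {"tax": 2, "welfare": 1, "permits": 4},
    "u3": {"culture": 5},
}
print(recommend(visits, "u1"))
```

Only users who share at least one menu with the target influence the result, so `u3` above has no effect on the recommendation for `u1`.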

Development of EST-SSRs and Assessment of Genetic Diversity in Germplasm of the Finger Millet, Eleusine coracana (L.) Gaertn.

  • Wang, Xiaohan;Lee, Myung Chul;Choi, Yu-Mi;Kim, Seong-Hoon;Han, Seahee;Desta, Kebede Taye;Yoon, Hye-myeong;Lee, Yoonjung;Oh, Miae;Yi, Jung Yoon;Shin, Myoung-Jae;Kim, Kyung-Min
    • Korean Journal of Crop Science / v.66 no.4 / pp.443-451 / 2021
  • Finger millet (Eleusine coracana) is widely cultivated in tropical regions worldwide owing to its high nutritional value. Finger millet is more tolerant of biotic and abiotic stresses such as pests, drought, and salt than other millet crops; therefore, it was proposed as a candidate crop for adapting to climate change in Korea. In 2019, we used expressed sequence tag simple sequence repeat (EST-SSR) markers to evaluate the genetic diversity and structure of 102 finger millet accessions from two geographical regions (Africa and South Asia) to identify appropriate accessions and enhance crop diversity in Korea. In total, 40 primers produced 116 alleles, ranging in size from 135 to 457 bp, with a mean polymorphism information content (PIC) of 0.18225. Polymorphism was detected among the 40 primers, and 13 primers were found to have PIC values > 0.3. Principal coordinate and phylogenetic analyses, based on the combined data of both markers, grouped the finger millet accessions according to their respective collection areas. Therefore, the 102 accessions were classified into two groups, one from Asia and the other from Africa. We have conducted an in-depth study on the finger millet landrace pedigree. Sorting out and using the molecular characteristics of each pedigree will be useful for the management and accession identification of this plant resource. The novel SSR markers developed in this study will aid in future genetic analyses of E. coracana.
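
The abstract reports PIC values without saying how they were computed. The sketch below assumes the commonly used definition from Botstein et al., PIC = 1 − Σ pᵢ² − Σ_{i<j} 2 pᵢ² pⱼ², where pᵢ is the frequency of allele i at a marker. The function name `pic` and the example frequencies are illustrative, not data from the study.

```python
from itertools import combinations


def pic(allele_freqs):
    """Polymorphism information content for one marker.

    allele_freqs: allele frequencies at the marker, expected to sum to 1.
    """
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Sum of squared frequencies (expected homozygosity).
    homozygosity = sum(p * p for p in allele_freqs)
    # Correction term over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(allele_freqs, 2))
    return 1.0 - homozygosity - cross


# Made-up example: two equally frequent alleles
print(pic([0.5, 0.5]))
```

Under this definition a monomorphic marker (one allele with frequency 1) has a PIC of 0, and the value rises as alleles become more numerous and more evenly distributed.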

Link Travel Time Estimation and Evaluation of Applicability to Traffic Information Collection Based RFID Probe Data (RFID 기반의 통행시간 추정 기법 개발 및 교통정보수집 적용가능성 평가)

  • Shim, Sang-Woo;Choi, Kee-Choo;Lee, Kyun-Jin
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.2 / pp.15-25 / 2007
  • This paper tests the applicability of an RFID (radio frequency identification) based link travel time estimation algorithm in urban street settings on Jeju Island, Korea. To this end, we developed an algorithm and compared the link travel times derived from the RFID probe based algorithm with those from an already available GPS based link travel time estimation algorithm and with the actual link travel times obtained from a survey. The RFID readers consist of a master reader and a slave reader, and each participating passenger car was equipped with an RFID tag inside the vehicle. The data were sent to a traffic information center, and we used those data in the comparison. The algorithm produced link travel times successfully, with an accuracy of about 88%. For the same link segments, the accuracy of the GPS based link travel times was 93%. A t-test showed that the RFID and GPS based link travel times did not differ significantly in accuracy. The applicability of RFID was thus confirmed, and the proposed algorithm appears usable in similar urban settings. Some limitations and a future research agenda are also presented.
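
The abstract does not give the estimation algorithm or its accuracy measure. As a minimal sketch of the general idea, the code below assumes that each tag is read once at an upstream and once at a downstream reader, that the link travel time is summarized by the median of the matched differences, and that accuracy is one minus the relative error against a surveyed time. All names and sample values are hypothetical.

```python
from statistics import median


def link_travel_times(upstream, downstream):
    """Match tag IDs seen at both readers and return per-vehicle travel times.

    upstream / downstream: dicts mapping tag ID -> read timestamp in seconds.
    """
    times = []
    for tag_id, t_in in upstream.items():
        t_out = downstream.get(tag_id)
        # Keep only tags read at both ends, in the expected order.
        if t_out is not None and t_out > t_in:
            times.append(t_out - t_in)
    return times


def accuracy(estimated, observed):
    """Assumed accuracy measure: 1 minus the relative error."""
    return 1.0 - abs(estimated - observed) / observed


# Made-up reads: tag C never reaches the downstream reader, tag D is unmatched
upstream = {"A": 0, "B": 10, "C": 20}
downstream = {"A": 100, "B": 130, "D": 50}

samples = link_travel_times(upstream, downstream)
estimate = median(samples)
print(samples, estimate, accuracy(estimate, 100))
```

The median is used here only because it is less sensitive than the mean to vehicles that stop along the link; the paper may aggregate differently.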

Detecting Errors in POS-Tagged Corpus on XGBoost and Cross Validation (XGBoost와 교차검증을 이용한 품사부착말뭉치에서의 오류 탐지)

  • Choi, Min-Seok;Kim, Chang-Hyun;Park, Ho-Min;Cheon, Min-Ah;Yoon, Ho;Namgoong, Young;Kim, Jae-Kyun;Kim, Jae-Hoon
    • KIPS Transactions on Software and Data Engineering / v.9 no.7 / pp.221-228 / 2020
  • A part-of-speech (POS) tagged corpus is a collection of electronic text in which each word is annotated with its POS tag, and it is widely used as training data for natural language processing. Training data are generally assumed to be error-free, but in reality they contain various types of errors, which degrade the performance of systems trained on them. To alleviate this problem, we propose a novel method for detecting errors in an existing POS-tagged corpus using an XGBoost classifier and cross-validation. We first train a POS tagging classifier on the POS-tagged corpus, which contains some errors, and then apply it to the same corpus through cross-validation. However, the classifier cannot detect errors directly, because there is no training data labeled for POS tagging errors. We therefore detect errors by comparing the outputs of the classifier (the POS probabilities) while adjusting hyperparameters. The hyperparameters are estimated on a small error-tagged corpus, whose text is sampled from the POS-tagged corpus and in which experts have marked the POS errors. We use recall and precision, which are widely used in information retrieval, as evaluation metrics. Because not all detected errors can be checked, we show that the proposed method is valid by comparing the distributions of the sample (the error-tagged corpus) and the population (the POS-tagged corpus). In the near future, we will apply the proposed method to a dependency tree-tagged corpus and a semantic role tagged corpus.
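
The cross-validation idea in the abstract can be sketched as follows. This is not the authors' method: a per-word tag frequency table stands in for the XGBoost classifier so that the example needs no external library, and the decision rule (flag a token when the held-out model gives its annotated tag a probability below `threshold`) is an assumption. The names `train`, `prob`, `detect_errors`, and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Stand-in classifier (NOT XGBoost): per-word tag counts."""
    counts = defaultdict(Counter)
    for word, tag in pairs:
        counts[word][tag] += 1
    return counts


def prob(model, word, tag):
    """Estimated probability of `tag` for `word`; None if the word is unseen."""
    tag_counts = model.get(word)
    if not tag_counts:
        return None
    return tag_counts[tag] / sum(tag_counts.values())


def detect_errors(corpus, k=5, threshold=0.2):
    """Flag tokens whose annotated tag looks unlikely to a model
    trained on the other k-1 folds."""
    suspects = []
    for fold in range(k):
        model = train([p for i, p in enumerate(corpus) if i % k != fold])
        for i, (word, tag) in enumerate(corpus):
            if i % k != fold:
                continue  # only score the held-out tokens
            p = prob(model, word, tag)
            if p is not None and p < threshold:
                suspects.append(i)
    return sorted(suspects)


# Toy corpus with one planted annotation error at index 10
corpus = [("the", "DET")] * 10 + [("the", "NOUN")] + [("dog", "NOUN")] * 10
print(detect_errors(corpus))
```

In the setting the abstract describes, `threshold` plays the role of a hyperparameter that would be tuned on the small expert-annotated error corpus.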

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung;Rho, Sang-Kyu
    • Asia Pacific Journal of Information Systems / v.21 no.2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for their own future use or for sharing. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to more high-scoring pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are somewhat heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS based on the links to a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is expressed in the active or the passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS.
The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the way humans evaluate, where different items are assigned specific weights, which are then summed up to determine the weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subject and object. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology in which expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Because the popularity of a topic in social media is temporary, recent data should have more weight than old data.
We propose a comprehensive folksonomy ranking framework that deals with all these considerations and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems to be preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, applying the time concept to the expertise weights, as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to the time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground in the situation where the domain consists of more than two classes, or where other important properties, such as "sent through Twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in the calculation time and memory use between the two kinds of algorithms.
While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper will be applicable to various domains, including social media, where time value is considered important.
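
For readers unfamiliar with the hub and authority scores mentioned in the abstract, the sketch below shows textbook HITS on a small directed graph. It is neither the paper's proposed mutual-interaction algorithm nor the specific HITS-based baseline it compares against. The function names and the example edges are illustrative.

```python
import math


def normalize(scores):
    """Scale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0:
        return scores
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Textbook HITS: edges is a list of (source, target) links."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the nodes linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the authority scores of the nodes linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return hub, auth


# Made-up graph: a and b both link to c, and a also links to d
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Each iteration performs two passes over the links, one per score type, and both passes depend on link direction, which is the voting notion the abstract contrasts with direction-independent mutual interactions.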