• Title/Summary/Keyword: Wikipedia category (위키피디아 카테고리)

Search results: 8

ISA Relation Extraction from Wikipedia Category Structure (위키피디아 카테고리 구조를 이용한 상하위 관계 추출)

  • Choi, DongHyun; Choi, Key-Sun
    • Annual Conference on Human and Language Technology / 2009.10a / pp.1-5 / 2009
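  • Automatic extraction of ISA (hypernym-hyponym) relations is central to automatically building taxonomies, and taxonomies built this way are used in many fields, such as information extraction. This paper proposes a method for extracting ISA relations from the Wikipedia category structure. The method considers both the category structure under examination and other related Wikipedia category structures. From these, it builds a modification graph among the tokens that appear in category names. A graph analysis algorithm then scores how likely each category structure is to represent an ISA relation. In experiments, the algorithm correctly determined whether an ISA relation held for some category structures that earlier methods could not classify.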

Thesaurus Updating Using Collective Intelligence: Based on Wikipedia Encyclopedia (집단지성을 활용한 시소러스 갱신에 관한 연구: 위키피디아를 중심으로)

  • Han, Seung-Hee
    • Journal of the Korean Society for Information Management / v.26 no.3 / pp.25-43 / 2009
  • The purpose of this study is to show how the classic thesaurus structure of terms and links can be mined from, and updated with, the Wikipedia encyclopedia, a prime example of collective intelligence. A comparison with the ASIS&T thesaurus showed that Wikipedia has substantial coverage of domain-specific concepts and semantic relations. The results also showed that structural features of Wikipedia, such as redirects, categories, and mutual links, are suitable for extracting the semantic relationships of a thesaurus. To generalize these results, the approach should be applied to updating various thesauri, including multilingual thesauri.
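One way to picture the mining step is to map each Wikipedia structure onto a thesaurus relation type. The sketch below assumes a mapping that the abstract does not spell out:

- redirects become equivalence (USE) relations;
- category membership becomes a broader-term (BT) candidate;
- two articles that link to each other become a related-term (RT) pair.

The input dictionaries are hypothetical stand-ins for data extracted from a Wikipedia dump.

```python
from typing import Dict, Iterable, Set, Tuple

# (term, relation type, related term); relation labels follow common thesaurus usage.
Relation = Tuple[str, str, str]

def build_thesaurus_relations(
    redirects: Dict[str, str],
    article_categories: Dict[str, Iterable[str]],
    links: Dict[str, Set[str]],
) -> Set[Relation]:
    """Derive candidate thesaurus relations from Wikipedia structures (assumed mapping)."""
    relations: Set[Relation] = set()
    # Redirect title -> target article: treat as an equivalence (USE) relation.
    for alias, target in redirects.items():
        relations.add((alias, "USE", target))
    # Article -> category it belongs to: treat as a broader-term (BT) candidate.
    for article, categories in article_categories.items():
        for category in categories:
            relations.add((article, "BT", category))
    # Two articles that link to each other: treat as a related-term (RT) pair.
    for source, targets in links.items():
        for target in targets:
            # Record each reciprocal pair once, ordered alphabetically.
            if source in links.get(target, set()) and source < target:
                relations.add((source, "RT", target))
    return relations
```

Category membership in Wikipedia is not always a true hierarchical relation, so a real pipeline would likely need to filter the BT candidates.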

An Ontology-based Analysis of Wikipedia Usage Data for Measuring degree-of-interest in Country (국가별 관심도 측정을 위한 온톨로지 기반 위키피디아 사용 데이터 분석)

  • Kim, Hyon Hee; Jo, Jinnam; Kim, Donggeon
    • Journal of the Korea Society of Computer and Information / v.19 no.4 / pp.43-53 / 2014
  • In this paper, we propose an ontology-based approach to measuring the degree of interest in a country by analyzing Wikipedia usage data. First, we develop a degree-of-interest ontology, called the DOI ontology, by extracting concept hierarchies from Wikipedia categories. Second, we map the titles of frequently edited articles onto the DOI ontology. We then measure the degree of interest on that basis by analyzing Wikipedia page views. Finally, we perform a chi-square test of independence to determine whether fields of interest are independent of country. The results show that fields of interest are closely related to country. The approach also reports the degree of interest by country in a more timely and flexible way than conventional questionnaire surveys.
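The chi-square test of independence used in the final step can be sketched with the standard library alone. This is a minimal illustration, not the authors' implementation. The contingency table is hypothetical: rows stand for countries, columns for fields of interest, and cells for page-view counts.

```python
from typing import List, Tuple

def chi_square_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Return (chi-square statistic, degrees of freedom) for an r x c table.

    Assumes every row and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if field of interest were independent of country.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

Consider the toy 2 x 2 table `[[30, 10], [10, 30]]`. Every expected count is 20, so the statistic is 20.0 with 1 degree of freedom. That exceeds 3.841, the critical value for df = 1 at the 0.05 level, so independence would be rejected for this table.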

An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan; Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms that express the same concept are usually treated as different features. This study uses Wikipedia to improve text categorization performance in two ways:
    • merging synonyms into a single feature;
    • replacing input terms absent from the training document set with the most similar term that occurs in the training documents.
    To select classification features, experiments were run in settings that varied three conditions: whether category information of non-training terms is used, which part of Wikipedia is used to measure term-term similarity, and which similarity measure is used. The kNN classifier's $F_1$ improved by 0.35 to 1.85% in every experimental setting when each non-training term was replaced by the training term with the highest similarity above a threshold. The improvement is smaller than expected. Even so, several semantic and structural devices of Wikipedia could be used to select more effective classification features.
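The term-replacement step can be sketched as follows; synonym merging is not shown. The abstract compares several similarity measures and several parts of Wikipedia. This sketch assumes cosine similarity over sparse vectors whose dimensions are hypothetical Wikipedia category names, and the threshold value is likewise an assumption.

```python
import math
from typing import Dict, List, Optional, Set

# Sparse vector: dimension name (here, a hypothetical Wikipedia category) -> weight.
Vector = Dict[str, float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(weight * v.get(dim, 0.0) for dim, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def replace_unseen_terms(
    doc_terms: List[str],
    training_vocab: Set[str],
    term_vectors: Dict[str, Vector],
    threshold: float,
) -> List[str]:
    """Replace each term absent from the training vocabulary with the most similar
    training term whose similarity exceeds `threshold`; otherwise keep it as is."""
    result: List[str] = []
    for term in doc_terms:
        if term in training_vocab or term not in term_vectors:
            result.append(term)
            continue
        best_term: Optional[str] = None
        best_sim = threshold
        # Sorted iteration keeps tie-breaking deterministic.
        for candidate in sorted(training_vocab):
            if candidate not in term_vectors:
                continue
            sim = cosine(term_vectors[term], term_vectors[candidate])
            if sim > best_sim:
                best_term, best_sim = candidate, sim
        result.append(best_term if best_term is not None else term)
    return result
```

A term with no training term above the threshold is left unchanged. This mirrors the abstract's condition that replacement happens only above the threshold value.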

Improving the Biography Archive Service of Wikipedia (위키피디아 인물 아카이브 서비스 개선을 위한 분석 연구)

  • Choi, Sanghee
    • Journal of the Korean Society for Library and Information Science / v.52 no.1 / pp.447-467 / 2018
  • Biographical information is usually collected and provided by a company or institution that applies a specific standard for selecting the people it covers. Recently, user-oriented content services such as Wikipedia have begun offering biographical information through the Wikipedia Biography Portal, where users select people and freely write about them. This study collected 500 biographical entries from three categories of the Wikipedia Biography Portal: criminals, faculty, and directors. Each category's content was analyzed with word frequency and a divergence indicator to identify its characteristics. The divergence indicator proved effective at representing the distinguishing factors of each category. The study provides two sets of word clouds as a guide for users writing about people in these categories and for editors who accept and monitor user-contributed biographies:
    • the top 100 words by divergence indicator;
    • the top 100 words common to the three categories, by word frequency.

A study on the nation images of the big three exporting countries in East Asia shown in Wikipedia English-Edition (영어 위키피디아 페이지뷰를 통한 한중일 국가 인지도 비교)

  • Lee, Youngwhan; Chun, Heuiju; Sawng, Youngwha
    • Journal of the Korean Data and Information Science Society / v.26 no.5 / pp.1071-1085 / 2015
  • The researchers attempted to develop a method for extracting near-real-time online nation images from social media. Drawing on previous studies of nation images and on the categories defined in Wikipedia, they constructed an ontology that reflects the characteristics of nation images. Separately, they compared data sets from various social media and selected page views of the English edition of Wikipedia. The ontology was applied to six recent years of data on the three major exporting countries of East Asia: China, Japan, and Korea. To compare the nation images, correspondence analysis was used to show images in the areas of politics, society, culture, and economy. The extracted nation images are reasonable representations of these countries. The researchers checked them against several known government policies. They confirmed that the method could help government officials formulate foreign policies that boost national exports, and that it could serve as a key performance indicator for those policies.

Improving Classification Accuracy in Hierarchical Trees via Greedy Node Expansion

  • Byungjin Lim; Jong Wook Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.6 / pp.113-120 / 2024
  • With the advancement of information and communication technology, we can easily generate various forms of data in our daily lives. To manage such large amounts of data efficiently, systematic classification into categories is essential. For effective search and navigation, data is organized into a tree-like hierarchical structure known as a category tree, as commonly seen on news websites and Wikipedia. Accordingly, various techniques have been proposed to classify large volumes of documents into the terminal nodes of category trees. However, these methods face a problem. As the tree grows taller, the number of terminal nodes grows exponentially, which raises the probability of misclassification and lowers classification accuracy. This paper therefore proposes a new node expansion-based classification algorithm that meets the classification accuracy required by the application while still allowing detailed categorization. The method greedily prioritizes the expansion of nodes with high classification accuracy, thereby maximizing the overall classification accuracy of the category tree. Experiments on real data show that the proposed technique outperforms naive methods.
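The greedy expansion idea can be illustrated with a small sketch. The abstract does not give the exact scoring rule, so this version rests on three assumptions:

- each node carries an estimated classification accuracy and a share of documents;
- overall accuracy is the document-weighted average over the current frontier of target nodes;
- a node may be replaced by its children only if the resulting overall accuracy still meets the required level.

Among admissible expansions, the sketch picks the one with the highest resulting overall accuracy. `Node`, `accuracy` and `weight` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    name: str
    accuracy: float  # assumed estimate of accuracy when this node is a target class
    weight: float    # assumed share of documents falling under this node
    children: List["Node"] = field(default_factory=list)

def overall_accuracy(frontier: List[Node]) -> float:
    """Document-weighted average accuracy over the current target nodes."""
    total = sum(n.weight for n in frontier)
    return sum(n.accuracy * n.weight for n in frontier) / total

def greedy_expand(root: Node, required_accuracy: float) -> List[Node]:
    """Repeatedly replace one frontier node by its children, choosing the expansion
    with the highest resulting overall accuracy that still meets the requirement."""
    frontier = [root]
    while True:
        best: Optional[List[Node]] = None
        best_acc = -1.0
        for node in frontier:
            if not node.children:
                continue  # terminal node: nothing to expand
            candidate = [n for n in frontier if n is not node] + node.children
            acc = overall_accuracy(candidate)
            if acc >= required_accuracy and acc > best_acc:
                best, best_acc = candidate, acc
        if best is None:
            return frontier  # no admissible expansion remains
        frontier = best
```

Expansion stops when no remaining expansion keeps overall accuracy at or above the requirement. Being greedy, it is not guaranteed to find the most detailed admissible frontier.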

Participation Level in Online Knowledge Sharing: Behavioral Approach on Wikipedia (온라인 지식공유의 참여정도: 위키피디아에 대한 행태적 접근)

  • Park, Hyun Jung; Lee, Hong Joo; Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.97-121 / 2013
  • Knowledge is increasingly important for sustainable competitive advantage and innovation in a volatile environment, and much research on knowledge sharing has been conducted. However, previous studies have mostly relied on questionnaire surveys, which are subject to respondents' inherent perception errors. This research examines how primary participant behaviors relate to the level of participation in knowledge sharing. It draws on online user behavior on Wikipedia, a representative community for online knowledge collaboration. Without users' participation in knowledge sharing, knowledge collaboration for creating knowledge cannot succeed. However, Wikipedia users' editing patterns are diverse. The same number of edits can span different revisiting periods and so yield different amounts of shared knowledge. We therefore examine the participation level in knowledge sharing from two angles: the number of edits and the revisiting period. Three behavioral dimensions observable on the wiki platform affect the level of participation: article talk for public discussion, user talk for private messaging, and community registration.
    Public discussion takes place on article talk pages set up for exchanging ideas about each article's topic. An article talk page is often divided into several sections, each mainly addressing a specific type of issue raised while the article is developed. Over the course of discussion, participants share, negotiate, and refine opinions. These range from relatively trivial matters, such as which text, links, or images to add or remove and how to restructure them, to profound professional insights. Wikipedia also provides personal user talk pages as a private messaging tool. On these pages, users exchange diverse personal messages, such as casual greetings, stories about their activities on Wikipedia, and everyday matters. Anyone who wants to communicate with another user visits that user's talk page and leaves a message.
    Wikipedia articles are assessed on seven quality grades, of which the featured article level is the highest. The dataset includes participants' behavioral data for 2,978 articles that reached the featured article level. It comprises each article's editing history, its article talk history, and user talk histories extracted from the relevant user talk pages. The analysis period runs from the creation of each article until its promotion to the featured article level. The number of edits is how many times a participant took part in editing an article, and the revisiting period is the time difference between the first and last edits. First, the participation levels of user categories, classified by behavioral dimension, were analyzed and compared. Then, robust regressions related independent variables reflecting the degree of each behavioral characteristic to the dependent variable representing the participation level. The research hypotheses were built on a motivational theory suited to the online environment, and on that basis this work proposes a theoretical framework for the participation level in online knowledge sharing. Besides some theoretical implications, the work yields the following practical behavioral findings:
    • Both public discussion and private messaging positively affect the participation level in knowledge sharing.
    • Public discussion has a greater influence on the participation level than private messaging.
    • Public discussion and private messaging have a synergy effect on the number of edits, but a fairly weak negative interaction effect on the revisiting period.
    • Community registration has a significant effect on the revisiting period but not on the number of edits.
    • For relationships formed through private messaging, the frequency or depth of the relationship matters more to the participation level than its scope.
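The two participation measures defined above are straightforward to compute from an edit history. This sketch assumes the history of one article is available as (user, timestamp) pairs and reports the revisiting period in days. Both the input format and the unit are assumptions, not details from the study.

```python
from datetime import datetime
from typing import Dict, List, Tuple

def participation_levels(
    edits: List[Tuple[str, datetime]],
) -> Dict[str, Tuple[int, float]]:
    """Per user: (number of edits, revisiting period in days between first and last edit)."""
    times: Dict[str, List[datetime]] = {}
    for user, timestamp in edits:
        times.setdefault(user, []).append(timestamp)
    levels: Dict[str, Tuple[int, float]] = {}
    for user, stamps in times.items():
        # Revisiting period: time between the user's first and last edit of the article.
        period = max(stamps) - min(stamps)
        levels[user] = (len(stamps), period.total_seconds() / 86400)
    return levels
```

A user with a single edit gets a revisiting period of 0.0. This shows how the same edit count can correspond to very different revisiting periods.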