• Title/Summary/Keyword: 동시출현 단어 (co-occurring words)

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE: Computing Practices and Letters / v.16 no.1 / pp.110-114 / 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If we treat each word as a vertex and the co-occurrence of two adjacent words as an edge, we can build a graph from a document. We then use TextRank to find the important words in the graph and construct features as word pairs, each consisting of an important word and a word adjacent to it. We evaluate four classifiers (SVM, naïve Bayes, the Maximum Entropy Model, and k-NN) on the non-cross-posted version of the 20 Newsgroups data set. Performance improved for all classifiers, which suggests that the TextRank algorithm has potential for text categorization.
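
The graph construction and ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the damping factor of 0.85, the fixed iteration count, and the `top_k` cutoff are all assumptions, and the classifier stage is omitted.

```python
from collections import defaultdict


def build_cooccurrence_graph(tokens):
    """Undirected graph: each word is a vertex, adjacent words share an edge."""
    graph = defaultdict(set)
    for left, right in zip(tokens, tokens[1:]):
        if left != right:  # skip self-loops from repeated words
            graph[left].add(right)
            graph[right].add(left)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Unweighted TextRank scores computed by simple power iteration."""
    scores = {word: 1.0 for word in graph}
    for _ in range(iterations):
        new_scores = {}
        for word in graph:
            # Each neighbor shares its score equally among its own neighbors.
            rank = sum(scores[n] / len(graph[n]) for n in graph[word])
            new_scores[word] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def important_word_pairs(tokens, top_k=2):
    """Pair each of the top_k ranked words with every word adjacent to it."""
    graph = build_cooccurrence_graph(tokens)
    scores = textrank(graph)
    # Sort by descending score, breaking ties alphabetically.
    top_words = sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
    pairs = {(word, neighbor) for word in top_words for neighbor in graph[word]}
    return sorted(pairs)
```

Calling `important_word_pairs(document_tokens)` on a tokenized document yields (important word, adjacent word) pairs that could then be fed to a classifier as features; how the pairs are weighted or encoded for the classifier is not specified in the abstract.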

Descriptor Profiling for Research Domain Analysis (연구영역분석을 위한 디스크립터 프로파일링에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.24 no.4 / pp.285-303 / 2007
  • This study explores a new technique that links controlled and uncontrolled vocabularies in a complementary way for analyzing a research domain. Co-word analysis can be broadly divided into two types according to the vocabulary used: controlled and uncontrolled. With controlled vocabulary, data sparseness and the indexer effect are inherent drawbacks. With uncontrolled vocabulary, the drawbacks are word selection from the author's perspective and word ambiguity. So that the two can complement each other, we propose descriptor profiling, which represents descriptors (controlled vocabulary) by their co-occurrence with words from the text (uncontrolled vocabulary). Applying the profiling to the domain of information science suggests that the method can reduce the inherent shortcomings of both controlled and uncontrolled vocabulary.
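
One way to picture descriptor profiling is to give each descriptor a vector of counts over the text words it co-occurs with, then compare descriptors by the similarity of those vectors. The sketch below assumes document-level co-occurrence, raw counts, and cosine similarity; the abstract does not state the weighting or similarity measure the authors used, and `build_profiles` and `cosine` are hypothetical names.

```python
import math
from collections import Counter, defaultdict


def build_profiles(records):
    """records: iterable of (descriptors, text_words) pairs, one per document.

    Returns a mapping descriptor -> Counter of text words, counting in how
    many documents each word co-occurs with that descriptor.
    """
    profiles = defaultdict(Counter)
    for descriptors, words in records:
        unique_words = set(words)
        for descriptor in set(descriptors):
            profiles[descriptor].update(unique_words)
    return profiles


def cosine(profile_a, profile_b):
    """Cosine similarity between two descriptor profiles."""
    shared = profile_a.keys() & profile_b.keys()
    dot = sum(profile_a[w] * profile_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in profile_a.values()))
    norm_b = math.sqrt(sum(v * v for v in profile_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Two descriptors that are rarely assigned to the same document can still come out as similar if the texts they index share vocabulary, which is one way such profiles could ease the data sparseness of controlled vocabulary.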

An Expansion of Affective Image Access Points Based on Users' Response on Image (이용자 반응 기반 이미지 감정 접근점 확장에 관한 연구)

  • Chung, Eun Kyung
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.25 no.3 / pp.101-118 / 2014
  • Given the rapidly developing ubiquitous computing environment, it is imperative that users be able to search for and use images based on affective meanings. However, indexing the affective meanings of images has been difficult because the emotions of an image are substantially subjective and highly abstract. In addition, low-level image features have been of limited use for indexing affective meanings, which are high-level concepts. To expand the access points for the affective meanings of images, this study uses user-provided responses to images. For the data set, emotional words were collected from twenty participants and cleaned, using a set of fifteen images, three for each of five basic emotions: love, sadness, fear, anger, and happiness. A total of 399 unique emotion words were identified, appearing 1,093 times in the data set. Through co-word analysis and network analysis of the emotional words in users' responses, this study presents expanded word sets for the five basic emotions. The expanded word sets are characterized by adjective expressions and action/behavior expressions.

Analyzing Research Trends in Bioinformatics based on Comparison between Grey and White Bioinformatics Literatures (바이오인포매틱스 분야 회색문헌 및 백색문헌의 연구 동향 비교 분석)

  • Kim, Ye Eun;Kim, Jung Ju;Song, Min
    • Proceedings of the Korean Society for Information Management Conference / 2013.08a / pp.11-14 / 2013
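  • The purpose of this study is to compare research trends in bioinformatics through word co-occurrence network analysis of abstracts from the grey and white literature of the field. To this end, abstracts of conference proceedings (grey literature) and journal articles (white literature) published from 2010 to 2012 were collected from SCOPUS, IEEEXplore, and Microsoft Academic Search. The word co-occurrence network analysis showed that the grey literature mainly addressed analysis tools and methods, while the white literature mainly addressed the core research objects of bioinformatics, such as gene expression and protein sequence and structure.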

A Study on the Characteristics by Keyword Types in the Intellectual Structure Analysis Based on Co-word Analysis: Focusing on Overseas Open Access Field (동시출현단어 분석에 기초한 지적구조 분석에서 키워드 유형별 특성에 관한 연구 - 국외 오픈액세스 분야를 중심으로 -)

  • Kim, Pan Jun
    • Journal of the Korean Society for Library and Information Science / v.55 no.3 / pp.103-129 / 2021
  • This study examined the characteristics of two keyword types that express topics in intellectual structure analysis based on co-word analysis, focusing on the overseas open access field. Specifically, the keyword set extracted from the LISTA database in the field of library and information science was divided into two types (controlled keywords and uncontrolled keywords), and the results of intellectual structure analysis based on co-word analysis were compared. The two keyword types showed significant differences in keyword sets, research maps and influences, and periods. Therefore, in intellectual structure analysis based on co-word analysis, the characteristics of each keyword type should be considered according to the purpose of the study. Controlled keywords are more appropriate for examining the overall research trend in a specific field from the perspective of the entire academic field, and uncontrolled keywords are more appropriate for identifying detailed trends by research area from the perspective of the specific field. For a comprehensive intellectual structure analysis that reflects both viewpoints, it is most desirable to analyze controlled and uncontrolled keywords separately and compare the results.

Current Research Trends in Entrepreneurship Based on Topic Modeling and Keyword Co-occurrence Analysis: 2002~2021 (토픽모델링과 동시출현단어 분석을 이용한 기업가정신에 대한 연구동향 분석: 2002~2021)

  • Jang, Sung Hee
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.3 / pp.245-256 / 2022
  • The purpose of this study is to provide comprehensive insights into current research trends in entrepreneurship based on topic modeling and keyword co-occurrence analysis. The study queried the Web of Science database with 'entrepreneurship' and collected 14,953 research articles published between 2002 and 2021. It used R for topic modeling and VOSviewer for keyword co-occurrence analysis. The results are as follows. First, the keyword co-occurrence analysis yielded five clusters: an entrepreneurship and innovation cluster, an entrepreneurship education cluster, a social entrepreneurship and sustainability cluster, an enterprise performance cluster, and a knowledge and technology transfer cluster. Second, the topic modeling analysis found 12 topics, including start-up environment and economic development, international entrepreneurship, venture capital, government policy and support, social entrepreneurship, management-related issues, regional city planning and development, entrepreneurship research, and entrepreneurial intention. Finally, the study identified two hot topics (venture capital and entrepreneurial intention) and one cold topic (international entrepreneurship). The results are useful for understanding current research trends in entrepreneurship and provide insights for future entrepreneurship research.

A Study on the Retrieval Effectiveness of KoreaMed using MeSH Search Filter and Word-Proximity Search (검색용 MeSH 필터와 단어인접탐색 기법을 활용한 KoreaMed 검색 효율성 향상 연구)

  • Jeong, So-Na;Jeong, Ji-Na
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.5 / pp.596-607 / 2017
  • This study examined a method for adding terms related to "stomach neoplasms" as filters to the Medical Subject Headings (MeSH) used for searching, as well as a method for improving search efficiency through a word-proximity search based on the measured distance between co-occurring terms. The titles of 8,625 articles published between 2007 and 2016 with the major topic term "stomach neoplasms" were downloaded from PubMed, and the vocabulary to be added to the MeSH for search was analyzed. The search efficiency was verified using 277 articles in KoreaMed that had "Stomach Neoplasms" indexed as MEDLINE MeSH. As a result, 973 terms were selected as candidate vocabulary. "Gastric Cancer" (2,780 appearances) was the most frequent term, and 7,376 compound words (88.51%) combined histological terms for "stomach" and "neoplasm", such as "gastric adenocarcinoma" and "gastric MALT lymphoma". A total of 5,234 compound words (70.95%) had a co-occurring distance of two words. The matching rate between the MEDLINE MeSH and the KoreaMed MeSH Indexer was 209 articles (75.5%). The search efficiency improved to 263 articles (94.9%) when the search filters were added, and to 268 articles (96.7%) when the 13 word-proximity search technique of the co-occurring terms was applied. This study showed that using a thesaurus to improve search efficiency in a natural language search can maintain the advantages of controlled vocabulary. Search accuracy can be improved by using a word-proximity search instead of a Boolean search.
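
The contrast between a Boolean search and a word-proximity search can be sketched as below. This is an illustration under stated assumptions, not the KoreaMed or PubMed search implementation: distance is taken here as the difference in token positions, the tokenizer is a simple lowercase alphanumeric split, and all function names are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_distance(tokens, term_a, term_b):
    """Smallest positional gap between the two terms, or None if one is absent."""
    positions_a = [i for i, tok in enumerate(tokens) if tok == term_a]
    positions_b = [i for i, tok in enumerate(tokens) if tok == term_b]
    if not positions_a or not positions_b:
        return None
    return min(abs(i - j) for i in positions_a for j in positions_b)


def boolean_match(text, term_a, term_b):
    """Boolean AND: both terms appear anywhere in the text."""
    tokens = tokenize(text)
    return term_a in tokens and term_b in tokens


def proximity_match(text, term_a, term_b, max_distance):
    """Proximity search: both terms appear within max_distance tokens."""
    distance = min_distance(tokenize(text), term_a, term_b)
    return distance is not None and distance <= max_distance
```

With a distance limit of two, a title such as "Gastric MALT lymphoma in elderly patients" matches the pair ("gastric", "lymphoma"), while a title where the two words are far apart satisfies the Boolean AND but fails the proximity test.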

Research trends in the field of multicultural education Network analysis: Focusing on Time series analysis of Co-word (다문화교육 분야의 연구동향에 대한 네트워크 분석: 동시출현단어의 시계열 분석중심으로)

  • Bae, Kyungim
    • Journal of Convergence for Information Technology / v.11 no.10 / pp.159-170 / 2021
  • The purpose of this study was to understand the knowledge structure of multicultural education research and identify its research trends through keyword network analysis. To this end, the research trends and intellectual structure of multicultural education were identified through network analysis of words that appeared more than 6 times in the keywords of papers registered in the KCI (Korean Journal of Citation Index) from 2002 to 2020, and changes in research were analyzed by period. The analysis showed that the first period (2002-2010) focused on multicultural society and multiculturalism, the second period (2011-2015) additionally introduced multicultural families, globalization, and teacher education, and in the third period (2016-2020) multicultural receptivity, multicultural sensitivity, and multicultural efficacy newly emerged. Over the past 19 years, research on multicultural education in Korean society has shifted from theoretical to empirical research, and the content of multicultural education has become more specific and has expanded by field and subject.

Coward Analysis based Spam SMS Detection Scheme (동시출현 단어분석 기반 스팸 문자 탐지 기법)

  • Oh, Hayoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.3 / pp.693-700 / 2016
  • Analysis of the characteristics of spam text messages has been limited because spam datasets are typically difficult to obtain publicly and previous studies focused on spam e-mail. Although existing studies have characterized spam e-mail and applied data mining techniques, they are largely limited to detection techniques based on single words. In this paper, we reveal the characteristics of spam SMS through experiments and analysis from different perspectives, and propose a co-word analysis based spam SMS detection scheme using a publicly disclosed spam SMS dataset from the University of Singapore. Extensive performance evaluations show that the false positive and false negative rates of the proposed method are less than 2%.

Examining the Intellectual Structure of a Medical Informatics Journal with Author Co-citation Analysis and Co-word Analysis (저자동시인용 분석과 동시출현단어 분석을 이용한 의료정보학 저널의 지적구조 분석)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for Information Management / v.30 no.2 / pp.207-225 / 2013
  • With the development of science and technology, the convergence of various disciplines has been fostered. Accordingly, interdisciplinary studies have increasingly expanded by integrating knowledge and methodology from different disciplines. The primary focus of bibliometric methods is on investigating the intellectual structure of a field, and analysis of the characteristics of interdisciplinary studies has been overlooked. In this study, we aim to identify the intellectual structure of the field of medical informatics through author co-citation analysis and co-word analysis of the representative journal "IEEE ENG MED BIOL." In addition, we examine the authors and MeSH terms of the top three representative journals for further analysis of the field. We examine the intellectual structure of the medical informatics field through author and word clusters to identify the network structure of medical informatics disciplines.