• Title/Summary/Keyword: Extract Keywords


Recommending Core and Connecting Keywords of Research Area Using Social Network and Data Mining Techniques (소셜 네트워크와 데이터 마이닝 기법을 활용한 학문 분야 중심 및 융합 키워드 추천 서비스)

  • Cho, In-Dong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.127-138
    • /
    • 2011
  • The core service of most research portal sites is providing researchers with papers that match their research interests. This kind of service is effective and easy to use only when a user can provide correct and concrete information about a paper, such as the title, authors, and keywords. Unfortunately, most users of this service are not acquainted with concrete bibliographic information, so they inevitably go through repeated trial and error in keyword-based search. Retrieving a relevant research paper is especially difficult when a user is a novice in the research domain and does not know appropriate keywords. In this case, the user must search iteratively as follows: i) perform an initial search with an arbitrary keyword, ii) acquire related keywords from the retrieved papers, and iii) perform another search with the acquired keywords. This usage pattern implies that the service quality and user satisfaction of a portal site are strongly affected by the quality of its keyword management and search mechanism. To overcome this kind of inefficiency, some leading research portal sites have adopted an association rule mining-based keyword recommendation service similar to the product recommendation of online shopping malls. However, keyword recommendation based only on association analysis has the limitation that it can show only a simple and direct relationship between two keywords. In other words, association analysis by itself cannot present the complex relationships among many keywords in adjacent research areas. To overcome this limitation, we propose a hybrid approach for establishing an association network among the keywords used in research papers. 
The keyword association network can be established in the following phases: i) the set of keywords specified in a paper is regarded as a set of co-purchased items, ii) association analysis is performed on the keywords to extract frequent keyword patterns that satisfy predefined thresholds of confidence, support, and lift, and iii) the frequent keyword patterns are schematized as a network that shows the core keywords of each research area and the connecting keywords between two or more research areas. To estimate the practical applicability of our approach, we performed a simple experiment with 600 keywords extracted from 131 research papers published in five prominent Korean journals in 2009. In the experiment, we used SAS Enterprise Miner for association analysis and the R software for social network analysis. As the final outcome, we presented a network diagram and a cluster dendrogram for the keyword association network. We summarized the results in Section 4 of this paper. The main contributions of our approach are as follows: i) the keyword network can provide an initial roadmap of a research area to researchers who are novices in the domain, ii) a researcher can grasp the distribution of the many keywords neighboring a certain keyword, and iii) researchers can get ideas for converging different research areas by observing the connecting keywords in the keyword association network. Further studies should address the following. First, the current version of our approach does not implement a standard meta-dictionary. For practical use, homonym, synonym, and multilingual problems should be resolved with a standard meta-dictionary. Additionally, clearer guidelines for clustering research areas and defining core and connecting keywords should be provided. Finally, intensive experiments on international papers as well as Korean research papers should be performed.
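
The association-analysis phase described in this abstract can be illustrated with a minimal Python sketch. It treats each paper's keyword list as one transaction and computes support, confidence, and lift for keyword pairs using the standard definitions. The function name `pair_rules`, the toy keyword lists, and the restriction to pairs are assumptions made for illustration; the authors report using SAS Enterprise Miner, not this code.

```python
from collections import Counter
from itertools import combinations


def pair_rules(papers, min_support=0.0):
    """Return {(x, y): (support, confidence, lift)} for the rule x -> y.

    papers: list of keyword lists, one list per paper (one "transaction").
    """
    n = len(papers)
    item_count = Counter()
    pair_count = Counter()
    for keywords in papers:
        unique = sorted(set(keywords))  # ignore duplicates within a paper
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    rules = {}
    for (a, b), count in pair_count.items():
        support = count / n  # share of papers containing both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]        # P(y | x)
            lift = confidence / (item_count[y] / n)   # P(y | x) / P(y)
            rules[(x, y)] = (support, confidence, lift)
    return rules


# Hypothetical keyword lists, not data from the paper.
papers = [
    ["data mining", "association rule", "recommendation"],
    ["data mining", "social network"],
    ["social network", "centrality"],
    ["data mining", "association rule"],
]
rules = pair_rules(papers)
support, confidence, lift = rules[("association rule", "data mining")]
print(support, confidence, round(lift, 3))
```

Pairs whose measures clear the chosen thresholds would then become the edges of the keyword network in phase iii.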

A Keyword Network Analysis on Research Trends in the Area of Health Insurance (건강보험 연구동향에 대한 키워드 네트워크 분석)

  • Lee, Su Jung;Lee, Sun-Hee
    • Health Policy and Management
    • /
    • v.31 no.3
    • /
    • pp.335-343
    • /
    • 2021
  • Background: The purpose of this study was to extract the major areas of interest in health insurance research in Korea and to infer policy agendas related to health insurance by analyzing research keywords. Methods: For this study, 2,590 articles were selected from among 7,459 academic papers related to health insurance published between January 1987 and December 2018, which were retrieved using the Research Information Sharing Service (RISS). Keyword extraction and keyword network analysis were performed using the KrKwic, KrTitle, and UCINET software. Results: First, the number of studies in the area of health insurance continued to increase in all government terms, and it was not until after the 2000s that the subjects of health insurance research diversified. Second, degree centrality showed that 'medical expenditure' and 'medical utilization' were consistently high-ranking keywords regardless of the government in power. Keywords related to aging and long-term care insurance ranked higher in the Lee Myung-bak, Park Geun-hye, and Moon Jae-in governments. Third, betweenness centrality likewise ranked key topics such as medical expenditure and medical utilization highly, while the ranking of other top keywords differed depending on the interests and characteristics of each government's policy. Conclusion: We confirm that health insurance has been a main research theme in Korean health care research. Research keywords extracted from articles also corresponded to the main health policies promoted during each government period. Efforts to systematically investigate policy megatrends are needed to plan adaptive future policies.
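
As a rough illustration of the degree centrality measure mentioned in this abstract, the sketch below builds a keyword co-occurrence network from per-article keyword lists and computes normalized degree centrality (number of neighbors divided by n - 1). The function name and the sample keyword lists are hypothetical; the study itself used KrKwic and UCINET, and betweenness centrality is not covered here.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(keyword_lists):
    """Normalized degree centrality in a keyword co-occurrence network.

    Two keywords are linked if they appear together in at least one article.
    """
    neighbors = defaultdict(set)
    for keywords in keyword_lists:
        unique = set(keywords)
        for keyword in unique:
            neighbors[keyword]  # register the node even if it has no links
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)

    n = len(neighbors)
    if n < 2:
        return {keyword: 0.0 for keyword in neighbors}
    return {keyword: len(linked) / (n - 1) for keyword, linked in neighbors.items()}


# Hypothetical keyword lists, not data from the study.
articles = [
    ["medical expenditure", "medical utilization", "aging"],
    ["medical expenditure", "long-term care insurance"],
    ["medical utilization", "medical expenditure"],
]
centrality = degree_centrality(articles)
for keyword, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{keyword}: {score:.2f}")
```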

Analysis of OpinionMining on Consumer Satisfaction of InternetBanks: Focusing on the app review (인터넷전문은행의 소비자 만족에 관한 오피니언 마이닝 분석: 앱 사용 후기 중심으로)

  • Lee, Jong Hwa;Lee, Hyun Kyu
    • The Journal of Information Systems
    • /
    • v.32 no.3
    • /
    • pp.151-164
    • /
    • 2023
  • Purpose: This study aims to analyze the current status of consumer awareness of Internet banks by conducting a full investigation and collecting user opinions posted on Google Play. After categorizing the current sources of dissatisfaction, we present both the direction of Internet bank services and improvements to the platform. Design/methodology/approach: Using opinion mining, subjectivity analysis, polarity analysis, and polarity information analysis of comments were conducted step by step to extract negative and positive keywords. The weights of the frequently appearing positive and negative keywords were analyzed using the TF-IDF model. Based on previous studies showing that people are more sensitive to negative information than to positive information, we tried to confirm the connection, proximity, and mediation of negative keywords. Semantic Network Analysis (SNA) was used to visualize the connections among the negative comment keywords of the three Internet banks. Findings: Domestic Internet banks such as Kakao Bank, K-Bank, and Toss Bank attracted a lot of attention even before they were established, and since then they have secured a wide range of users through platforms that are completely different from those of existing banks. This study found that the convenience of the app affects non-face-to-face account opening and transactions, which are characteristic of domestic Internet banks, and that this in turn affects the bank's business strategy. In addition, this study shows that the business characteristics of the company can be identified.

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.9
    • /
    • pp.837-843
    • /
    • 2001
  • In this work, we propose a new method of extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then choose from them, through fuzzy inference, a number of terms called initial representative keywords (IRKs). Then, by expanding and reweighting the IRKs using term co-occurrence similarity, the final RKs are obtained. The performance of our approach is heavily influenced by the effectiveness of the IRK selection method, so we chose fuzzy inference because it is more effective in handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as that of calculating the center of document vectors. So, to show the usefulness of our approach, we compare it with two well-known methods (Rocchio and Widrow-Hoff) on a number of document collections. The results show that our approach outperforms the other approaches.


A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle: manually assigning keywords to all documents is a daunting or even impractical task, because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: the keyword assignment approach and the keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as keywords for that document, and as a result, keyword extraction is limited to terms that appear in the document. Therefore, keyword extraction cannot generate implicit keywords that are not included in a document. According to the experimental results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Conversely, this means that 10% to 36% of the keywords assigned by the authors do not appear in the article and so cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of keywords assigned by the authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and it has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to the Web-based community service and to academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative system of the keyword extraction approach developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
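
The five-step process above can be sketched roughly as follows. This is an illustrative reading of the abstract rather than the authors' implementation: it assumes each candidate keyword comes with a weighted term profile (`keyword_profiles`, a hypothetical structure), represents the target document by raw term frequencies, and ranks keywords by cosine similarity.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))  # vector length of u
    norm_v = math.sqrt(sum(w * w for w in v.values()))  # vector length of v
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_keywords(document, keyword_profiles, top_k=2):
    """Return up to top_k keywords whose profiles are most similar to the document."""
    doc_vector = Counter(document.lower().split())  # term-frequency vector
    scored = [(cosine(profile, doc_vector), keyword)
              for keyword, profile in keyword_profiles.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [keyword for score, keyword in scored[:top_k] if score > 0]


# Hypothetical keyword profiles (term -> weight), invented for this example.
profiles = {
    "logistics": {"port": 1.0, "shipping": 2.0, "cargo": 1.0},
    "text mining": {"keyword": 2.0, "document": 1.0, "text": 1.0},
    "fashion": {"jeans": 2.0, "denim": 1.0},
}
document = "shipping cargo moves through the port and shipping lines"
assigned = assign_keywords(document, profiles)
print(assigned)
```

Because the keywords come from the profiles rather than from the document text, a keyword can be assigned even when the keyword itself never appears in the document, which is the property the abstract attributes to the assignment approach.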

A bio-text mining system using keywords and patterns in a grid environment

  • Kwon, Hyuk-Ryul;Jung, Tae-Sung;Kim, Kyoung-Ran;Jahng, Hye-Kyoung;Cho, Wan-Sup;Yoo, Jae-Soo
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2007.02a
    • /
    • pp.48-52
    • /
    • 2007
  • As a huge amount of literature containing biological data is being generated in the post-genome era, it has become difficult for researchers to find useful knowledge in biological databases. Bio-text mining and related natural language processing techniques are key issues in intelligent knowledge retrieval from biological databases. We propose a bio-text mining technique for biologists who search for knowledge in the huge body of literature. First, a web robot is used to extract and transform related literature from remote databases. To improve retrieval speed, we generate an inverted file for keywords in the literature. Then, a text mining system is used to extract given knowledge patterns and keywords. Finally, we construct a grid computing environment to guarantee text mining processing speed even for huge literature databases. In a real experiment on 10,000 bio-literature documents, the system shows 95% precision and 98% recall.
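
The inverted file mentioned in this abstract maps each keyword to the documents that contain it, so a keyword lookup does not have to scan every document. A minimal sketch, assuming whitespace tokenization and invented sample documents (the paper's actual indexing scheme is not described in the abstract):

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            token = raw.strip(".,;:()")  # drop surrounding punctuation
            if token:
                index[token].add(doc_id)
    return index


def search(index, *keywords):
    """Return ids of documents that contain every given keyword."""
    postings = [index.get(keyword.lower(), set()) for keyword in keywords]
    return set.intersection(*postings) if postings else set()


# Hypothetical documents, not taken from the paper's corpus.
documents = {
    1: "Protein kinase binds the receptor.",
    2: "The receptor gene is expressed in liver.",
    3: "Kinase inhibitors reduce tumor growth.",
}
index = build_inverted_index(documents)
print(search(index, "kinase"))
print(search(index, "kinase", "receptor"))
```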


A Study on Keyword Information Characteristics of Product Names for Online Sales of Women's Jeans Using Text Mining (텍스트마이닝을 활용한 온라인 판매 여성 청바지 상품명에 나타난 키워드의 정보 특성 분석)

  • Yeo Sun Kang
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.47 no.1
    • /
    • pp.35-51
    • /
    • 2023
  • This study used text mining to extract 2,842 keywords from 7,397 product names and organized them into categories in order to analyze the characteristics of keywords appearing in the product names of jeans after 2020. The item category included denim, Chungbaji [청바지], and Ilja [일자], while the silhouette category included wide and bootcut. In addition, high-waist and banding comprised the making category, and the materials category consisted of napping, spandex, and soft blue. Denim surpassed the others in frequency, co-occurrence frequency, and centrality, and co-appeared with various other keywords. Also, the co-appearance of item and silhouette was prominent, and there were many keyword combinations that showed characteristics related to (a) high waist; (b) hemline detail; (c) rubber band; and (d) partial tearing. Furthermore, idiomatic expressions such as 'slim fit' and 'back tearing', which were not highlighted by co-occurrence frequency, were additionally confirmed through correlation. Therefore, the product name analysis effectively identified the detailed characteristics of the silhouette and the making of jeans preferred by consumers.

Change in Sugar Composition of Ginseng Extract During Heat Treatment (인삼정의 추출 및 열처리 중 유리당의 함량변화)

  • 김해중;주현규
    • Journal of Ginseng Research
    • /
    • v.13 no.1
    • /
    • pp.56-59
    • /
    • 1989
  • The changes in free sugar composition were investigated with respect to the kinds of dried ginseng used for extraction, the ethanol concentrations used for ginseng extract manufacture, and the heating temperature and time under which the ginseng extract was stored. The results are as follows: 1) The free sugar content of dried ginseng was 6.02-8.02%, and sucrose and maltose made up 70-80% of the free sugar. 2) The free sugar content was 13.82-26.29% in the Sanggunsam (dried ginseng of whole root) extract, and it tended to increase with increasing ethanol concentration. In addition, when a higher ethanol concentration was used, the sucrose content increased but the maltose content decreased. 3) The glucose, sucrose, and maltose contents of the ginseng extract decreased, in that order, as heating temperature and time increased. On the other hand, the opposite results were noted for xylose and fructose. Keywords: Panax ginseng, ginseng extract, Sanggunsam.


Efficient Keyword Extraction from Social Big Data Based on Cohesion Scoring

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.10
    • /
    • pp.87-94
    • /
    • 2020
  • Social reviews such as SNS feeds and blog articles have been widely used to extract keywords reflecting opinions and complaints from the users' perspective, and they often include proper nouns or new words reflecting recent trends. In general, these words are not included in a dictionary, so conventional morphological analyzers may not detect and extract them from the reviews properly. In addition, because of their long processing time, such analyzers are ill-suited to providing analysis results in a timely manner. This paper presents a method for efficient keyword extraction from social reviews based on the notion of cohesion scoring. Cohesion scores can be calculated from word frequencies, so keyword extraction can be performed without a dictionary. On the other hand, their accuracy can degrade when the input data has poor spacing. To address this, an algorithm is presented that improves the existing cohesion scoring mechanism using the structure of a word tree. Our experimental results show that the proposed method took only 0.008 seconds to extract keywords from 1,000 reviews, with a 15.5% error ratio, which is better than that of existing morphological analyzers.

A Study on the Optimal Search Keyword Extraction and Retrieval Technique Generation Using Word Embedding (워드 임베딩(Word Embedding)을 활용한 최적의 키워드 추출 및 검색 방법 연구)

  • Jeong-In Lee;Jin-Hee Ahn;Kyung-Taek Koh;YoungSeok Kim
    • Journal of the Korean Geosynthetics Society
    • /
    • v.22 no.2
    • /
    • pp.47-54
    • /
    • 2023
  • In this paper, we propose a technique for optimal search keyword extraction and retrieval for news article classification. The proposed technique was verified on the example of identifying trends related to North Korean construction. A representative Korean media platform, BigKinds, was used to select sample articles and extract keywords. The extracted keywords were vectorized using word embedding, and the similarity between them was then examined using cosine similarity. In addition, words with a similarity of 0.5 or higher were clustered based on the top 10 frequencies. The search query was formed by joining the keywords inside each cluster with 'OR' and joining the clusters with 'AND', following the search form of BigKinds. An in-depth analysis confirmed that meaningful articles appropriate for the original purpose were extracted. This paper is significant in that it makes it possible to classify news articles suited to a user's specific purpose without modifying the existing classification system and search form.
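
The query construction described in this abstract (group keywords whose cosine similarity is 0.5 or higher, then combine with 'OR' inside a cluster and 'AND' between clusters) can be sketched as below. The two-dimensional vectors are toy values standing in for real word embeddings, the single-link grouping rule is an assumption because the abstract does not name a clustering algorithm, and the parenthesized query string is illustrative rather than the exact BigKinds syntax.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_keywords(vectors, threshold=0.5):
    """Group words so that each word joins every cluster holding a similar word."""
    clusters = []
    for word, vector in vectors.items():
        linked = [c for c in clusters
                  if any(cosine(vector, vectors[other]) >= threshold for other in c)]
        merged = [w for c in linked for w in c] + [word]
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


def build_query(clusters):
    """Join words with OR inside a cluster and clusters with AND."""
    return " AND ".join("(" + " OR ".join(cluster) + ")" for cluster in clusters)


# Toy embeddings, invented for this example.
vectors = {
    "construction": (1.0, 0.1),
    "building": (0.9, 0.2),
    "railway": (0.1, 1.0),
    "road": (0.2, 0.9),
}
clusters = cluster_keywords(vectors)
query = build_query(clusters)
print(query)
```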