• Title/Summary/Keyword: Similar Keyword

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus (대화 말뭉치 구축을 위한 반자동 의미표지 태깅 시스템)

  • Park, Junhyeok;Lee, Songwook;Lim, Yoonseob;Choi, Jongsuk
    • KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.213-222 / 2019
  • Determining the meaning of a keyword in a speech dialogue system is an important technology for the future implementation of an intelligent speech dialogue interface. After keywords are extracted from the user's utterance to grasp the intention, the intention of the utterance is determined using the semantic marks of the keywords. One keyword can have several semantic marks, and we regard the task of attaching the semantic mark that matches the user's intention to each keyword as a word sense disambiguation problem. In this study, about 23% of all keywords in the corpus are manually tagged to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary, and the remaining 77% are then tagged automatically. The semantic mark of a keyword is determined by calculating the context vector similarity against the context vector dictionary. For an unregistered keyword, the semantic mark of the most similar keyword is attached using the synonym dictionary. We compare the performance of the system with a manually constructed training set and a semi-automatically expanded training set by selecting three high-frequency and three low-frequency keywords in the corpus. In the experiments, we obtained an accuracy of 54.4% with the manually constructed training set and 50.0% with the semi-automatically expanded training set.
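
The context-vector step described in this abstract can be illustrated with a small sketch. It assumes bag-of-words context vectors and cosine similarity; the abstract does not say which similarity measure the authors used, and the keyword "bank", the marks `FINANCE` and `RIVER`, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tag_keyword(context_words, context_vector_dict):
    """Return the semantic mark whose stored context vector is most similar."""
    query = Counter(context_words)
    return max(
        context_vector_dict,
        key=lambda mark: cosine(query, context_vector_dict[mark]),
    )


# Hypothetical context vector dictionary for the ambiguous keyword "bank".
bank_vectors = {
    "FINANCE": Counter({"money": 3, "account": 2, "loan": 1}),
    "RIVER": Counter({"water": 3, "fishing": 2, "boat": 1}),
}

mark = tag_keyword(["open", "account", "money"], bank_vectors)
```

Here the utterance context shares words only with the `FINANCE` vector, so that mark is selected. A real system would build the vectors from the manually tagged portion of the corpus.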

An Expert System for Content-based Image Retrieval with Object Database (객체 데이터베이스를 이용한 내용기반 이미지 검색 전문가 시스템)

  • Kim, Young-Min;Kim, Seong-In
    • Journal of Institute of Control, Robotics and Systems / v.14 no.5 / pp.473-482 / 2008
  • In this paper, we propose an expert system for content-based image retrieval with an object database. The proposed system finds keywords using a knowledge base and the features of extracted objects, and retrieves images using a keyword-based image retrieval method. The system can reduce image retrieval errors and save running time. The system also checks whether similar objects exist. If not, the user can store the object's information in the object database. The proposed system is flexible and extensible, enabling experts to incrementally add more knowledge and information. Experimental results show that the proposed system is more effective than the existing content-based image retrieval method in running time and precision.

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.758-764 / 2018
  • Digital documents such as patents, scholarly papers, and research reports have author keywords that summarize the topics of the documents. Different documents are likely to describe the same topic if they share the same keywords. Document clustering aims to group documents on similar topics using an unsupervised learning method. However, although document clustering is used in various data analyses, it is difficult to apply to a large number of documents because of its computational complexity. In this case, we can cluster and connect massive documents efficiently using keywords. Existing bottom-up hierarchical clustering requires a huge amount of computation and time to cluster a large number of keywords. This paper proposes an inverted-index-based bottom-up clustering for keywords and analyzes the results of clustering massive keywords extracted from scholarly papers and research reports.
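
The abstract does not spell out the clustering procedure, so the following is only one way an inverted index can cut the cost of bottom-up keyword clustering. It is not the authors' algorithm. The assumptions are: keywords are compared only if they co-occur in at least one document, similarity is the Jaccard overlap of their posting lists, and clusters are merged single-link style above a hypothetical `threshold`.

```python
from collections import defaultdict
from itertools import combinations


def cluster_keywords(docs, threshold=0.6):
    """Merge keywords bottom-up, comparing only pairs that share a document."""
    # Inverted index: keyword -> set of ids of the documents that contain it.
    index = defaultdict(set)
    for doc_id, keywords in docs.items():
        for kw in keywords:
            index[kw].add(doc_id)

    # Union-find forest; every keyword starts as its own cluster.
    parent = {kw: kw for kw in index}

    def find(kw):
        while parent[kw] != kw:
            parent[kw] = parent[parent[kw]]  # path compression
            kw = parent[kw]
        return kw

    # Candidate pairs come from co-occurrence, not from all keyword pairs.
    for keywords in docs.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            jaccard = len(index[a] & index[b]) / len(index[a] | index[b])
            if jaccard >= threshold:
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for kw in index:
        clusters[find(kw)].add(kw)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy collection of documents with author keywords.
docs = {
    "d1": ["clustering", "inverted index"],
    "d2": ["clustering", "inverted index"],
    "d3": ["rice", "tf-idf"],
    "d4": ["rice"],
}
clusters = cluster_keywords(docs)
```

Because pairs are generated per document, keywords that never appear together are never compared, which is where the saving over all-pairs hierarchical clustering comes from in this sketch.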

A Study on the Analysis of Agricultural R&D Keywords Using Textmining Method (텍스트마이닝을 활용한 농업 R&D 키워드 분석)

  • Kim, Ji-Hoon;Kim, Seong-Sup
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.2 / pp.721-732 / 2021
  • This study analyzed agricultural R&D keywords using text mining to examine trends in agricultural R&D. The data used for the analysis were R&D project information provided by NTIS, classified by research and development step and by year from 2003 to 2018. The TF-IDF approach was used as the analysis method, and a ranking was derived from the scores. Furthermore, similar keywords were grouped for analysis. The main results are as follows. First, agricultural R&D trends are changing with the introduction of new technologies and changes in the external environment. Second, keyword changes appeared with a time lag across the R&D steps. The main keywords change in the order of basic research, applied research, and development research. Third, the main keyword of agricultural R&D was 'rice.' However, the direction and purpose of the research were changing with changes in the domestic and foreign agricultural environments.
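
A TF-IDF ranking of the kind mentioned in this abstract can be sketched as follows. The abstract does not state which TF-IDF variant was used; this version assumes relative term frequency, `log(N / df)` as the inverse document frequency, and a per-term score summed over documents. The tokens are hypothetical and are not taken from the NTIS data.

```python
import math
from collections import Counter


def tfidf_ranking(documents):
    """Rank terms by their TF-IDF score summed over all documents."""
    n_docs = len(documents)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    scores = Counter()
    for tokens in documents:
        counts = Counter(tokens)
        for term, count in counts.items():
            tf = count / len(tokens)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores.most_common()


# Hypothetical tokenized project descriptions.
projects = [
    ["soil", "sensor", "soil"],
    ["soil", "drone"],
    ["drone", "sensor", "seed"],
]
ranking = tfidf_ranking(projects)
```

With this IDF definition, a term that occurs in every document scores zero, so production systems often use a smoothed variant instead.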

Topic Modeling and Keyword Network Analysis of News Articles Related to Nurses before and after "the Thanks to You Challenge" during the COVID-19 Pandemic (COVID-19 '덕분에 챌린지' 전후 간호사 관련 뉴스 기사의 토픽 모델링 및 키워드 네트워크 분석)

  • Yun, Eun Kyoung;Kim, Jung Ok;Byun, Hye Min;Lee, Guk Geun
    • Journal of Korean Academy of Nursing / v.51 no.4 / pp.442-453 / 2021
  • Purpose: This study was conducted to assess public awareness and policy challenges faced by practicing nurses. Methods: After collecting nurse-related news articles published before and after 'the Thanks to You Challenge' campaign (between December 31, 2019, and July 15, 2020), keywords were extracted via preprocessing. A three-step method (keyword analysis, latent Dirichlet allocation topic modeling, and keyword network analysis) was used to examine the text and the structure of the selected news articles. Results: The top 30 keywords with similar occurrences were collected before and after the campaign. The five dominant topics before the campaign were: pandemic, infection of medical staff, local transmission, medical resources, and return of overseas Koreans. After the campaign, the topics 'infection of medical staff' and 'return of overseas Koreans' disappeared, but 'the Thanks to You Challenge' emerged as a dominant topic. A keyword network analysis revealed that the word 'nurse' was linked with keywords such as 'thanks' and 'campaign' through the word 'sacrifice.' These words formed interrelated domains of 'the Thanks to You Challenge' topic. Conclusion: The findings of this study can provide useful information for understanding various issues and social perspectives on COVID-19 nursing. The major themes of news reports lagged behind the real problems faced by nurses in the COVID-19 crisis. While the press tends to focus on heroism and society as a whole, issues and policies mutually beneficial to the public and to nursing need to be further explored and enhanced by nurses.

Recommending Core and Connecting Keywords of Research Area Using Social Network and Data Mining Techniques (소셜 네트워크와 데이터 마이닝 기법을 활용한 학문 분야 중심 및 융합 키워드 추천 서비스)

  • Cho, In-Dong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.1 / pp.127-138 / 2011
  • The core service of most research portal sites is providing researchers with relevant research papers that match their research interests. This kind of service is effective and easy to use only when a user can provide correct and concrete information about a paper, such as the title, authors, and keywords. Unfortunately, most users of this service are not acquainted with concrete bibliographic information, which implies that most users inevitably go through repeated trial and error in keyword-based search. Retrieving a relevant research paper is especially difficult when a user is a novice in the research domain and does not know appropriate keywords. In this case, a user has to perform iterative searches as follows: i) perform an initial search with an arbitrary keyword, ii) acquire related keywords from the retrieved papers, and iii) perform another search with the acquired keywords. This usage pattern implies that the service quality and user satisfaction of a portal site are strongly affected by its keyword management and search mechanism. To overcome this kind of inefficiency, some leading research portal sites adopt an association rule mining-based keyword recommendation service that is similar to the product recommendation of online shopping malls. However, keyword recommendation based only on association analysis has the limitation that it can show only a simple and direct relationship between two keywords. In other words, association analysis by itself is unable to present the complex relationships among many keywords in adjacent research areas. To overcome this limitation, we propose a hybrid approach for establishing an association network among keywords used in research papers.
The keyword association network can be established in the following phases: i) the set of keywords specified in a paper is regarded as a set of co-purchased items, ii) association analysis is performed on the keywords to extract frequent keyword patterns that satisfy predefined thresholds of confidence, support, and lift, and iii) the frequent keyword patterns are schematized as a network to show the core keywords of each research area and the connecting keywords between two or more research areas. To estimate the practical applicability of our approach, we performed a simple experiment with 600 keywords extracted from 131 research papers published in five prominent Korean journals in 2009. In the experiment, we used SAS Enterprise Miner for association analysis and the R software for social network analysis. As the final outcome, we presented a network diagram and a cluster dendrogram for the keyword association network. We summarized the results in Section 4 of this paper. The main contributions of our proposed approach are as follows: i) the keyword network can provide an initial roadmap of a research area to researchers who are novices in the domain, ii) a researcher can grasp the distribution of many keywords neighboring a certain keyword, and iii) researchers can get ideas for converging different research areas by observing connecting keywords in the keyword association network. Further studies should address the following. First, the current version of our approach does not implement a standard meta-dictionary. For practical use, homonym, synonym, and multilingual problems should be resolved with a standard meta-dictionary. Additionally, clearer guidelines for clustering research areas and defining core and connecting keywords should be provided. Finally, intensive experiments not only on Korean research papers but also on international papers should be performed.
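
Phases i) and ii) above can be illustrated for keyword pairs with a short sketch. The authors used SAS Enterprise Miner and R; this plain-Python version is a stand-in, restricted to rules between two keywords, and the thresholds, function name, and toy papers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_rules(papers, min_support=0.3, min_confidence=0.6, min_lift=1.0):
    """Return (lhs, rhs, support, confidence, lift) for keyword pair rules."""
    n = len(papers)
    item_count = Counter()
    pair_count = Counter()
    # Phase i: each paper's keyword set is treated like a shopping basket.
    for keywords in papers:
        unique = sorted(set(keywords))
        item_count.update(unique)
        pair_count.update(combinations(unique, 2))

    # Phase ii: keep rules that pass the support, confidence, and lift thresholds.
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            lift = confidence / (item_count[rhs] / n)
            if (support >= min_support and confidence >= min_confidence
                    and lift > min_lift):
                rules.append((lhs, rhs, support, confidence, lift))
    return rules


# Hypothetical keyword lists of four papers.
papers = [
    ["data mining", "association rule"],
    ["data mining", "association rule", "clustering"],
    ["social network", "clustering"],
    ["data mining", "social network"],
]
rules = keyword_rules(papers)
# Phase iii input: each surviving rule becomes a directed edge of the network.
edges = {(lhs, rhs) for lhs, rhs, *_ in rules}
```

The resulting `edges` set is what a network tool would draw in phase iii. Keywords with many edges inside one area would act as core keywords, and keywords linking areas as connecting keywords.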

A Study on the Development of Search Algorithm for Identifying the Similar and Redundant Research (유사과제파악을 위한 검색 알고리즘의 개발에 관한 연구)

  • Park, Dong-Jin;Choi, Ki-Seok;Lee, Myung-Sun;Lee, Sang-Tae
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.54-62 / 2009
  • To avoid redundant investment in the project selection process, it is necessary to check whether the submitted research topics have already been proposed or carried out at other institutions. This is possible through search engines that apply a Boolean keyword matching algorithm to a national-scale database of research results. Even though the accuracy and speed of information retrieval have improved, such engines still have fundamental limits caused by keyword matching. This paper examines an implemented TFIDF-based algorithm and presents an experiment in which a search engine retrieves and prioritizes documents that are similar or redundant with respect to research proposals. In addition to the generic TFIDF algorithm, feature weighting and K-Nearest Neighbors classification methods are implemented in this algorithm. The documents are extracted from the NDSL (National Digital Science Library) web directory service to test the algorithm.

Retrieval Framework for Enterprise Information Integration based on Concept Net in Cloud Environment (클라우드 환경에서 전사적 정보 연계를 위한 개념 망 기반의 검색 프레임워크)

  • Jung, Kye-Dong;Moon, Seok-Jae
    • Journal of the Korea Institute of Information and Communication Engineering / v.17 no.2 / pp.453-460 / 2013
  • This study proposes a framework that enables efficient integration and use of enterprise data using a semantic-based concept net. It targets the integration of enterprise information, which has been increasing geometrically in the cloud environment. The concept net is very similar in approach to an existing ontology. However, it builds correlations between objects and concepts to make the user's integrated information retrieval more efficient. In this study, concept nets are divided into three kinds and are applied to the proposed framework independently. The concept net in this study is built in ontology format based on a master information concept net, a keyword concept net, and a business process concept net. This concept net enables retrieval and use of data based on the correlations among data according to the user's request. Then, by combining the master information concept and the keyword concept, it provides frequency traces of keywords and categories, thus improving the convenience and speed of retrieval.

Keyword Analysis of Two SCI Journals on Rock Engineering by using Text Mining (텍스트 마이닝을 이용한 암반공학분야 SCI논문의 주제어 분석)

  • Jung, Yong-Bok;Park, Eui-Seob
    • Tunnel and Underground Space / v.25 no.4 / pp.303-319 / 2015
  • Text mining is a branch of data mining and is used to find meaningful information in large amounts of text. In this study, we analyzed the titles and keywords of two SCI journals on rock engineering using text mining to find the major research areas, trends, and associations among research fields. The results were also visualized for intuitive understanding. The two journals showed similar research fields but different patterns in the associations among research fields. IJRMMS showed a simple network, that is, one big group based on the keyword 'rock' with a few small groups. On the other hand, RMRE showed a complex network among various medium-sized groups. Trend analysis by clustering and linear regression of the keyword-year frequency matrix showed that most of the keywords increased in number over time, except for a few declining keywords.
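
The linear-regression part of the trend analysis can be sketched as a least-squares slope per keyword over a keyword-year frequency matrix. The counts below are invented for illustration and are not the journals' data. The clustering step mentioned in the abstract is not reproduced here.

```python
def trend_slope(years, counts):
    """Ordinary least-squares slope of keyword frequency against year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    var = sum((x - mean_x) ** 2 for x in years)
    return cov / var


# Hypothetical keyword-year frequency matrix (rows: keywords, columns: years).
years = [2010, 2011, 2012, 2013, 2014]
frequency = {
    "rock": [2, 4, 6, 8, 10],
    "blasting": [9, 7, 6, 4, 4],
}

slopes = {kw: trend_slope(years, counts) for kw, counts in frequency.items()}
rising = [kw for kw, slope in slopes.items() if slope > 0]
declining = [kw for kw, slope in slopes.items() if slope < 0]
```

A positive slope marks a keyword whose yearly count is growing, and a negative slope marks a declining one.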

Effective Keyword Search on Semantic RDF Data (시맨틱 RDF 데이터에 대한 효과적인 키워드 검색)

  • Park, Chang-Sup
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.209-220 / 2017
  • As semantic data is widely used in various applications such as knowledge bases and the Semantic Web, the need for effective search over large amounts of RDF data has been increasing. Previous keyword search methods based on distinct root semantics retrieve only a set of answer trees that have different root nodes. Thus, they often return answer trees with similar meanings or low query relevance together, while trees with the same root node cannot be retrieved together even if they have different meanings and high query relevance. We propose a new method that finds diverse and relevant answers to the query by permitting duplication of root nodes among them. We present an efficient query processing algorithm that uses path indexes to find the top-k answers, given the maximum amount of root duplication that a set of answer trees can have. We show by experiments on a real dataset that the proposed approach can produce effective answer trees that are less redundant in their content nodes and more relevant to the query than those of the previous method.