• Title/Summary/Keyword: Document Clustering

Search Result 224, Processing Time 0.023 seconds

Review of Wind Energy Publications in Korea Citation Index using Latent Dirichlet Allocation (잠재디리클레할당을 이용한 한국학술지인용색인의 풍력에너지 문헌검토)

  • Kim, Hyun-Goo;Lee, Jehyun;Oh, Myeongchan
    • New & Renewable Energy
    • /
    • v.16 no.4
    • /
    • pp.33-40
    • /
    • 2020
  • The research topics of more than 1,900 wind energy papers registered in the Korean Journal Citation Index (KCI) were modeled into 25 topics using latent directory allocation (LDA), and their consistency was cross-validated through principal component analysis (PCA) of the document word matrix. Key research topics in the wind energy field were identified as "offshore, wind farm," "blade, design," "generator, voltage, control," 'dynamic, load, noise," and "performance test." As a new method to determine the similarity between research topics in journals, a systematic evaluation method was proposed to analyze the correlation between topics by constructing a journal-topic matrix (JTM) and clustering them based on topic similarity between journals. By evaluating 24 journals that published more than 20 wind energy papers, it was confirmed that they were classified into meaningful clusters of mechanical engineering, electrical engineering, marine engineering, and renewable energy. It is expected that the proposed systematic method can be applied to the evaluation of the specificity of subsequent journals.

A Dynamic Recommendation System Using User Log Analysis and Document Similarity in Clusters (사용자 로그 분석과 클러스터 내의 문서 유사도를 이용한 동적 추천 시스템)

  • 김진수;김태용;최준혁;임기욱;이정현
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.5
    • /
    • pp.586-594
    • /
    • 2004
  • Because web documents become creation and disappearance rapidly, users require the recommend system that offers users to browse the web document conveniently and correctly. One largely untapped source of knowledge about large data collections is contained in the cumulative experiences of individuals finding useful information in the collection. Recommendation systems attempt to extract such useful information by capturing and mining one or more measures of the usefulness of the data. The existing Information Filtering system has the shortcoming that it must have user's profile. And Collaborative Filtering system has the shortcoming that users have to rate each web document first and in high-quantity, low-quality environments, users may cover only a tiny percentage of documents available. And dynamic recommendation system using the user browsing pattern also provides users with unrelated web documents. This paper classifies these web documents using the similarity between the web documents under the web document type and extracts the user browsing sequential pattern DB using the users' session information based on the web server log file. When user approaches the web document, the proposed Dynamic recommendation system recommends Top N-associated web documents set that has high similarity between current web document and other web documents and recommends set that has sequential specificity using the extracted informations and users' session information.

Question and Answering System through Search Result Summarization of Q&A Documents (Q&A 문서의 검색 결과 요약을 활용한 질의응답 시스템)

  • Yoo, Dong Hyun;Lee, Hyun Ah
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.4
    • /
    • pp.149-154
    • /
    • 2014
  • A user should pick up relevant answers by himself from various search results when using user participation question answering community like Knowledge-iN. If refined answers are automatically provided, usability of question answering community must be improved. This paper divides questions in Q&A documents into 4 types(word, list, graph and text), then proposes summarizing methods for each question type using document statistics. Summarized answers for word, list and text type are obtained by question clustering and calculating scores for words using frequency, proximity and confidence of answers. Answers for graph type is shown by extracting user opinion from answers.

Analysis of Term Ambiguity based on Genetic Algorithm (유전자 알고리즘 기반 용어 중의성 분석)

  • Kim, Jeong-Joon;Chung, Sung-Taek;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.5
    • /
    • pp.131-136
    • /
    • 2017
  • Recently, with the development of Internet media, many document materials have become exponentially increasing on the web. These materials are described, and the information on what is the most by this text are classified according. However, the text has meant that many have room for ambiguous interpretation must look at it from various angles in order to interpret them correctly. In conventional classification methods it was simply a classification only have the appearance of the text. In this paper, we analyze it in terms genetic algorithm and local preserving based techniques and implemented a clustering system fragmentation them. Finally, the performance of this paper was evaluated based on the implementation results compared to traditional methods.

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture
    • /
    • v.36 no.4
    • /
    • pp.292-299
    • /
    • 2019
  • Objectives : DongUiBoGam is one of the representative medical literatures in Korea. We used text mining methods and analyzed the characteristics of the indications of each meridian in the second chapter of DongUiBoGam, WaeHyeong, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods : Using the term frequency-inverse document frequency (TF-IDF) method, we quantified values regarding the indications of each meridian according to the frequency of the occurrences of 14 meridians and 14 disease sites. The spatial patterns of the indications of each meridian were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering methods, twelve meridians were clustered into four groups based on the TF-IDF distributions of each meridian. Results : TF-IDF values of each meridian showed different constellation patterns at different disease sites. The spatial patterns of the indications of each meridian were similar to the route of the corresponding meridian. Conclusions : The present study identified spatial patterns between meridians and disease sites. These findings suggest that the constellations of the indications of meridians are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of indications of acupoints and meridians.

Similarity Measure and Clustering Technique for XML Documents by a Parent-Child Matrix (부모-자식 행렬을 사용한 XML 문서 유사도 측정과 군집 기법)

  • Lee, Yun-Gu;Kim, Woosaeng
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.7
    • /
    • pp.1599-1607
    • /
    • 2015
  • Recently, researches have been developing efficient techniques for accessing, querying, and managing XML documents which are frequently used in the Internet. In this paper, we propose a parent-child matrix to cluster XML documents efficiently. A parent-child matrix analyzes both the content and structural features of an XML document. Each cell of a parent-child matrix has either the value of a node in an XML tree or the value of a child node, where a parent-child relationship exists in the XML tree. Then, the similarity between two XML documents can be measured by the similarity between two corresponding parent-child matrices. The experiment shows that our proposed method has good performance.

Information Technology Application for Oral Document Analysis (구술문서 자료분석을 위한 정보검색기술의 응용)

  • Park, Soon-Cheol;Hahm, Han-Hee
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.13 no.2
    • /
    • pp.47-55
    • /
    • 2008
  • The purpose of this paper is to develop an analytical methodology of or릴 documents by the application of. Information Technologies. This system consists of the key word search, contents summary, clustering, classification & topic tracing of the contents. The integrated model of the five levels of retrieval technologies can be exhaustively used in the analysis of oral documents, which were collected as oral history of five men and women in the area of North Jeolla. Of the five methods topic tracing is the most pioneering accomplishment both home and abroad. In final this research will shed light on the methodological and theoretical studies of oral history and culture.

  • PDF

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Study on CEO New Year's Address: Using Text Mining Method (텍스트마이닝을 활용한 주요 대기업 신년사 분석)

  • YuKyoung Kim;Daegon Cho
    • Journal of Information Technology Services
    • /
    • v.22 no.2
    • /
    • pp.93-127
    • /
    • 2023
  • This study analyzed the CEO New Year's addresses of major Korean companies, extracting key topics for employees via text mining techniques. An intended contribution of this study is to assist reporters, analysts, and researchers in gaining a better understanding of the New Year's addresses by elucidating the implicit and implicative features of messages within. To this end, this study collected and analyzed 545 New Year's addresses published between 2012 and 2021 by the top 66 Korean companies in terms of market capitalization. Research methodologies applied include text clustering, word embedding of keywords, frequency analysis, and topic modeling. Our main findings suggest that the messages in the New Year's addresses were categorized into nine topics-organizational culture, global advancement, substantial management, business reorganization, capacity building, market leadership, management innovation, sustainable management, and technology development. Next, this study further analyzed the managerial significance of each topic and discussed their characteristics from the perspectives of time, industry, and corporate groups. Companies were typically found to emphasize sound management, market leadership, and business reorganization during economic downturns while stressing capacity building and organizational culture during market transition periods. Also, companies belonging to corporate groups tended to emphasize founding philosophy and corporate culture.

Text Mining Techniques for Adaptable Learning (적응적인 학습을 위한 텍스트 마이닝 기술)

  • Kim, Cheon-Shik;Jung, Myung-Hee;Hong, You-Sik
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.3
    • /
    • pp.31-39
    • /
    • 2008
  • Until now, there are many technologies to improve studying ability using e-learning system. In most of e-learning system, learners are studying through the lecture materials and studying problems. The studying ability and intention, however, can be improved through the shared materials and discussion. In this case, learning materials are shared by the learners' discussion and shared materials through the board Internet and MSN. Such data was not classified by learners; it was not easy for the learners to search related valuable information. Therefore, it was not helping to learning. The technologies of most text mining extract summary data from the collection of document or classify into similar document from the complex document. In this paper, we implemented e-learning system for learners to improve learning abilities and especially, applied text mining technology to classify learning material for helping learners.