• Title/Summary/Keyword: Topic Clustering.

A study on research trends for gestational diabetes mellitus and breastfeeding: Focusing on text network analysis and topic modeling (임신성 당뇨와 모유수유에 대한 연구 동향 분석: 텍스트네트워크 분석과 토픽모델링 중심)

  • Lee, Junglim;Kim, Youngji;Kwak, Eunju;Park, Seungmi
    • The Journal of Korean Academic Society of Nursing Education, v.27 no.2, pp.175-185, 2021
  • Purpose: This study aimed to identify core keywords and topic groups in research on gestational diabetes mellitus (GDM) and breastfeeding, in order to better understand research trends over the past 20 years. Methods: This text-mining and topic-modeling study comprised four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 635 papers published between 2001 and 2020 were found in the databases (Web of Science, CINAHL, RISS, DBPIA, KISS). From the 366 articles that met the selection criteria, 3,639 words were extracted and analyzed by text network analysis and topic modeling. The most important keywords were 'exposure', 'fetus', 'hypoglycemia', 'prevention' and 'program'. Topic modeling identified six topic groups: 'cardiovascular disease', 'obesity', 'complication prevention strategy', 'support of breastfeeding', 'educational program' and 'management of GDM'. The main topics were 'cardiovascular disease' and 'obesity'. Conclusion: Over the past 20 years, many studies have addressed complications related to gestational diabetes and breastfeeding, such as cardiovascular disease and obesity. To prevent complications of gestational diabetes and promote breastfeeding, various nursing interventions, including gestational diabetes management and educational programs for GDM pregnancies, should be developed in the nursing field.
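Steps 3 and 4 of the method above (building a co-occurrence matrix and reading simple network features from it) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the keyword lists are hypothetical, morpheme extraction is assumed to be done already, and degree centrality stands in for whatever network measures the study actually used.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each unordered keyword pair, the documents containing both."""
    counts = Counter()
    for tokens in documents:
        # Use unique keywords so a pair is counted once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def degree_centrality(counts):
    """Number of distinct co-occurring neighbours for each keyword."""
    neighbours = {}
    for a, b in counts:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    return {word: len(others) for word, others in neighbours.items()}


# Hypothetical, already-cleaned keyword lists (one list per abstract).
docs = [
    ["gdm", "breastfeeding", "obesity"],
    ["gdm", "obesity", "prevention"],
    ["breastfeeding", "program", "gdm"],
]
pairs = cooccurrence_counts(docs)
degrees = degree_centrality(pairs)
```

Keywords with the highest degree are candidates for "core keywords"; a real analysis would typically weight edges and compare several centrality measures.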

Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering (문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구)

  • Choi, Sang-Hee;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management, v.29 no.1, pp.331-349, 2012
  • Structured abstracts have been regarded as an essential source of information for representing the topics of journal articles. This study offers an unconventional view of how to use structured abstracts through an in-depth analysis of their subfields. A structured abstract was segmented into four fields: purpose, design, findings, and values/implications. The fields were compared in terms of document clustering performance. As a result, the purpose statement of an abstract affected the performance of journal article clustering more than any other field. Furthermore, certain types of keywords were identified that should be excluded from document clustering to improve performance, especially with the within-group average clustering method. These keywords were more strongly related to a specific abstract field, such as research design, than to the topic of an article.
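The field segmentation described above can be sketched as follows. This is only an illustration: the label spellings (`Purpose:`, `Design:`, `Findings:`, `Value:`) and the sample abstract are assumptions, and the study's own segmentation procedure is not given in the abstract.

```python
import re

# Assumed field labels; real structured abstracts may spell them differently.
FIELD_PATTERN = re.compile(r"(Purpose|Design|Findings|Value):\s*")


def split_structured_abstract(text):
    """Return a dict mapping each labelled field to its text."""
    parts = FIELD_PATTERN.split(text)
    # With one capture group, re.split yields
    # [prefix, label1, body1, label2, body2, ...].
    labels, bodies = parts[1::2], parts[2::2]
    return {label.lower(): body.strip() for label, body in zip(labels, bodies)}


# Hypothetical structured abstract.
abstract = (
    "Purpose: Compare abstract fields for clustering. "
    "Design: Cluster articles using one field at a time. "
    "Findings: The purpose field performed best. "
    "Value: Field-aware indexing may help."
)
fields = split_structured_abstract(abstract)
```

Each field's text could then be vectorized and clustered separately to compare how much each field contributes to clustering performance.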

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security, v.21 no.8, pp.87-96, 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over traditional learning methods. This has led to massive volumes of web-based lecture videos. Indexing and retrieving a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques reported in the literature are either visual or audio based, but not both. Since the visual and audio components are equally important for content-based indexing and retrieval, the current work focuses on both. A framework for automatic topic-based indexing and search that depends on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. Text from the audio component is extracted using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized using Naïve Bayes (NB) classification and K-Means clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories rather than the whole lecture video corpus. The work is carried out on a dataset generated by assigning categories to lecture video transcripts gathered from e-learning portals. Search performance is assessed by accuracy and time taken. Further, the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.
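The idea of searching only the relevant category instead of the whole corpus can be sketched as below. This is a simplified illustration, not the paper's framework: it covers only a multinomial Naïve Bayes routing step with Laplace smoothing, omits the K-Means component, and uses a hypothetical four-transcript corpus with made-up category names.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Fit a multinomial Naive Bayes model from (category, tokens) pairs."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for category, tokens in labelled_docs:
        word_counts[category].update(tokens)
        doc_counts[category] += 1
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(query_tokens, model):
    """Return the category with the highest log posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, -math.inf
    for category in sorted(doc_counts):
        total_words = sum(word_counts[category].values())
        score = math.log(doc_counts[category] / total_docs)
        for token in query_tokens:
            likelihood = (word_counts[category][token] + 1) / (
                total_words + len(vocabulary)
            )
            score += math.log(likelihood)
        if score > best_score:
            best_category, best_score = category, score
    return best_category


def search(query_tokens, labelled_docs, model):
    """Match the query only against transcripts in the predicted category."""
    category = classify(query_tokens, model)
    return [
        tokens
        for cat, tokens in labelled_docs
        if cat == category and set(query_tokens) & set(tokens)
    ]


# Hypothetical transcripts with assigned categories.
corpus = [
    ("math", ["matrix", "eigenvalue", "vector"]),
    ("math", ["integral", "derivative", "limit"]),
    ("biology", ["cell", "protein", "gene"]),
    ("biology", ["gene", "genome", "sequence"]),
]
model = train_naive_bayes(corpus)
hits = search(["gene", "sequence"], corpus, model)
```

Restricting the match to one category is what reduces search time; the trade-off is that a misclassified query never sees the correct documents.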

Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N
    • International Journal of Computer Science & Network Security, v.22 no.3, pp.396-404, 2022
  • Online social networks contain a large amount of data that can be converted into valuable and insightful information. Text mining approaches allow large-scale data to be explored efficiently. This study therefore reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies the text mining techniques used in social networking, the data used, the tools, and the challenges. Research questions were formulated, a search strategy and selection criteria were defined, and each paper was then analyzed to extract the data relevant to the research questions. The results show that the social media platforms most often used as data sources are Twitter and Facebook. The most common text mining techniques were sentiment analysis and topic modeling. Classification and clustering were the most common approaches applied by the studies. The challenges include the need to process huge volumes of data, the noise, and the dynamic nature of the data. The study explores recent developments in text mining approaches in social networking by providing a general view of the state of work done in this research area.

Twitter Sentiment Analysis for the Recent Trend Extracted from the Newspaper Article (신문기사로부터 추출한 최근동향에 대한 트위터 감성분석)

  • Lee, Gyoung Ho;Lee, Kong Joo
    • KIPS Transactions on Software and Data Engineering, v.2 no.10, pp.731-738, 2013
  • We analyze public opinion through sentiment analysis of tweets collected using recent topic keywords extracted from newspaper articles. Newspaper articles collected within a certain period are clustered using the K-means algorithm, and topic keywords for each cluster are extracted by term frequency. A sentiment analyzer trained with a machine learning method classifies tweets according to their polarity values. We assume that tweets collected using these topic keywords deal with the same topics as the newspaper articles if the tweets and the articles are generated around the same time, and we tried to verify the validity of this assumption.
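The clustering and keyword-extraction step described above can be sketched in plain Python. This is a toy illustration under stated assumptions: the articles are hypothetical token lists, raw term counts are used as features, and the first two articles serve as initial centroids to keep the run deterministic. The paper's actual features and initialization are not given in the abstract.

```python
from collections import Counter


def to_vector(tokens, vocabulary):
    """Bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, initial_centroids, iterations=10):
    """Plain K-means: alternate assignment and centroid update steps."""
    centroids = [list(c) for c in initial_centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def top_keywords(docs, labels, cluster, n=2):
    """Most frequent terms among the documents assigned to one cluster."""
    counts = Counter()
    for tokens, label in zip(docs, labels):
        if label == cluster:
            counts.update(tokens)
    return [word for word, _ in counts.most_common(n)]


# Hypothetical tokenized newspaper articles.
articles = [
    ["election", "vote", "candidate", "vote"],
    ["storm", "flood", "rain", "flood"],
    ["vote", "election", "poll"],
    ["rain", "storm", "flood"],
]
vocabulary = sorted({word for article in articles for word in article})
vectors = [to_vector(article, vocabulary) for article in articles]
labels = kmeans(vectors, initial_centroids=vectors[:2])
keywords = top_keywords(articles, labels, cluster=0)
```

The resulting keywords per cluster would then serve as the search terms for collecting tweets on the same topic.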

A Study on Research Trend for Nurses' Workplace Bullying in Korea: Focusing on Semantic Network Analysis and Topic Modeling (간호사의 직장 내 괴롭힘에 대한 국내 연구 동향 분석: 의미연결망분석과 토픽모델링 중심)

  • Choi, Jeong Sil;Kim, Youngji
    • Korean Journal of Occupational Health Nursing, v.28 no.4, pp.221-229, 2019
  • Purpose: This study aimed to identify core keywords and topic groups in workplace bullying research over the past 10 years in order to better understand research trends. Methods: The study was conducted in four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 437 articles published between 2010 and 2019 were retrieved from five databases (RISS, NDSL, Google Scholar, DBPIA and Kyobo Scholar). Forty-one abstracts were extracted from these articles, and network analysis was conducted using a semantic network module. The most important core keywords were 'turnover', 'intention', 'factor', 'program' and 'nursing'. Four topic groups were identified from the Korean databases. The major topics were 'turnover' and 'organization culture'. Conclusion: The review of previous research found that turnover intention has been emphasized. Further research focused on various interventions is needed to relieve workplace bullying in the nursing field.

Review of Wind Energy Publications in Korea Citation Index using Latent Dirichlet Allocation (잠재디리클레할당을 이용한 한국학술지인용색인의 풍력에너지 문헌검토)

  • Kim, Hyun-Goo;Lee, Jehyun;Oh, Myeongchan
    • New & Renewable Energy, v.16 no.4, pp.33-40, 2020
  • The research topics of more than 1,900 wind energy papers registered in the Korea Citation Index (KCI) were modeled into 25 topics using latent Dirichlet allocation (LDA), and their consistency was cross-validated through principal component analysis (PCA) of the document-word matrix. Key research topics in the wind energy field were identified as "offshore, wind farm," "blade, design," "generator, voltage, control," "dynamic, load, noise," and "performance test." As a new way to determine the similarity between research topics in journals, a systematic evaluation method was proposed that analyzes the correlation between topics by constructing a journal-topic matrix (JTM) and clustering journals based on their topic similarity. An evaluation of 24 journals that published more than 20 wind energy papers confirmed that they were classified into meaningful clusters of mechanical engineering, electrical engineering, marine engineering, and renewable energy. It is expected that the proposed systematic method can be applied to the evaluation of the specificity of subsequent journals.
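A journal-topic matrix of the kind described above can be sketched as follows. The construction is an assumption: each journal's row here is the mean of the topic weights of its papers, and journals are compared by cosine similarity. The abstract does not state the aggregation or the similarity measure used, and the journal names and weights are hypothetical.

```python
import math
from collections import defaultdict


def journal_topic_matrix(doc_topics, doc_journals):
    """Average per-document topic weights by journal (assumed aggregation)."""
    sums = {}
    counts = defaultdict(int)
    for weights, journal in zip(doc_topics, doc_journals):
        if journal not in sums:
            sums[journal] = [0.0] * len(weights)
        sums[journal] = [s + w for s, w in zip(sums[journal], weights)]
        counts[journal] += 1
    return {j: [s / counts[j] for s in total] for j, total in sums.items()}


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical document-topic weights (3 topics) as an LDA model might output.
doc_topics = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
]
doc_journals = ["J-Mech", "J-Mech", "J-Marine", "J-Elec"]

jtm = journal_topic_matrix(doc_topics, doc_journals)
sim_close = cosine_similarity(jtm["J-Mech"], jtm["J-Marine"])
sim_far = cosine_similarity(jtm["J-Mech"], jtm["J-Elec"])
```

Journals with high pairwise similarity would fall into the same cluster when the matrix rows are passed to any standard clustering method.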

A Convergence Study on the Topic and Sentiment of COVID19 Research in Korea Using Text Analysis (텍스트 분석을 이용한 코로나19 관련 국내 논문의 주제 및 감성에 관한 융합 연구)

  • Heo, Seong-Min;Yang, Ji-Yeon
    • Journal of the Korea Convergence Society, v.12 no.4, pp.31-42, 2021
  • The purpose of this study was to explore research topics and examine trends in COVID19-related research papers. We identified eight topics using latent Dirichlet allocation and found acceptable validity in comparison with the structural topic model. Subtopics were extracted using k-means clustering and plotted in PCA space. Additionally, sentiment analysis revealed topics bearing negative tones and warning signs. The results flagged issues in the topics Biomedical Related, International Dynamics and Psychological Impact. The findings could serve as a guideline for researchers exploring new research directions and for policymakers who need to decide which research projects to support.

NOGSEC: A NOnparametric method for Genome SEquence Clustering (녹섹(NOGSEC): A NOnparametric method for Genome SEquence Clustering)

  • 이영복;김판규;조환규
    • Korean Journal of Microbiology, v.39 no.2, pp.67-75, 2003
  • One major topic in comparative genomics is predicting functional annotation by classifying protein sequences. Computational approaches to function prediction include protein structure prediction, sequence alignment, and domain or binding-site prediction. This paper addresses another computational approach: searching for sets of homologous sequences in a sequence similarity graph. Methods based on similarity graphs do not need prior knowledge about the sequences, but depend largely on the researcher's subjective threshold settings. We propose a genome sequence clustering method based on iterative testing and graph decomposition, together with a simple method to calculate a strict threshold that has biochemical meaning. The proposed method was applied to known bacterial genome sequences, and the results are shown alongside those of the BAG algorithm. The resulting clusters lack some completeness, but the confidence level is very high and the method does not need user-defined thresholds.
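To show why threshold-based similarity-graph methods are sensitive to the chosen cutoff, here is a minimal sketch of the conventional approach that the abstract contrasts itself with. It is not NOGSEC: sequences are linked when their similarity meets a user-supplied threshold, and clusters are the connected components. The sequence names and similarity scores are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into connected components using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        # Follow parent links to the root, compressing the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(group) for group in groups.values())


def cluster_by_threshold(similarities, threshold):
    """Keep edges at or above the threshold, then take connected components."""
    nodes = sorted({n for pair in similarities for n in pair})
    edges = [pair for pair, score in similarities.items() if score >= threshold]
    return connected_components(nodes, edges)


# Hypothetical pairwise similarity scores between four protein sequences.
similarities = {
    ("p1", "p2"): 0.9,
    ("p2", "p3"): 0.6,
    ("p3", "p4"): 0.85,
    ("p1", "p4"): 0.2,
}
loose = cluster_by_threshold(similarities, 0.5)
strict = cluster_by_threshold(similarities, 0.8)
```

With the loose cutoff all four sequences merge into one cluster, while the strict cutoff splits them into two, which is the kind of dependence on subjective settings that motivates a threshold-free method.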

An efficient Video Dehazing Algorithm Based on Spectral Clustering

  • Zhao, Fan;Yao, Zao;Song, Xiaofang;Yao, Yi
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.7, pp.3239-3267, 2018
  • Image and video dehazing is a popular topic in the field of computer vision and digital image processing. A fast, optimized dehazing algorithm was recently proposed that enhances contrast and reduces flickering artifacts in a dehazed video sequence by minimizing a cost function that makes transmission values spatially and temporally coherent. However, its fixed-size block partitioning leads to block effects. The temporal cost function also suffers from the temporal non-coherence of newly appearing objects in a scene. Further, the weak edges in a hazy image are not addressed. Hence, a video dehazing algorithm based on well-designed spectral clustering is proposed. To avoid block artifacts, the spectral clustering is customized to segment static scenes so that the same target has the same transmission value. Assuming that edge images dehazed with optimized transmission values have richer detail than before restoration, an edge intensity function is added to the spatial consistency cost model. Atmospheric light is estimated using a modified quadtree search. Different temporal transmission models are established for newly appearing objects, static backgrounds, and moving objects. The experimental results demonstrate that the new method provides higher dehazing quality and lower time complexity than the previous technique.