• 제목/요약/키워드: Topic Clustering

검색결과 100건 처리시간 0.026초

임신성 당뇨와 모유수유에 대한 연구 동향 분석: 텍스트네트워크 분석과 토픽모델링 중심 (A study on research trends for gestational diabetes mellitus and breastfeeding: Focusing on text network analysis and topic modeling)

  • 이정림;김영지;곽은주;박승미
    • 한국간호교육학회지
    • /
    • 제27권2호
    • /
    • pp.175-185
    • /
    • 2021
  • Purpose: The aim of this study was to identify core keywords and topic groups in the 'Gestational diabetes mellitus (GDM) and Breastfeeding' field of research for better understanding research trends in the past 20 years. Methods: This was a text-mining and topic modeling study composed of four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 635 papers published between 2001 and 2020 were found in databases (Web of Science, CINAHL, RISS, DBPIA, RISS, KISS). Among them, 3,639 words extracted from 366 articles selected according to the conditions were analyzed by text network analysis and topic modeling. The most important keywords were 'exposure', 'fetus', 'hypoglycemia', 'prevention' and 'program'. Six topic groups were identified through topic modeling. The main topics of the study were 'cardiovascular disease' and 'obesity'. Through the topic modeling analysis, six themes were derived: 'cardiovascular disease', 'obesity', 'complication prevention strategy', 'support of breastfeeding', 'educational program' and 'management of GDM'. Conclusion: This study showed that over the past 20 years many studies have been conducted on complications such as cardiovascular diseases and obesity related to gestational diabetes and breastfeeding. In order to prevent complications of gestational diabetes and promote breastfeeding, various nursing interventions, including gestational diabetes management and educational programs for GDM pregnancies, should be developed in nursing fields.

문서 클러스터링을 위한 학술지 논문의 구조적 초록 활용성 연구 (Usability Analysis of Structured Abstracts in Journal Articles for Document Clustering)

  • 최상희;이재윤
    • 정보관리학회지
    • /
    • 제29권1호
    • /
    • pp.331-349
    • /
    • 2012
  • 구조적 초록은 학술 논문의 주제를 표현하는 역할을 하여 학술 논문을 처리하는데 중요한 요소로 인식되어왔다. 이 연구에서는 구조적 초록을 구성하는 세부 필드의 속성을 4개로 분석하고 초록의 구조를 활용하여 문서 클러스터링에 적용할 수 있는 가능성을 고찰고자 하였다. 구조적 초록의 필드 속성을 문서 클러스터링에 적용한 결과 클러스터링 기법간의 편차가 있었으나 연구 목적이 제공하는 정보량에 비해 주제성이 커서 클러스터링 성능에 가장 큰 영향을 미치고 있는 것으로 나타났다. 또한 분석 결과 특정 필드에 특화되어 출현하는 필드 종속적인 단어가 발생하는 것으로 나타나 필드 종속적인 단어를 배제하고 집단내 평균연결 기법을 적용하였을 때는 클러스터링의 성능이 개선되는 것으로 분석되었다.

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • 제21권8호
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

Text Mining in Online Social Networks: A Systematic Review

  • Alhazmi, Huda N
    • International Journal of Computer Science & Network Security
    • /
    • 제22권3호
    • /
    • pp.396-404
    • /
    • 2022
  • Online social networks contain a large amount of data that can be converted into valuable and insightful information. Text mining approaches allow exploring large-scale data efficiently. Therefore, this study reviews the recent literature on text mining in online social networks in a way that produces valid and valuable knowledge for further research. The review identifies text mining techniques used in social networking, the data used, tools, and the challenges. Research questions were formulated, then search strategy and selection criteria were defined, followed by the analysis of each paper to extract the data relevant to the research questions. The result shows that the most social media platforms used as a source of the data are Twitter and Facebook. The most common text mining technique were sentiment analysis and topic modeling. Classification and clustering were the most common approaches applied by the studies. The challenges include the need for processing with huge volumes of data, the noise, and the dynamic of the data. The study explores the recent development in text mining approaches in social networking by providing state and general view of work done in this research area.

신문기사로부터 추출한 최근동향에 대한 트위터 감성분석 (Twitter Sentiment Analysis for the Recent Trend Extracted from the Newspaper Article)

  • 이경호;이공주
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제2권10호
    • /
    • pp.731-738
    • /
    • 2013
  • 본 논문은 사회의 최근 동향에 대한 여론의 반응을 관찰하기 위한 방법을 나타낸다. 최근 동향을 나타내는 키워드를 신문기사로부터 추출하고, 추출된 키워드를 이용하여 수집된 트윗의 감성 분석을 통해 최근 동향에 대한 여론을 분석한다. 수집된 신문기사를 k-means알고리즘을 이용하여 군집화하고, 군집내의 단어의 출현 빈도를 이용하여 토픽 키워드를 선정하였다. 각 토픽에 대하여 수집된 트윗은 그 토픽 대한 트윗이라는 가정하에 기계학습 방법을 이용하여 긍/부정을 판별하여 감성을 판단하게 하였다. 그리고 이와 같은 가정에 대한 타당성을 검증해 보았다.

간호사의 직장 내 괴롭힘에 대한 국내 연구 동향 분석: 의미연결망분석과 토픽모델링 중심 (A Study on Research Trend for Nurses' Workplace Bullying in Korea: Focusing on Semantic Network Analysis and Topic Modeling)

  • 최정실;김영지
    • 한국직업건강간호학회지
    • /
    • 제28권4호
    • /
    • pp.221-229
    • /
    • 2019
  • Purpose: The aim of this study was to identify core keywords and topic groups of workplace bullying researches in the past 10 years for better understanding research trend. Methods: The study was conducted in four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building co-occurrence matrix and 4) analyzing network features and clustering topic groups. Results: 437 articles between 2010 and 2019 were retrieved from 5 databases (RISS, NDSL, Google scholar, DBPIA and Kyobo Scholar). Forty-one abstracts from these articles were extracted, and network analysis was conducted using semantic network module. The most important core keywords were 'turnover', 'intention', 'factor', 'program' and 'nursing'. Four topic groups were identified from Korean databases. Major topics were 'turnover' and 'organization culture'. Conclusion: After reviewing previous research, it has been found that turnover intention has been emphasized. Further research focused on various intervention is needed to relieve workplace bullying in nursing field.

잠재디리클레할당을 이용한 한국학술지인용색인의 풍력에너지 문헌검토 (Review of Wind Energy Publications in Korea Citation Index using Latent Dirichlet Allocation)

  • 김현구;이제현;오명찬
    • 신재생에너지
    • /
    • 제16권4호
    • /
    • pp.33-40
    • /
    • 2020
  • The research topics of more than 1,900 wind energy papers registered in the Korean Journal Citation Index (KCI) were modeled into 25 topics using latent directory allocation (LDA), and their consistency was cross-validated through principal component analysis (PCA) of the document word matrix. Key research topics in the wind energy field were identified as "offshore, wind farm," "blade, design," "generator, voltage, control," 'dynamic, load, noise," and "performance test." As a new method to determine the similarity between research topics in journals, a systematic evaluation method was proposed to analyze the correlation between topics by constructing a journal-topic matrix (JTM) and clustering them based on topic similarity between journals. By evaluating 24 journals that published more than 20 wind energy papers, it was confirmed that they were classified into meaningful clusters of mechanical engineering, electrical engineering, marine engineering, and renewable energy. It is expected that the proposed systematic method can be applied to the evaluation of the specificity of subsequent journals.

텍스트 분석을 이용한 코로나19 관련 국내 논문의 주제 및 감성에 관한 융합 연구 (A Convergence Study on the Topic and Sentiment of COVID19 Research in Korea Using Text Analysis)

  • 허성민;양지연
    • 한국융합학회논문지
    • /
    • 제12권4호
    • /
    • pp.31-42
    • /
    • 2021
  • 본 연구에서는 코로나19 관련 연구논문의 연구주제를 탐색하고 동향을 검토하고 있다. 또한 감성분석을 통해 부정적인 어조가 강한 경고가 되는 주제들을 알아본다. 잠재 디리슐레 할당(LDA)를 이용하여 총 8개의 토픽을 발견하였고, 이를 구조적 토픽 모델링(STM)과 비교하여 비교적 안정적인 결과임을 확인하였다. 또한 k-means 군집 알고리즘을 통해 각 토픽별로 세부 연구주제를 발견하였고 주성분 분석을 이용하여 이를 시각적으로 표현하였다. 감성분석을 통해 각 토픽별 긍정적, 부정적인 단어들을 살펴보고 감성점수를 계산하여 연구논문의 주된 어조를 파악하였는데, 특히 생물 의학 관련, 국제적 역학관계, 심리적 영향과 관련된 연구에서 부정적인 어조가 강한 것으로 나타나 해당 부문에 대해서 주의와 관심이 요구된다. 향후 연구자들이 연구의 방향성을 탐색하고 정책결정자들이 연구지원 사업을 결정하는데 기초자료로 활용될 수 있을 것이다.

녹섹(NOGSEC): A NOnparametric method for Genome SEquence Clustering (NOGSEC: A NOnparametric method for Genome SEquence Clustering)

  • 이영복;김판규;조환규
    • 미생물학회지
    • /
    • 제39권2호
    • /
    • pp.67-75
    • /
    • 2003
  • 비교유전체학의 주요 주제 중 유전자서열을 분류하고 단백질기능을 예측하는 연구가 있으며, 이를 위해 단백질 구조, 공통서열 및 바인딩 위치 예측등의 방법과 함께, 전유전체 서열에서 구해지는 유사도 그래프를 분석해 상동유전자를 검색하는 계산학적인 접근방법이 있다. 유사도그래프를 사용한 방법은 서열에 대한 기존 지식에 의존하지 않는 장점이 있지만 유사도 하한값과 같은 주관적인 임계값이 필요한 단점이 있다. 본 논문에서는 반복적으로 그래프를 분해하는 이전의 방법을 일반화시켜, 유사도 그래프에 기반한 유전자 서열군집분석 방법론과 객관적이고 안정적인 파라미터 임계값 계산 방법을 제안한다. 제시된 방법으로 알려진 미생물 유전체 서 열을 분석하여 이전의 방법인 BAG 알고리즘 결과와 비교했다.

An efficient Video Dehazing Algorithm Based on Spectral Clustering

  • Zhao, Fan;Yao, Zao;Song, Xiaofang;Yao, Yi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제12권7호
    • /
    • pp.3239-3267
    • /
    • 2018
  • Image and video dehazing is a popular topic in the field of computer vision and digital image processing. A fast, optimized dehazing algorithm was recently proposed that enhances contrast and reduces flickering artifacts in a dehazed video sequence by minimizing a cost function that makes transmission values spatially and temporally coherent. However, its fixed-size block partitioning leads to block effects. The temporal cost function also suffers from the temporal non-coherence of newly appearing objects in a scene. Further, the weak edges in a hazy image are not addressed. Hence, a video dehazing algorithm based on well designed spectral clustering is proposed. To avoid block artifacts, the spectral clustering is customized to segment static scenes to ensure the same target has the same transmission value. Assuming that edge images dehazed with optimized transmission values have richer detail than before restoration, an edge intensity function is added to the spatial consistency cost model. Atmospheric light is estimated using a modified quadtree search. Different temporal transmission models are established for newly appearing objects, static backgrounds, and moving objects. The experimental results demonstrate that the new method provides higher dehazing quality and lower time complexity than the previous technique.