• Title/Summary/Keyword: Term Clustering

Cluster analysis by month for meteorological stations using a gridded data of numerical model with temperatures and precipitation (기온과 강수량의 수치모델 격자자료를 이용한 기상관측지점의 월별 군집화)

  • Kim, Hee-Kyung;Kim, Kwang-Sub;Lee, Jae-Won;Lee, Yung-Seop
    • Journal of the Korean Data and Information Science Society / v.28 no.5 / pp.1133-1144 / 2017
  • Cluster analysis of meteorological data makes it possible to segment meteorological regions based on their meteorological characteristics. However, observed meteorological data are not well suited to cluster analysis because the stations that record them are not uniformly distributed. Clustering of observed data therefore cannot properly reflect the climate characteristics of South Korea. Clustering of $5\,\mathrm{km} \times 5\,\mathrm{km}$ gridded data derived from a numerical model, on the other hand, reflects them evenly. In this study, we analyzed long-term gridded data for temperature and precipitation using cluster analysis. Because climate characteristics differ by month, clustering was performed separately for each month. Since the result of K-means cluster analysis is highly sensitive to the initial values, we took the initial values from the Ward method, a hierarchical cluster analysis method. The clusters of meteorological stations were then determined from the clustering of the gridded data. As a result, the meteorological stations in South Korea were segmented both spatially and temporally.
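
The idea of seeding K-means with a Ward hierarchical solution can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`ward_seeds`, `kmeans`) and the six sample grid cells are hypothetical, the Ward step is a naive pairwise search suitable only for small inputs, and real data would normally be standardized first so that precipitation does not dominate the distance.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_seeds(points, k):
    """Naive agglomerative clustering with Ward's criterion; returns k centroids."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                # Increase in within-cluster sum of squares if a and b are merged.
                cost = len(a) * len(b) / (len(a) + len(b)) * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [centroid(c) for c in clusters]

def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the given centers; returns (centers, labels)."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = centroid(members)
    return centers, labels

# Hypothetical (temperature in deg C, precipitation in mm) values for six grid cells.
grid = [(1.0, 20.0), (1.5, 22.0), (0.5, 18.0), (8.0, 60.0), (8.5, 65.0), (7.5, 58.0)]
centers, labels = kmeans(grid, ward_seeds(grid, 2))
```

Because the Ward step is deterministic, repeated runs give the same starting centers, which removes the run-to-run variation that random initialization would introduce.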

Feature Filtering Methods for Web Documents Clustering (웹 문서 클러스터링에서의 자질 필터링 방법)

  • Park Heum;Kwon Hyuk-Chul
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.489-498 / 2006
  • Clustering results differ across datasets, and performance degrades even on web documents that have been manually processed by an indexer. Although statistical feature selection methods can identify representative clusters for a feature, they do not eliminate irrelevant features (i.e., non-obvious features and those appearing in general documents). Those irrelevant features should be eliminated to improve clustering performance. This paper therefore proposes three feature-filtering algorithms that consider feature values per document set, together with the distribution, frequency, and weights of features per document set: (1) a feature-filtering algorithm in a document (FFID), (2) a feature-filtering algorithm in a document matrix (FFIM), and (3) a hybrid method combining FFID and FFIM (HFF). We tested clustering performance with feature selection using term frequency and expanded co-link information, and with feature filtering using FFID, FFIM, and HFF. In our experiments, HFF performed best, and FFIM performed better than FFID.

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds (Stiefel 다양체에서 곱셈의 업데이트를 이용한 비음수 행렬의 직교 분해)

  • Yoo, Ji-Ho;Choi, Seung-Jin
    • Journal of KIISE:Software and Applications / v.36 no.5 / pp.347-352 / 2009
  • Nonnegative matrix factorization (NMF) is a popular method for multivariate analysis of nonnegative data. Its goal is to decompose a data matrix into a product of two factor matrices whose entries are all restricted to be nonnegative. NMF has been shown to be useful for clustering, especially document clustering. In this paper we present an algorithm for orthogonal nonnegative matrix factorization, in which an orthogonality constraint is imposed on the nonnegative decomposition of a term-document matrix. We develop multiplicative updates directly from the true gradient on the Stiefel manifold, whereas existing algorithms treat orthogonality as an additive constraint. Experiments on several document data sets show that our orthogonal NMF algorithm clusters better than both standard NMF and an existing orthogonal NMF.
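
As a sketch of what a multiplicative update derived from the gradient on the Stiefel manifold can look like, assume the model $X \approx UV^{\top}$ with $U \ge 0$, $V \ge 0$, $V^{\top}V = I$, and a squared-error objective. The symbols $X$, $U$, $V$, and $\mathcal{E}$ are illustrative assumptions, not notation taken from the abstract; $\odot$ and the fractions denote element-wise product and division.

$$
\begin{aligned}
\mathcal{E} &= \tfrac{1}{2}\lVert X - UV^{\top}\rVert_F^2, \qquad \nabla_V \mathcal{E} = VU^{\top}U - X^{\top}U,\\
\tilde{\nabla}_V \mathcal{E} &= \nabla_V \mathcal{E} - V(\nabla_V \mathcal{E})^{\top}V = VU^{\top}XV - X^{\top}U \quad (\text{using } V^{\top}V = I),\\
V &\leftarrow V \odot \frac{X^{\top}U}{VU^{\top}XV}, \qquad U \leftarrow U \odot \frac{XV}{UV^{\top}V}.
\end{aligned}
$$

Each update multiplies the current factor by the ratio of the negative part of the relevant gradient to its positive part, so nonnegative entries stay nonnegative without a separate projection step.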

An expanded Matrix Factorization model for real-time Web service QoS prediction

  • Hao, Jinsheng;Su, Guoping;Han, Xiaofeng;Nie, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.3913-3934 / 2021
  • Real-time prediction of Web service quality of service (QoS) makes Web services in cloud environments more convenient to use, but it faces severe challenges, especially in the cold-start situation. The existing literature on real-time QoS prediction ignores the fact that the QoS of a user or service is related to the QoS of other users or services. For example, users or services belonging to the same category group will have similar QoS values. Existing methods ignore this group relationship because of the model complexity it introduces. We therefore propose a real-time Matrix Factorization based Clustering model (MFC), which uses category information as a new regularization term in the loss function. Specifically, to meet the real-time requirement of the prediction model and to minimize model complexity, we first map the QoS values of a large number of users and services to a lower-dimensional space with PCA, then use the K-means algorithm to calculate user and service category information, and average the results to obtain a stable final clustering. Extensive experiments on real-world datasets demonstrate that MFC outperforms other state-of-the-art prediction algorithms.
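
The abstract does not give the loss function, so the following is only one assumed way a category term could enter a matrix factorization objective. All symbols are hypothetical: $q_{ij}$ is an observed QoS value, $u_i$ and $s_j$ are user and service latent vectors, $\Omega$ is the set of observed pairs, $g(i)$ and $h(j)$ are the K-means category assignments, $\bar{u}_{g(i)}$ and $\bar{s}_{h(j)}$ are category means, and $\lambda$, $\beta$ are regularization weights.

$$
\mathcal{L} = \sum_{(i,j)\in\Omega} \bigl(q_{ij} - u_i^{\top}s_j\bigr)^2
+ \lambda\bigl(\lVert U\rVert_F^2 + \lVert S\rVert_F^2\bigr)
+ \beta\Bigl(\sum_i \lVert u_i - \bar{u}_{g(i)}\rVert^2 + \sum_j \lVert s_j - \bar{s}_{h(j)}\rVert^2\Bigr)
$$

In a form like this, the last term pulls each latent vector toward the mean of its category, which gives a cold-start user or service with few observations a sensible default.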

Analysis of News Articles on Child Welfare Policies in South Korea: K-Means Clustering (대한민국 정권별 아동복지정책 관련 뉴스 기사 분석: K-평균 군집 분석)

  • Kim, Eun Joo;Kim, Seong Kwang;Park, Bit Na
    • Journal of East-West Nursing Research / v.29 no.2 / pp.185-195 / 2023
  • Purpose: The purpose of this study is to analyze changes in child welfare policies and provide insights based on the collection and classification of newspaper articles. Methods: Articles related to child welfare policies were collected from 1990, during the Kim, Young-sam administration, to May 9, 2022, under the Moon, Jae-in administration. K-Means clustering and keyword Term Frequency-Inverse Document Frequency (TF-IDF) analysis were used to cluster and analyze newspaper articles with similar themes. Results: The administrations of Kim, Young-sam, Kim, Dae-jung, Roh, Moo-hyun, and Park, Geun-hye were classified into two clusters, and the Lee, Myung-bak and Moon, Jae-in administrations were classified into three clusters. Conclusion: South Korea's child welfare policies have focused on ensuring the safety and healthy development of children through diverse policy initiatives over the years. However, challenges related to child protection and child abuse persist and require additional resources and budget allocation. It is important to establish a comprehensive support system for children and families, including comprehensive nursing support.
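
The TF-IDF weighting mentioned in the Methods can be sketched with the standard library alone. This is an illustrative example, not the study's pipeline: the function name `tfidf`, the specific weighting variant (raw term frequency times `log(N / df)`), and the three tokenized headlines are all assumptions.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term frequency * log(N / document frequency). Other
    variants exist; the study's exact choice is not stated in the abstract.
    """
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors

# Hypothetical tokenized headlines, not taken from the study's corpus.
articles = [
    ["child", "abuse", "prevention", "law"],
    ["child", "care", "subsidy", "budget"],
    ["child", "abuse", "report", "hotline"],
]
vectors = tfidf(articles)
```

A term that occurs in every article, such as "child" here, receives weight zero, so the vectors handed to a clustering step are dominated by the terms that distinguish one theme from another.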

A Study on Intellectual Structure of Library and Information Science in Korea (문헌정보학의 지식 구조에 관한 연구)

  • Yoo, Yeong-Jun
    • Journal of the Korean Society for Information Management / v.20 no.3 / pp.277-297 / 2003
  • This study was conducted on the premise that index terms display the intellectual structure of a specific subject field. An attempt was made to grasp the intellectual structure of Library and Information Science by clustering the index terms assigned at the Library of National Assembly to the journals of the related academic societies: the Journal of the Korean Society for Information Management, the Journal of the Korean Library and Information Science Society, and the Journal of the Korean Society for Library and Information Science. Index term clusters were generated based on the linkage of the index terms and their frequency of co-occurrence. In addition, a time-period analysis was conducted, along with a study of first-appearing terms, to clarify the trends and development of Library and Information Science. The study also analyzed the difference between the two intellectual structures by comparing the structure generated by the index term clusters with the existing structure of traditional classification systems.
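
Clustering index terms by co-occurrence can be illustrated with a small sketch. The abstract does not specify the linkage rule, so the example assumes the simplest one: two terms are linked if they share at least `threshold` records, and clusters are the connected groups of linked terms. The function names and the sample records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records):
    """Count how often each unordered pair of index terms shares a record."""
    pairs = Counter()
    for terms in records:
        for a, b in combinations(sorted(set(terms)), 2):
            pairs[(a, b)] += 1
    return pairs

def linked_terms(pairs, threshold):
    """Group terms connected by pairs that co-occur at least `threshold` times."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in pairs.items():
        if count >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical index terms for four articles.
records = [
    ["indexing", "thesaurus", "retrieval"],
    ["indexing", "thesaurus"],
    ["cataloging", "classification"],
    ["cataloging", "classification", "retrieval"],
]
clusters = linked_terms(cooccurrence(records), threshold=2)
```

Terms that never reach the threshold with any partner, such as "retrieval" in this sample, are left out of every cluster. Raising or lowering `threshold` trades cluster coverage against cluster tightness.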

Study on mapping of dark matter clustering from real space to redshift space

  • Zheng, Yi;Song, Yong-Seon
    • The Bulletin of The Korean Astronomical Society / v.41 no.1 / pp.38.2-38.2 / 2016
  • The mapping of dark matter clustering from real space to redshift space introduces anisotropy into the density power spectrum measured in redshift space, known as the Redshift Space Distortion (hereafter RSD) effect. The mapping formula is intrinsically non-linear. It is complicated by the higher-order polynomials that arise from the indefinite cross-correlations between the density and velocity fields, and by the Finger-of-God (hereafter FoG) effect caused by the randomness of the peculiar velocity field. Furthermore, a rigorous test of the mapping formula is contaminated by the unknown non-linearity of the density and velocity fields, including their auto- and cross-correlations, whose theoretical calculation breaks down beyond some scales. While the full set of higher-order polynomials remains unknown, the other systematics can be controlled consistently within the same order of truncation in the expansion of the mapping formula, as shown in this paper. The systematic error due to the unknown non-linear density and velocity fields is removed by separately measuring all terms in the expansion using simulations. The uncertainty caused by the velocity randomness is controlled by splitting the FoG term into two pieces: 1) the non-local FoG term, which is independent of the separation vector between two different points, and 2) the local FoG term, which appears as an indefinite polynomial and is expanded to the same order as all other perturbative polynomials. Using 100 realizations of simulations, we find that the best-fitting non-local FoG function is Gaussian, with only one scale-independent free parameter, and that our new mapping formulation accurately reproduces the observed power spectrum in redshift space at by far the smallest scales, up to k ~ 0.3 h/Mpc, considering the resolution of future experiments.
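
For orientation, the lowest-order form of such a mapping with a Gaussian damping factor is commonly written as below. This is the textbook expression, given here under assumed notation, and not the higher-order expansion developed in the paper: $\mu$ is the cosine of the angle between the wavevector and the line of sight, $P_{\delta\delta}$, $P_{\delta\Theta}$, and $P_{\Theta\Theta}$ are the density and velocity-divergence auto- and cross-spectra, and $\sigma_p$ stands for the single free velocity-dispersion parameter. Sign and normalization conventions for $\Theta$ and $\sigma_p$ differ between authors.

$$
P^{s}(k,\mu) = D^{\mathrm{FoG}}(k\mu\sigma_p)\,\bigl[P_{\delta\delta}(k) + 2\mu^{2}P_{\delta\Theta}(k) + \mu^{4}P_{\Theta\Theta}(k)\bigr],
\qquad
D^{\mathrm{FoG}}(k\mu\sigma_p) = \exp\!\bigl(-k^{2}\mu^{2}\sigma_p^{2}\bigr)
$$

The damping factor suppresses power along the line of sight ($\mu \to 1$) at large $k$, which is the FoG effect described in the abstract.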

A Study on the Classification of Institutional Long-term Care Based Upon Characteristics of Institutionalized Elderlies (노인복지시설 수용자 특성별 장기 요양서비스 유형설정에 관한 연구)

  • 김영숙;문옥륜
    • Health Policy and Management / v.4 no.2 / pp.27-57 / 1994
  • The objective of running a long-term care institution is to provide services that help maintain, support, and improve the optimum level of physical, mental, and psychosocial functioning of the elderly. To analyze the current situation of institutional long-term care facilities in Korea, 27 facilities were selected proportionately from each of the cities and provinces, out of a total of 152 facilities. About 20% of those institutionalized between 25 August and 2 October 1993, 391 elderly residents in all, were chosen by systematic random sampling. The instrument of this study was developed by modifying the CARE, MAI, and PCTC tools. A multivariate approach combining discriminant analysis and a clustering technique was employed. The study reveals that there is no clear differentiation of goals and functions among the long-term care institutions in Korea. The staffing pattern of long-term care facilities shows a shortage of nurses, physical therapists, and dieticians. The linkage between acute care facilities and long-term care is weak, and long-term care facilities are administered by non-professionals, who are responsible for assessing residents' health status before admission and for evaluating their care. It is therefore not surprising that most of the facilities have accommodated the aged regardless of their real needs and health status. Based on the findings of the analysis, this study classified long-term care facilities into four types: Type I helps the elderly maintain independence in activities of daily living. Type II facilities have the objective of maintaining and improving residents' current level of function. Type III aims to maintain the maximum independence of the elderly in activities of daily living. Type IV covers facilities designed to restore or improve the functional abilities of the elderly. In conclusion, the following suggestions are made: the need for long-term care should be assessed by multidimensional measurement. Institutional long-term care facilities should be classified and developed according to the type of care and service need. Acute and long-term care facilities should be linked together in order to support the evaluation of service operation and program development.

Web Document Clustering for Specific Subject Information Using WordNet and HTML Tags (WordNet과 HTML 태그를 활용한 특정영역 정보의 웹 문서 분류)

  • 조은휘;변영태
    • Proceedings of the Korean Society for Cognitive Science Conference / 2002.05a / pp.28-32 / 2002
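  • Finding the information a user wants among the vast amount of information on the Web is not easy. To deliver the high-quality information that users are looking for, information-providing systems dedicated to specific domains are being developed. Previous systems collect Web documents based on a domain-specific knowledge base and then provide information to users. In this paper, we experimented with effective document clustering for the animal domain, based on the similarity between documents within a specialized site. In existing methods, document classification and the selection and ranking of documents relevant to a query are based mainly on terms. In this paper, we use not only the terms in each document but also HTML tags and data obtained by applying the WordNet hierarchy to the knowledge base, and we use singular value decomposition (SVD) to uncover the relationships between documents, which are then used for document classification and collection. Applying the approach to a site that provides many specialized documents in a specific domain yielded good results.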

Detection of onset of failure in prestressed strands by cluster analysis of acoustic emissions

  • Ercolino, Marianna;Farhidzadeh, Alireza;Salamone, Salvatore;Magliulo, Gennaro
    • Structural Monitoring and Maintenance / v.2 no.4 / pp.339-355 / 2015
  • Corrosion of prestressed concrete structures is one of the main challenges that engineers face today. In response to this national need, this paper presents the results of a long-term project that aims to develop a structural health monitoring (SHM) technology for the nondestructive evaluation of prestressed structures. The use of permanently installed low-profile piezoelectric transducers (PZT) is proposed to record the acoustic emissions (AE) along the length of the strand. The results of an accelerated corrosion test are presented, and k-means clustering is applied to AE features after principal component analysis (PCA) to provide an accurate diagnosis of strand health. The proposed approach shows good correlation between acoustic emission features and strand failure. Moreover, a clustering technique for the identification of false alarms is proposed.