• Title/Summary/Keyword: Non-clustering

Enhancing Document Clustering using Important Term of Cluster and Wikipedia (군집의 중요 용어와 위키피디아를 이용한 문서군집 향상)

  • Park, Sun;Lee, Yeon-Woo;Jeong, Min-A;Lee, Seong-Ro
    • Journal of the Institute of Electronics Engineers of Korea SP / v.49 no.2 / pp.45-52 / 2012
  • This paper proposes a new method for enhancing document clustering that uses the important terms of clusters and Wikipedia. The proposed method represents the concept of each cluster topic well by selecting the important terms in a cluster using the semantic features of non-negative matrix factorization (NMF). It addresses the "bag of words" problem, in which meaningful relationships between documents and clusters are not considered, by expanding the important terms of each cluster with Wikipedia synonyms. It also improves the quality of document clustering by using the expanded important terms to refine the initial clusters through re-clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.
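
The term-selection step can be illustrated with a minimal sketch. It assumes a plain NMF with Lee-Seung multiplicative updates on a toy term-document matrix; the vocabulary, the counts, and the helper names (`nmf`, `important_terms`) are hypothetical, and the abstract does not say which NMF variant or ranking rule the authors used.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(a, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix a (terms x docs) as W * H using
    Lee-Seung multiplicative updates for the squared-error objective."""
    rng = random.Random(seed)
    n_terms, n_docs = len(a), len(a[0])
    w = [[rng.random() + 0.01 for _ in range(k)] for _ in range(n_terms)]
    h = [[rng.random() + 0.01 for _ in range(n_docs)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, a), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_docs)]
             for i in range(k)]
        ht = transpose(h)
        num, den = matmul(a, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_terms)]
    return w, h

def important_terms(w, vocab, top_n=2):
    """For each semantic feature (column of W), list its top_n highest-weighted terms."""
    result = []
    for j in range(len(w[0])):
        ranked = sorted(range(len(vocab)), key=lambda i: w[i][j], reverse=True)
        result.append([vocab[i] for i in ranked[:top_n]])
    return result

# Hypothetical toy data: rows are terms, columns are documents.
vocab = ["galaxy", "star", "goal", "match"]
a = [[3, 2, 0, 0],
     [2, 3, 0, 0],
     [0, 0, 4, 1],
     [0, 0, 1, 4]]
w, h = nmf(a, k=2)
topics = important_terms(w, vocab)
print(topics)
```

In a pipeline like the one described, the selected terms would then be expanded with synonyms and used to re-cluster the documents; those steps are not sketched here.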

Accessing the Clustering of TNM Stages on Survival Analysis of Lung Cancer Patient (폐암환자 생존분석에 대한 TNM 병기 군집분석 평가)

  • Choi, Chulwoong;Kim, Kyungbaek
    • Smart Media Journal / v.9 no.4 / pp.126-133 / 2020
  • The treatment policy and prognosis of lung cancer patients are determined by their final stage. The final stage is determined from the T, N, and M stage classification table provided by the American Joint Committee on Cancer (AJCC). However, the AJCC final stage has limitations when used for purposes such as patient treatment, prognosis, and prediction of survival days. In this paper, a clustering algorithm, one of the unsupervised learning algorithms, is assessed to check whether using only the T, N, and M stages with a data science method is effective for grouping patients with respect to survival days. The final stage groups and the T, N, M stage clustering groups of lung cancer patients were compared using the Cox proportional hazards model. The results confirm that survival days are predicted more accurately from the T, N, and M stages alone than from the final stages. In particular, prediction accuracy with T, N, M stage clustering improves when the number of clusters is larger or smaller than seven, the number of AJCC final stages.

Development of an unsupervised learning-based ESG evaluation process for Korean public institutions without label annotation

  • Do Hyeok Yoo;SuJin Bak
    • Journal of the Korea Society of Computer and Information / v.29 no.5 / pp.155-164 / 2024
  • This study proposes an unsupervised learning-based clustering model to estimate the ESG ratings of domestic public institutions. To this end, the optimal number of clusters was determined by comparing spectral clustering and k-means clustering. The results were validated by calculating the Davies-Bouldin Index (DBI), a clustering performance index for which lower values indicate better performance. The DBI values were 0.734 for spectral clustering and 1.715 for k-means clustering, confirming the superiority of spectral clustering. Furthermore, t-tests and ANOVA were used to reveal statistically significant differences among ESG non-financial data, and correlation coefficients were used to confirm the relationships between ESG indicators. Based on these results, this study suggests that the ESG performance ranking of each public institution can be estimated without existing ESG ratings, by calculating the optimal number of clusters and then determining the sum of the averages of the ESG data within each cluster. The proposed model can therefore be employed to evaluate the ESG ratings of various domestic public institutions, and it is expected to be useful in domestic sustainable management practice and performance management.
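
To make the DBI comparison concrete, here is a small sketch of the standard Davies-Bouldin definition (mean, over clusters, of the worst ratio of summed within-cluster scatter to centroid distance). The two toy clusterings are hypothetical and unrelated to the ESG data in the study.

```python
import math

def centroid(points):
    """Component-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def davies_bouldin(clusters):
    """clusters: list of clusters, each a list of points (lists of floats).
    Lower values mean tighter, better separated clusters."""
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its own centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c)
               for c, m in zip(clusters, cents)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k

# Same within-cluster scatter, different separation between centroids.
well_separated = [[[0.0, 0.0], [0.0, 2.0]], [[10.0, 0.0], [10.0, 2.0]]]
close_together = [[[0.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [3.0, 2.0]]]
print(davies_bouldin(well_separated))  # 0.2
print(davies_bouldin(close_together))
```

The better separated clustering gets the lower index, which is the sense in which 0.734 beats 1.715 in the abstract.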

Optimal Parameter Analysis and Evaluation of Change Detection for SLIC-based Superpixel Techniques Using KOMPSAT Data (KOMPSAT 영상을 활용한 SLIC 계열 Superpixel 기법의 최적 파라미터 분석 및 변화 탐지 성능 비교)

  • Chung, Minkyung;Han, Youkyung;Choi, Jaewan;Kim, Yongil
    • Korean Journal of Remote Sensing / v.34 no.6_3 / pp.1427-1443 / 2018
  • Object-based image analysis (OBIA) offers higher computational efficiency and better use of the information inherent in an image, as it reduces the complexity of the image while maintaining its properties. Superpixel methods oversegment the image into units smaller than ordinary object segments and preserve image edges well. SLIC (simple linear iterative clustering) is known to outperform earlier superpixel methods, with high image segmentation quality. Although the number of superpixels, the input parameter of SLIC, has considerable influence on segmentation results, the impact of this parameter has not been investigated sufficiently. In this study, we performed optimal parameter analysis and evaluation of change detection for SLIC-based superpixel techniques using KOMPSAT data. For superpixel generation, three superpixel methods (SLIC; SLIC0, the zero-parameter version of SLIC; and SNIC, simple non-iterative clustering) were used with superpixel sizes ranging from $5 \times 5$ pixels to $50 \times 50$ pixels. The segmentation results were then analyzed for how well they preserve the edges of the change detection reference data. Based on the optimal parameter analysis, image segmentation boundaries were obtained from the difference image of the bi-temporal images. DBSCAN (density-based spatial clustering of applications with noise) was then applied to cluster the superpixels into objects of a certain size for change detection. Changes in features were detected for each superpixel and compared with the reference data for evaluation. The change detection results showed that better change detection can be achieved even with a larger superpixel size if the superpixels are generated with high regularity in size and shape.
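
The DBSCAN grouping step can be sketched as follows. This is a generic, brute-force DBSCAN run on hypothetical superpixel centroid coordinates; the abstract does not state which superpixel features were clustered or which `eps` and `min_pts` values were used, so all of those are assumptions.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Brute-force neighborhood query; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:  # only core points expand the cluster
                queue.extend(nbrs)
    return labels

# Hypothetical superpixel centroids (x, y): two dense groups and one isolated superpixel.
centroids = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
labels = dbscan(centroids, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

Superpixels that share a label would be merged into one object, while the isolated one is treated as noise.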

A Mesh Partitioning Using Adaptive Vertex Clustering (적응형 정점 군집화를 이용한 메쉬 분할)

  • Kim, Dae-Young;Kim, Jong-Won;Lee, Hae-Young
    • Journal of the Korea Computer Graphics Society / v.15 no.3 / pp.19-26 / 2009
  • In this paper, a new adaptive vertex clustering method using a KD-tree is presented for 3D mesh partitioning. Vertex clustering is used to divide a huge 3D mesh into several partitions for various kinds of mesh processing. Octree-based clustering and K-means clustering are currently the leading techniques. However, octree-based methods divide space uniformly, so the partitioned meshes differ in size and contain non-uniform numbers of vertices. K-means clustering produces uniformly partitioned meshes but takes a long time because of its many iterations and optimizations. We therefore propose using a KD-tree to efficiently partition meshes into parts with a uniform number of vertices. The bounding box of the given mesh is adaptively subdivided according to the number of vertices it contains and a dynamically determined axis. As a result, the partitioned meshes are compact and have uniformly distributed vertices.
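
The KD-tree idea can be sketched in a few lines. The sketch assumes a median split along the longest bounding-box axis and a hypothetical `max_size` stopping rule; the abstract only says the axis is determined dynamically, so the exact rule in the paper may differ.

```python
def kd_partition(vertices, max_size):
    """Recursively split vertices at the median along the longest
    bounding-box axis until every partition holds at most max_size vertices."""
    if len(vertices) <= max_size:
        return [vertices]
    dims = range(len(vertices[0]))
    # Pick the split axis dynamically: the axis with the largest extent.
    axis = max(dims, key=lambda d: max(v[d] for v in vertices) - min(v[d] for v in vertices))
    ordered = sorted(vertices, key=lambda v: v[axis])
    mid = len(ordered) // 2  # a median split keeps both halves the same size (within one)
    return kd_partition(ordered[:mid], max_size) + kd_partition(ordered[mid:], max_size)

# Hypothetical vertex set: an 8 x 2 grid of points in the z = 0 plane.
vertices = [(float(x), float(y), 0.0) for x in range(8) for y in range(2)]
parts = kd_partition(vertices, max_size=4)
print([len(p) for p in parts])  # [4, 4, 4, 4]
```

Because each split is at the median rather than at the spatial midpoint, the partitions end up with nearly equal vertex counts, unlike a uniform octree subdivision.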

Study on mapping of dark matter clustering from real space to redshift space

  • Zheng, Yi;Song, Yong-Seon
    • The Bulletin of The Korean Astronomical Society / v.41 no.1 / pp.38.2-38.2 / 2016
  • The mapping of dark matter clustering from real space to redshift space introduces anisotropy into the density power spectrum measured in redshift space, known as the Redshift Space Distortion (hereafter RSD) effect. The mapping formula is intrinsically non-linear. It is complicated by the higher-order polynomials arising from the indefinite cross-correlations between the density and velocity fields, and by the Finger-of-God (hereafter FoG) effect arising from the randomness of the peculiar velocity field. Furthermore, a rigorous test of this mapping formula is contaminated by the unknown non-linearity of the density and velocity fields, including their auto- and cross-correlations, whose theoretical calculation breaks down beyond some scales. Whilst the full set of higher-order polynomials remains unknown, the other systematics can be controlled consistently within the same order of truncation in the expansion of the mapping formula, as shown in this paper. The systematic due to the unknown non-linear density and velocity fields is removed by separately measuring all terms in the expansion using simulations. The uncertainty caused by the velocity randomness is controlled by splitting the FoG term into two pieces: 1) the non-local FoG term, which is independent of the separation vector between two different points, and 2) the local FoG term, which appears as an indefinite polynomial expanded to the same order as all other perturbative polynomials. Using 100 realizations of simulations, we find that the best-fitting non-local FoG function is Gaussian, with only one scale-independent free parameter, and that our new mapping formulation accurately reproduces the observed power spectrum in redshift space down to the smallest scales reached so far, up to k ~ 0.3 h/Mpc, considering the resolution of future experiments.
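
For orientation, the real-to-redshift-space mapping and a Gaussian FoG damping factor are commonly written as below. This is the usual textbook form, given here as an assumption: the abstract does not state the paper's exact parameterization. The symbols ($\hat{\mathbf{z}}$ for the line of sight, $a$ for the scale factor, $H$ for the Hubble rate, $\mu$ for the cosine of the angle to the line of sight, and $\sigma_z$ for the single free dispersion parameter) are introduced for illustration only.

$$
\mathbf{s} = \mathbf{r} + \frac{\mathbf{v}\cdot\hat{\mathbf{z}}}{aH}\,\hat{\mathbf{z}},
\qquad
D^{\rm FoG}(k,\mu) = \exp\!\left(-k^{2}\mu^{2}\sigma_{z}^{2}\right)
$$

The first relation shows why the distortion is anisotropic: only the line-of-sight component of the peculiar velocity shifts the apparent position. The second is a one-parameter damping of the power spectrum along the line of sight.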

An Energy-Efficient Clustering Scheme based on Application Layer Data in Wireless Sensor Networks (응용 계층 정보 기반의 에너지 효율적인 센서 네트워크 클러스터링 기법)

  • Kim, Seung-Mok;Lim, Jong-Hyun;Kim, Seung-Hoon
    • Journal of Korea Multimedia Society / v.12 no.7 / pp.997-1005 / 2009
  • In this paper, we propose an energy-efficient clustering scheme based on cross-layer design for wireless sensor networks. The proposed scheme is well suited to the characteristic environment of such networks. When an event occurs, the scheme uses application-layer information to separate clusters composed of sensor nodes in the event area from clusters in other areas. This saves the energy that would otherwise be spent delivering the same event over multiple paths through multiple clusters. We also propose TDMA scheduling for non-event clusters, in which one time slot is allocated to each node to save energy. The proposed clustering scheme can increase the lifetime of the entire network. Simulations show that the scheme is energy efficient with respect to the frequency of event occurrences, the event duration, and the event scope.
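
The one-slot-per-node TDMA idea can be sketched as follows. The node names and the slot length `slot_ms` are hypothetical, and the abstract gives no frame layout, so this only illustrates why such a schedule saves energy: each node needs its radio on for a single slot per frame.

```python
def tdma_schedule(node_ids, slot_ms):
    """Assign each node exactly one slot per frame.
    Returns {node: (start_ms, end_ms)} measured from the start of the frame."""
    return {node: (i * slot_ms, (i + 1) * slot_ms) for i, node in enumerate(node_ids)}

# Hypothetical non-event cluster with three member nodes and 10 ms slots.
nodes = ["n1", "n2", "n3"]
schedule = tdma_schedule(nodes, slot_ms=10)
frame_ms = 10 * len(nodes)
duty_cycle = 10 / frame_ms  # fraction of the frame in which one node is awake

print(schedule)    # {'n1': (0, 10), 'n2': (10, 20), 'n3': (20, 30)}
print(duty_cycle)
```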

Adaptive Event Clustering for Personalized Photo Browsing (사진 사용 이력을 이용한 이벤트 클러스터링 알고리즘)

  • Kim, Kee-Eung;Park, Tae-Suh;Park, Min-Kyu;Lee, Yong-Beom;Kim, Yeun-Bae;Kim, Sang-Ryong
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2006.02a / pp.711-716 / 2006
  • Since the introduction of digital cameras to the mass market, the number of digital photos owned by an individual has been growing at an alarming rate. This naturally makes it difficult to search and browse a personal digital photo archive. The traditional approach typically involves content-based image retrieval using computer vision algorithms. However, because of the performance limitations of these algorithms, at least on casual digital photos taken by non-professional photographers, more recent approaches center on time-based clustering algorithms that analyze the shot times of photos. These algorithms are based on the insight that clustering photos by shot-time similarity yields "event clusters" that help the user browse through her photo archive. One reported remaining problem with the time-based approach is that people perceive events at different scales. In this paper, we present an adaptive time-based clustering algorithm that exploits the usage history of digital photos to infer the user's preference for event granularity. Experiments show significant improvements in clustering accuracy.
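
A basic time-based event clustering can be sketched as below. It assumes a simple fixed gap rule, with `gap_threshold` standing in for the granularity preference that the paper infers from usage history; the inference itself is not described in the abstract and is not sketched here.

```python
def cluster_by_time_gap(shot_times, gap_threshold):
    """Group shot times (in seconds) into events. A new event starts whenever
    the gap to the previous photo exceeds gap_threshold."""
    events = []
    for t in sorted(shot_times):
        if events and t - events[-1][-1] <= gap_threshold:
            events[-1].append(t)
        else:
            events.append([t])
    return events

# Hypothetical shot times in seconds since the first photo.
times = [0, 60, 120, 4000, 4100, 90000]
coarse = cluster_by_time_gap(times, gap_threshold=7200)  # user prefers day-level events
fine = cluster_by_time_gap(times, gap_threshold=600)     # user prefers finer events

print(coarse)  # [[0, 60, 120, 4000, 4100], [90000]]
print(fine)    # [[0, 60, 120], [4000, 4100], [90000]]
```

The same photos yield two or three events depending on the threshold, which is the scale ambiguity the abstract describes.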

Fixed Partitioning Methods for Extending lifetime of sensor node for Wireless Sensor Networks (WSN환경에서 센서노드의 생명주기 연장을 위한 고정 분할 기법)

  • Han, Chang-Su;Cho, Young-Bok;Woo, Sung-Hee;Lee, Sang-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.20 no.5 / pp.942-948 / 2016
  • In a WSN built from wireless sensor nodes, the nodes cannot be relocated or recharged once they have been deployed. Each sensor node joins the communication network with limited energy. However, existing clustering techniques, when applied to a WSN environment with irregularly dispersed sensor nodes, suffer from network reliability problems: the unbalanced local distribution of nodes causes communication interruptions. In the suggested algorithm, the sensor field is divided in light of the non-uniform distribution of sensor nodes, and a static or a dynamic clustering algorithm is adopted for each partition according to its sensor node density. As a result, the communication participation of the sensor nodes is extended by 25%, and the lifetime of the entire network is extended by 14%, which helps ensure the reliability of the network.

Feature Filtering Methods for Web Documents Clustering (웹 문서 클러스터링에서의 자질 필터링 방법)

  • Park Heum;Kwon Hyuk-Chul
    • The KIPS Transactions:PartB / v.13B no.4 s.107 / pp.489-498 / 2006
  • Clustering results differ according to the dataset, and performance worsens even with web documents that have been manually processed by an indexer. Although statistical feature selection methods can obtain representative clusters for a feature, they do not eliminate irrelevant features (i.e., non-obvious features and those appearing in general documents). These irrelevant features should be eliminated to improve clustering performance. Therefore, this paper proposes three feature-filtering algorithms that consider feature values per document set, together with the distribution, frequency, and weights of features per document set: (1) the features filtering algorithm in a document (FFID), (2) the features filtering algorithm in a document matrix (FFIM), and (3) a hybrid method combining FFID and FFIM (HFF). We tested clustering performance with feature selection using term frequency and expanded co-link information, and with feature filtering using FFID, FFIM, and HFF. According to the results of our experiments, HFF performed best, and FFIM performed better than FFID.