• Title/Summary/Keyword: Data Clustering


Unsupervised Outpatients Clustering: A Case Study in Avissawella Base Hospital, Sri Lanka

  • Hoang, Huu-Trung; Pham, Quoc-Viet; Kim, Jung Eon; Kim, Hoon; Park, Junseok; Hwang, Won-Joo
    • Journal of Korea Multimedia Society / v.22 no.4 / pp.480-490 / 2019
  • Electronic Medical Records (EMRs) have only recently been implemented for the Outpatient Department (OPD) in a few hospitals. OPD data are diverse, covering both patients' demographics and their diseases, so they need to be clustered to explore hidden rules and the relationships among the types of patient information. In this paper, we propose a novel approach for the unsupervised clustering of patients' demographics and diseases in the OPD. First, we collect OPD data from a hospital. Then, we preprocess and transform the data using techniques such as standardization, label encoding, and categorical encoding. After obtaining the transformed data, we use experiments, techniques, and evaluations to select the best number of clusters and the best clustering algorithm. In addition, we use tests and measurements to analyze and evaluate cluster tendency, models, and algorithms. Finally, we analyze the results to discover new knowledge, meanings, and rules. The clusters found in this research provide knowledge to medical managers and doctors, who can use it to improve patient management methods, patient arrangement methods, and doctors' abilities. The work also serves as a reference for medical data scientists mining OPD datasets.
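
The preprocessing step named above (standardization, label encoding, categorical encoding) can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' pipeline: the records, the column choice, and the decision to label-encode the binary field while one-hot encoding the multi-valued diagnosis are all hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: (x - mean) / population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def label_encode(values):
    """Map each distinct category to an integer, using sorted order."""
    classes = sorted(set(values))
    index = {c: i for i, c in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot(codes, n_classes):
    """Expand integer labels into one-hot (categorical) vectors."""
    return [[1 if j == c else 0 for j in range(n_classes)] for c in codes]


# Hypothetical OPD records (age, sex, diagnosis); not data from the paper.
records = [(34, "F", "fever"), (61, "M", "diabetes"),
           (47, "F", "diabetes"), (29, "M", "cough")]

ages = standardize([r[0] for r in records])
sex_codes, sex_classes = label_encode([r[1] for r in records])
dx_codes, dx_classes = label_encode([r[2] for r in records])
dx_onehot = one_hot(dx_codes, len(dx_classes))

# Each feature row: [standardized age, sex code, one-hot diagnosis...]
features = [[a, s] + d for a, s, d in zip(ages, sex_codes, dx_onehot)]
```

One-hot encoding keeps a multi-valued diagnosis from being treated as an ordered number, while a single integer is sufficient for a two-valued field.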

An Optimization Method for the Calculation of SCADA Main Grid's Theoretical Line Loss Based on DBSCAN

  • Cao, Hongyi; Ren, Qiaomu; Zou, Xiuguo; Zhang, Shuaitang; Qian, Yan
    • Journal of Information Processing Systems / v.15 no.5 / pp.1156-1170 / 2019
  • In recent years, the problem of data drift in the smart grid caused by manual operation has been widely studied by researchers in related domains. Effectively and reliably finding the reasonable data needed in the Supervisory Control and Data Acquisition (SCADA) system has become an important research topic. This paper analyzes the data composition of the smart grid and explains the power model in two smart grid applications, followed by an analysis of how each parameter is used in the density-based spatial clustering of applications with noise (DBSCAN) algorithm. It then compares the processing effects of the boxplot method, the probability weight analysis method, and the DBSCAN clustering algorithm on big-data-driven power grid data. According to the comparison results, the DBSCAN algorithm outperforms the other methods in processing effect. Experimental verification shows that the DBSCAN clustering algorithm can effectively screen power grid data, thereby significantly improving the accuracy and reliability of the calculated theoretical line loss of the main grid.
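
For readers unfamiliar with DBSCAN, the screening idea can be sketched as follows: points in dense regions form clusters, and isolated points are labeled noise, which here would be candidate drifted readings. This is a generic textbook DBSCAN under assumptions; the readings and the `eps`/`min_pts` values are hypothetical and not taken from the paper.

```python
import math

NOISE = -1


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        seeds = list(neighbors)
        k = 0
        while k < len(seeds):
            j = seeds[k]
            k += 1
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also core: keep expanding
        cluster += 1
    return labels


# Hypothetical 1-D line-loss readings; 25.0 plays the role of a drifted value.
readings = [(10.1,), (10.3,), (9.9,), (10.2,), (25.0,), (10.0,)]
labels = dbscan(readings, eps=0.5, min_pts=3)
```

Unlike a boxplot rule, which applies one global interquartile threshold, DBSCAN judges each point by its local neighborhood, so it can separate several dense operating regimes at once.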

Approximate fuzzy clustering based on a density function (밀도 함수를 이용한 근사적 퍼지 클러스터링)

  • 손세호; 권순학; 최윤혁
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2000.05a / pp.94-97 / 2000
  • In this paper, we introduce an approximate fuzzy clustering method based on density functions that is simple and computationally efficient. The density functions are defined by the number of data points within a predetermined interval. Numerical examples are presented to show the validity of the proposed clustering method.

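
The density function described above (the count of data points within a fixed interval) can be illustrated with a small sketch. The greedy center selection and the FCM-style membership formula below are assumptions chosen for illustration, not the authors' exact procedure; the interval half-width `r`, the exclusion radius `2 * r`, and the data are hypothetical.

```python
def density(data, x, r):
    """Number of data points within the interval [x - r, x + r]."""
    return sum(1 for y in data if abs(y - x) <= r)


def pick_centers(data, r, n_centers):
    """Assumed greedy scheme: take the densest point as a center,
    discard points within 2r of it, and repeat."""
    remaining = list(data)
    centers = []
    while remaining and len(centers) < n_centers:
        best = max(remaining, key=lambda x: density(remaining, x, r))
        centers.append(best)
        remaining = [y for y in remaining if abs(y - best) > 2 * r]
    return centers


def memberships(x, centers, m=2.0):
    """FCM-style fuzzy memberships of x to each center; they sum to 1."""
    dists = [abs(x - c) for c in centers]
    zeros = dists.count(0.0)
    if zeros:  # x coincides with a center
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    return [1.0 / sum((dk / dj) ** exp for dj in dists) for dk in dists]


# Hypothetical 1-D data with two dense groups.
data = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9]
centers = pick_centers(data, r=0.3, n_centers=2)
u = memberships(1.5, centers)
```

Counting neighbors once per candidate avoids the iterative optimization of standard fuzzy c-means, which is the kind of computational saving an approximate method aims for.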

Min-Distance Hop Count based Multi-Hop Clustering In Non-uniform Wireless Sensor Networks

  • Kim, Eun-Ju; Kim, Dong-Joo; Park, Jun-Ho; Seong, Dong-Ook; Lee, Byung-Yup; Yoo, Jae-Soo
    • International Journal of Contents / v.8 no.2 / pp.13-18 / 2012
  • In wireless sensor networks, energy-efficient data gathering is one of the core technologies for query processing. Cluster-based data gathering methods minimize the energy consumption of sensor nodes by maximizing the efficiency of data aggregation. However, since existing clustering methods consider only uniform network environments, they are not suitable for real-world applications in which sensor nodes may be distributed unevenly. To solve this problem, we propose a balanced multi-hop clustering scheme for non-uniform wireless sensor networks. The proposed scheme constructs clusters based on the logical distance to the cluster head, measured as a min-distance hop count. To show the superiority of the proposed scheme, we compare it with existing clustering schemes for sensor networks. Our experimental results show that the proposed scheme prolongs the network lifetime by about 48% on average compared with the existing methods.
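
Assigning nodes by min-distance hop count can be sketched as a multi-source breadth-first search: every cluster head starts at hop 0, and each node joins whichever head reaches it in the fewest hops. This is an assumed illustration only; how the paper selects and balances heads is not shown, and the topology and head choice are hypothetical.

```python
from collections import deque


def assign_by_hop_count(adjacency, heads):
    """Multi-source BFS: each node joins the head with the fewest hops.
    Ties are broken by BFS order (the order of `heads`)."""
    hop = {h: 0 for h in heads}
    owner = {h: h for h in heads}
    queue = deque(heads)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in hop:  # first visit in BFS = minimum hop count
                hop[v] = hop[u] + 1
                owner[v] = owner[u]
                queue.append(v)
    return owner, hop


# Hypothetical chain topology a-b-c-d-e with cluster heads a and e.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"],
             "d": ["c", "e"], "e": ["d"]}
owner, hop = assign_by_hop_count(adjacency, ["a", "e"])
```

Hop count acts as a logical distance that does not require node positions, which is convenient when nodes are scattered unevenly.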

Practical Data Transmission in Cluster-Based Sensor Networks

  • Kim, Dae-Young; Cho, Jin-Sung; Jeong, Byeong-Soo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.3 / pp.224-242 / 2010
  • Data routing in wireless sensor networks must be energy-efficient because tiny sensor nodes have limited power. Cluster-based hierarchical routing is known to be more efficient than flat routing because only cluster-heads communicate with a sink node. Existing hierarchical routing schemes, however, assume unrealistically large radio transmission ranges for sensor nodes, so they cannot be employed in real environments. In this paper, considering the practical transmission ranges of sensor nodes, we propose a clustering and routing method for hierarchical sensor networks. First, we provide the optimal ratio of cluster-heads for the clustering. Second, we propose a d-hop clustering scheme that expands the range of clusters to d hops, where d is calculated from the ratio of cluster-heads. Third, we present an intra-cluster routing in which sensor nodes reach their cluster-heads within d hops. Finally, we present an inter-cluster routing that forwards data from cluster-heads to a sink node over multiple hops, because cluster-heads cannot communicate with a sink node directly. The efficiency of the proposed clustering and routing method is validated through extensive simulations.
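
The d-hop clustering and intra-cluster routing steps can be sketched with a breadth-first search that stops expanding at depth d and records a parent pointer for each member, so every node has a hop-by-hop route to its head. This is a hypothetical sketch: the formula that derives d from the cluster-head ratio is not given here, so `d` is simply a parameter, and the topology is made up.

```python
from collections import deque


def d_hop_clusters(adjacency, heads, d):
    """Grow clusters up to d hops from each head, recording parent
    pointers so each member has a route back to its cluster head."""
    parent = {h: None for h in heads}
    head_of = {h: h for h in heads}
    depth = {h: 0 for h in heads}
    queue = deque(heads)
    while queue:
        u = queue.popleft()
        if depth[u] == d:
            continue  # do not expand beyond d hops
        for v in adjacency[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                parent[v] = u
                head_of[v] = head_of[u]
                queue.append(v)
    return head_of, parent


def route_to_head(node, parent):
    """Intra-cluster route: follow parent pointers up to the cluster head."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


# Hypothetical chain of 7 nodes (0-1-2-...-6), one head at node 0, d = 2.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
head_of, parent = d_hop_clusters(adjacency, heads=[0], d=2)
route = route_to_head(2, parent)
```

Nodes left uncovered here (3 to 6) would, in a full deployment, be covered by other heads chosen according to the cluster-head ratio.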

A Clustering Scheme Considering the Structural Similarity of Metadata in Smartphone Sensing System (스마트폰 센싱에서 메타데이터의 구조적 유사도를 고려한 클러스터링 기법)

  • Min, Hong; Heo, Junyoung
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.229-234 / 2014
  • With the association between sensor networks, which collect environmental information using numerous sensor nodes, and smartphones equipped with various sensors, many applications that understand users' context have been developed to let users interact with their environments. To share the collected data, it should be stored with XML-formatted metadata containing semantic information. In distance-based clustering schemes, the efficiency of data collection decreases because metadata files are extended and changed according to each system developer's purpose. In this paper, we propose a clustering scheme that considers the structural similarity of metadata to reduce cluster construction time and improve the similarity of metadata among member nodes in a cluster.
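
One simple way to quantify the structural similarity of two XML metadata files is to compare their sets of root-to-element tag paths, for example with Jaccard similarity. The measure below is an assumption made for illustration and is not necessarily the one the authors use; the metadata snippets are hypothetical.

```python
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Set of root-to-element tag paths, e.g. 'sensor/location/lat'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def structural_similarity(xml_a, xml_b):
    """Jaccard similarity of the two documents' tag-path sets."""
    a, b = tag_paths(xml_a), tag_paths(xml_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical metadata: the second file extends the first with <unit>.
meta1 = ("<sensor><type>temp</type>"
         "<location><lat>1</lat><lon>2</lon></location></sensor>")
meta2 = ("<sensor><type>hum</type>"
         "<location><lat>3</lat><lon>4</lon></location><unit>%</unit></sensor>")
sim = structural_similarity(meta1, meta2)
```

Because only tags are compared, two files with different values but the same schema score 1.0, while a developer-specific extension such as `<unit>` lowers the score slightly instead of breaking the comparison.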

An Efficient Clustering Scheme Considering Node Density in Wireless Sensor Networks (무선 센서 네트워크에서 노드 밀도를 고려한 효율적인 클러스터링 기법)

  • Kim, Chang-Hyeon; Lee, Won-Joo; Jeon, Chang-Ho
    • Journal of the Institute of Electronics Engineers of Korea CI / v.46 no.4 / pp.79-86 / 2009
  • In this paper, we propose a new clustering scheme that provides an optimal data aggregation effect and reduces the energy consumption of nodes by considering node density when forming clusters. Because the cluster size is chosen to ensure an optimal data aggregation rate, our scheme reduces the transmission range and minimizes interference between clusters. Moreover, by clustering locally adjacent nodes and aggregating the data received from cluster members, we reduce the energy consumption of nodes. Through simulation, we confirmed that the energy consumption of the whole network is minimized and the sensor network lifetime is extended. We also show that the proposed clustering scheme improves network performance compared with the previous LEACH clustering scheme.
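
The LEACH baseline mentioned above elects cluster heads at random using a per-round threshold, T(n) = p / (1 - p (r mod 1/p)), for nodes that have not been a head in the last 1/p rounds, and T(n) = 0 otherwise. The sketch below shows only that baseline election, not the proposed density-aware scheme; the node IDs, `p`, and the random seed are hypothetical.

```python
import random


def leach_threshold(p, r):
    """LEACH threshold T(n) for a node still eligible in the current epoch
    (not a cluster head in the last 1/p rounds); ineligible nodes use 0."""
    period = round(1 / p)
    return p / (1 - p * (r % period))


def elect_heads(node_ids, p, r, eligible, rng):
    """Each eligible node becomes a cluster head if a uniform draw in
    [0, 1) falls below the threshold."""
    t = leach_threshold(p, r)
    return [n for n in node_ids if n in eligible and rng.random() < t]


rng = random.Random(0)
nodes = list(range(20))
# Round 9 with p = 0.1 is the last round of the epoch: T(n) is about 1,
# so every node that is still eligible is elected.
heads = elect_heads(nodes, p=0.1, r=9, eligible={3, 7, 11}, rng=rng)
```

The threshold rises over the epoch so that each node serves as head about once every 1/p rounds. LEACH ignores node density, which is the gap the proposed scheme addresses.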

Clustering of Incomplete Data Using Autoencoder and fuzzy c-Means Algorithm (AutoEncoder와 FCM을 이용한 불완전한 데이터의 군집화)

  • 박동철; 장병근
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.5C / pp.700-705 / 2004
  • This paper proposes the clustering of incomplete data using an Autoencoder and Fuzzy c-Means (FCM). The proposed algorithm, called Optimal Completion Autoencoder Fuzzy c-Means (OCAEFCM), uses the Autoencoder Neural Network (AENN) for optimal completion of missing data and the Gradient-based FCM (GBFCM) for clustering the reconstructed data. OCAEFCM is applied to the IRIS data and to a data set from a financial institution to evaluate its performance. Compared with the existing Optimal Completion Strategy FCM (OCSFCM), OCAEFCM improves performance by 18%-20%.
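
For reference, the standard fuzzy c-means iteration alternates two updates. Each membership is u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1)), and each center is the u^m-weighted mean of the points. The sketch below is plain batch FCM on complete data. It is not the gradient-based GBFCM variant, and it omits the autoencoder completion step. The points and initial centers are hypothetical.

```python
import math


def fcm(points, centers, m=2.0, iters=50):
    """Standard batch fuzzy c-means with caller-supplied initial centers.
    Returns the final centers and the memberships from the last iteration."""
    centers = [list(c) for c in centers]
    c = len(centers)
    dim = len(points[0])
    u = []
    for _ in range(iters):
        # Membership update: u[i][k] = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
        u = []
        for x in points:
            d = [max(math.dist(x, v), 1e-12) for v in centers]  # avoid /0
            u.append([1.0 / sum((d[k] / d[j]) ** (2 / (m - 1)) for j in range(c))
                      for k in range(c)])
        # Center update: weighted mean of the points with weights u^m
        for k in range(c):
            w = [row[k] ** m for row in u]
            total = sum(w)
            centers[k] = [sum(wi * x[t] for wi, x in zip(w, points)) / total
                          for t in range(dim)]
    return centers, u


# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fcm(points, centers=[(0, 0), (10, 10)])
```

Each point receives a graded membership in every cluster rather than a hard label, which is what lets FCM operate on reconstructed values whose true cluster may be ambiguous.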

Tree-Dependent Components of Gene Expression Data for Clustering (유전자발현데이터의 군집분석을 위한 나무 의존 성분 분석)

  • Kim Jong-Kyoung; Choi Seung-Jin
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.4-6 / 2006
  • Tree-dependent component analysis (TCA) is a generalization of independent component analysis (ICA) whose goal is to model multivariate data as a linear transformation of latent variables, where the latent variables are fit by a tree-structured graphical model. In contrast to ICA, TCA allows a dependency structure among the latent variables and also considers non-spanning trees (forests). In this paper, we present a TCA-based method for clustering gene expression data. An empirical study with yeast cell cycle-related data, yeast metabolic shift data, and yeast sporulation data shows that TCA is more suitable for gene clustering than either principal component analysis (PCA) or ICA.

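
The tree-structured part of TCA can be illustrated for fixed latent components. One approach is to weight each pair of components by its mutual information and keep a maximum-weight spanning tree (a Chow-Liu tree). The sketch below makes several assumptions. It uses the Gaussian estimate MI = -0.5 log(1 - rho^2). It skips the optimization of the linear transformation that full TCA also performs. It also introduces a hypothetical `min_mi` cutoff as a simple stand-in for how a forest rather than a spanning tree can arise. The component signals are made up.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def gaussian_mi(rho):
    """Mutual information of two jointly Gaussian variables."""
    rho = max(min(rho, 0.999999), -0.999999)  # guard against log(0)
    return -0.5 * math.log(1 - rho * rho)


def chow_liu_forest(components, min_mi=0.0):
    """Maximum-weight spanning forest over pairwise MI (Kruskal).
    Edges with MI <= min_mi (hypothetical cutoff) are never added."""
    n = len(components)
    edges = sorted(((gaussian_mi(pearson(components[i], components[j])), i, j)
                    for i in range(n) for j in range(i + 1, n)), reverse=True)
    parent = list(range(n))  # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for w, i, j in edges:
        if w <= min_mi:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Hypothetical component signals: s1 tracks s0 closely, s2 is weakly related.
s0 = [1, 2, 3, 4, 5]
s1 = [2, 4, 6, 8, 11]
s2 = [5, 1, 4, 2, 3]
tree = chow_liu_forest([s0, s1, s2])
forest = chow_liu_forest([s0, s1, s2], min_mi=0.1)
```

Components that end up connected in the tree are dependent, so genes loading on them can be grouped together. ICA, by contrast, assumes no such edges exist.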

Program Development of Integrated Expression Profile Analysis System for DNA Chip Data Analysis (DNA칩 데이터 분석을 위한 유전자발연 통합분석 프로그램의 개발)

  • 양영렬; 허철구
    • KSBB Journal / v.16 no.4 / pp.381-388 / 2001
  • A program for integrated gene expression profile analysis, covering hierarchical clustering, K-means, fuzzy c-means, self-organizing maps (SOM), principal component analysis (PCA), and singular value decomposition (SVD), was developed in Matlab for DNA chip data analysis. It also includes methods for normalizing the gene expression input data. The integrated analysis program can be used effectively in DNA chip data analysis and can help researchers obtain a more comprehensive view of their own gene expression data.

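
Two of the steps such a program combines, normalizing expression input data and K-means clustering, can be sketched in Python. This is not the Matlab program described above. It assumes per-gene z-score normalization, which is only one of several possible normalization methods, and caller-supplied initial centers. The expression matrix is hypothetical.

```python
import math


def zscore_rows(matrix):
    """Normalize each gene (row) to zero mean and unit variance across arrays."""
    out = []
    for row in matrix:
        mu = sum(row) / len(row)
        sd = math.sqrt(sum((v - mu) ** 2 for v in row) / len(row))
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


def kmeans(points, centers, iters=20):
    """Lloyd's K-means with caller-supplied initial centers."""
    centers = [list(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assignment step: nearest center by Euclidean distance
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its members
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical expression levels of 4 genes across 3 arrays.
expr = [[1, 2, 3], [10, 20, 30], [3, 2, 1], [30, 20, 10]]
norm = zscore_rows(expr)
labels, centers = kmeans(norm, centers=[norm[0], norm[2]])
```

Normalizing each gene first makes the clustering depend on the shape of its expression profile rather than its absolute level, so genes 1 and 2 group together despite their tenfold difference in magnitude.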