• Title/Summary/Keyword: Clustered data

A Secure, Hierarchical and Clustered Multipath Routing Protocol for Homogenous Wireless Sensor Networks: Based on the Numerical Taxonomy Technique

  • Hossein Jadidoleslamy
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.8
    • /
    • pp.121-136
    • /
    • 2023
  • Wireless Sensor Networks (WSNs) have many potential applications and unique challenges. Their problems include severe resource constraints, low reliability and fault tolerance, low throughput, low scalability, low Quality of Service (QoS), and insecure operational environments. One significant solution to these problems is hierarchical, clustering-based multipath routing. However, existing algorithms have many weaknesses, such as high overhead, security vulnerabilities, address-centric design, low scalability, permanent use of optimal paths, and severe resource consumption. This paper therefore proposes an energy-aware, congestion-aware, location-based, data-centric, scalable, hierarchical, clustering-based multipath routing algorithm for homogeneous WSNs, based on the Numerical Taxonomy technique. Finally, the performance of the proposed algorithm is compared with that of the LEACH routing algorithm; simulation results and statistical-mathematical analysis show that the proposed algorithm improves on parameters such as balanced consumption of resources (energy and bandwidth), throughput, reliability and fault tolerance, accuracy, QoS (for example, the average packet delivery rate), and WSN lifetime.

Application of Spatial Autocorrelation for the Spatial Distribution Pattern Analysis of Marine Environment - Case of Gwangyang Bay - (해양환경 공간분포 패턴 분석을 위한 공간자기상관 적용 연구 - 광양만을 사례 지역으로 -)

  • Choi, Hyun-Woo;Kim, Kye-Hyun;Lee, Chul-Yong
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.10 no.4
    • /
    • pp.60-74
    • /
    • 2007
  • For quantitative analysis of spatio-temporal distribution patterns in the marine environment, global and local spatial autocorrelation statistics were applied to observed data from Gwangyang Bay in the South Sea of Korea. Global indexes such as Moran's I and General G were used to understand the environmental distribution pattern over the whole study area. LISAs (local indicators of spatial association) such as local Moran's I ($I_i$) and $G_i^*$ were used to find similarity between a target feature and its neighboring features and to detect hot spots and cold spots. Additionally, the significance of clustered patterns was tested with Z-scores. The statistical results quantified the variation of spatial patterns over the whole year. General water quality, nutrients, chlorophyll-a, and phytoplankton all had strongly clustered patterns in summer. When the global indexes showed a strongly clustered pattern, a front region with a negative $I_i$, which indicates strong spatial variation, was observed. When the global indexes showed a random pattern, hot spots and cold spots were still found in small local regions with the local index $G_i^*$. Therefore, global indexes were useful for observing the strength and time-series variation of clustered patterns over the whole study area, and local indexes were useful for tracing the locations of hot spots and cold spots. Quantifying both the spatial distribution pattern and the clustering characteristics may play an important role in understanding the marine environment in depth and in finding the reasons for spatial patterns.
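The global and local Moran statistics named in this abstract can be illustrated with a minimal sketch. It uses the standard textbook definitions, not the authors' implementation; the station values, the line-shaped station layout, and the helper `chain_weights` are hypothetical and only serve the illustration.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / W) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]            # z_i = x_i - mean
    w_sum = sum(sum(row) for row in weights)    # W = sum of all weights
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / w_sum) * cross / sum(d * d for d in dev)


def local_morans_i(values, weights):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum z^2 / n."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n
    return [dev[i] / m2 * sum(weights[i][j] * dev[j] for j in range(n))
            for i in range(n)]


def chain_weights(n):
    """Binary contiguity weights for n stations on a line (hypothetical layout)."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


# Hypothetical measurements at six stations along a transect.
clustered = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]     # similar values sit together
alternating = [1.0, 5.0, 1.0, 5.0, 1.0, 5.0]   # neighbors always differ
w = chain_weights(6)

print(morans_i(clustered, w))      # positive: clustered pattern
print(morans_i(alternating, w))    # negative: dispersed pattern
print(local_morans_i(clustered, w))
```

With this scaling the local values sum to W times the global index, which is why the local statistics can be read as a decomposition of the global one.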

Clustering Observations for Detecting Multiple Outliers in Regression Models

  • Seo, Han-Son;Yoon, Min
    • The Korean Journal of Applied Statistics
    • /
    • v.25 no.3
    • /
    • pp.503-512
    • /
    • 2012
  • Outlier detection in a linear regression model can fail when similar observations are classified differently in a sequential process. In such circumstances, identifying clusters and applying a detection method to the clustered data can prevent this failure and is computationally efficient because the data are reduced. In this paper, we suggest a clustering procedure for this purpose and provide examples that illustrate the procedure applied to the Hadi-Simonoff (1993) method, the reverse Hadi-Simonoff method, and the Gentleman-Wilk (1975) method.

Modelling Count Responses with Overdispersion

  • Jeong, Kwang Mo
    • Communications for Statistical Applications and Methods
    • /
    • v.19 no.6
    • /
    • pp.761-770
    • /
    • 2012
  • We frequently encounter count outcomes that have extra variation. This paper considers several alternative models for overdispersed count responses, such as a quasi-Poisson model, a zero-inflated Poisson model, and a negative binomial model, with a special focus on a generalized linear mixed model. We also explain various goodness-of-fit criteria, discussing when they are applicable and cautioning against misuse depending on the pattern of response categories. The overdispersion models for count data are illustrated through two examples with different response patterns.
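A quick way to see what "extra variation" means is to compare the sample variance of the counts with their mean, since a Poisson model implies that the two are equal. The sketch below is a generic diagnostic, not a procedure taken from the paper; the counts are hypothetical.

```python
import math


def dispersion_index(counts):
    """Sample variance divided by the mean; close to 1 under a Poisson model."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return var / mean


def zero_fraction(counts):
    """Observed share of zero counts."""
    return sum(1 for c in counts if c == 0) / len(counts)


def poisson_zero_probability(mean):
    """Probability of a zero under a Poisson model with the given mean."""
    return math.exp(-mean)


# Hypothetical count responses with many zeros and a few large values.
counts = [0, 0, 0, 0, 1, 0, 7, 0, 12, 0]
mean = sum(counts) / len(counts)

print(dispersion_index(counts))        # well above 1: overdispersed
print(zero_fraction(counts))           # observed share of zeros
print(poisson_zero_probability(mean))  # share of zeros a Poisson fit expects
```

A dispersion index far above 1 points toward the quasi-Poisson or negative binomial models mentioned above, and a large excess of zeros over the Poisson expectation points toward a zero-inflated model.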

Statistical Analysis of Clustered Interval-Censored Data with Informative Cluster Size (정보적군집 크기를 가진 군집화된 구간 중도절단자료 분석을 위한결합모형의 적용)

  • Kim, Yang-Jin;Yoo, Han-Na
    • Communications for Statistical Applications and Methods
    • /
    • v.17 no.5
    • /
    • pp.689-696
    • /
    • 2010
  • Interval-censored data are commonly found in studies of diseases that progress without symptoms and require clinical evaluation for detection. Several techniques have been suggested under an independence assumption. However, this assumption is not valid when observations come from clusters. Furthermore, when the cluster size is related to the response variable, commonly used methods can yield biased results. For example, in a study on lymphatic filariasis, a parasitic disease in which worms make several nests in the infected person's lymphatic vessels and reside there until adulthood, the response variable of interest is the nest-extinction time. Since nest extinction is checked by repeated ultrasound examinations, exact extinction times are not observed. Instead, each observation consists of two examination times: the last examination with living worms and the first examination with dead worms. Furthermore, as Williamson et al. (2008) pointed out, larger nests tend to have lower clearance rates. This association is referred to as informative cluster size. To analyze the relationship between the number of nests and the interval-censored nest-extinction times, this study proposes a joint model for cluster size and clustered interval-censored failure data.
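The data structure described in this abstract, an interval bounded by the last examination with living worms and the first with dead worms, can be sketched as follows. The function name, the record layout, and the examination times are hypothetical; the sketch also assumes that a nest is alive at study entry (time 0) and that a nest observed dead stays dead.

```python
import math


def censoring_interval(exams):
    """Return (left, right) for one nest.

    exams: list of (time, alive) pairs from repeated examinations.
    left  = last examination time with living worms (0.0 if none, by assumption)
    right = first examination time with dead worms (math.inf if never seen dead)
    """
    left, right = 0.0, math.inf
    for time, alive in sorted(exams):
        if alive:
            left = time
        else:
            right = time
            break   # assumption: extinction is permanent
    return left, right


# Hypothetical patients (clusters), each with one examination history per nest.
patients = {
    "patient_a": [
        [(30, True), (60, True), (90, False), (120, False)],
        [(30, True), (60, False)],
        [(30, True), (60, True), (90, True)],   # never observed dead
    ],
    "patient_b": [
        [(30, False)],
    ],
}

for name, nests in patients.items():
    intervals = [censoring_interval(exams) for exams in nests]
    print(name, "cluster size:", len(nests), "intervals:", intervals)
```

The cluster size is simply the number of nests per patient; the joint model in the paper relates that size to the interval-censored times, which this sketch does not attempt.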

Pan-Genomics of Lactobacillus plantarum Revealed Group-Specific Genomic Profiles without Habitat Association

  • Choi, Sukjung;Jin, Gwi-Deuk;Park, Jongbin;You, Inhwan;Kim, Eun Bae
    • Journal of Microbiology and Biotechnology
    • /
    • v.28 no.8
    • /
    • pp.1352-1359
    • /
    • 2018
  • Lactobacillus plantarum is a lactic acid bacterium that promotes animal intestinal health as a probiotic and is found in a wide variety of habitats. Here, we investigated the genomic features of different clusters of L. plantarum strains via pan-genomic analysis. We compared the genomes of 108 L. plantarum strains that were available from the NCBI GenBank database. These genomes were 2.9-3.7 Mbp in size and 44-45% in G+C content. A total of 8,847 orthologs were collected, and 1,709 genes were identified to be shared as core genes by all the strains analyzed. On the basis of SNPs from the core genes, 108 strains were clustered into five major groups (G1-G5) that are different from previous reports and are not clearly associated with habitats. Analysis of group-specific enriched or depleted genes revealed that G1 and G2 were rich in genes for carbohydrate utilization (L-arabinose, L-rhamnose, and fructooligosaccharides) and that G3, G4, and G5 possessed more genes for the restriction-modification system and MazEF toxin-antitoxin. These results indicate that there are critical differences in gene content and survival strategies among genetically clustered L. plantarum strains, regardless of habitats.

Design of Advanced Metering Infrastructure Network Based on Multi-Channel Cluster (다중채널 클러스터 기반의 AMI 네트워크 설계)

  • Choi, Seok-Jun;Shim, Byoung-Sup;Chae, Soo-Kwon
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38B no.3
    • /
    • pp.207-215
    • /
    • 2013
  • This paper presents channel assignment and scheduling techniques for an efficient wireless AMI network. In the AMI system, the proposed multi-channel cluster network defines the communication channel between the NC (Network Coordinator) and the CDA (Clustered Data Aggregator) as the network channel. The communication channels among the CDA, the OMD (Out Meter Display), and the SMD (Smart Meter Device) are defined as the group channel. A multi-channel cluster-based AMI network that combines the network channel and the group channel increases administrative efficiency through physical and logical clustering of consumer channels. The reliability of metering data is enhanced because adjacent clusters use distinct channels. In addition, the multi-channel cluster-based channel allocation enables fast data aggregation and allows the metering network to grow in size.

Re-evaluation of Obesity Syndrome Differentiation Questionnaire Based on Real-world Survey Data Using Data Mining (데이터 마이닝을 이용한 한의비만변증 설문지 재평가: 실제 임상에서 수집한 설문응답 기반으로)

  • Oh, Jihong;Wang, Jing-Hua;Choi, Sun-Mi;Kim, Hojun
    • Journal of Korean Medicine for Obesity Research
    • /
    • v.21 no.2
    • /
    • pp.80-94
    • /
    • 2021
  • Objectives: The purpose of this study is to re-evaluate the importance of the questions in the obesity syndrome differentiation (OSD) questionnaire based on a real-world survey and to explore the possibility of simplifying OSD types. Methods: The OSD frequency was identified, and variance threshold feature selection was performed to filter the questions. Filtered questions were clustered by K-means clustering and hierarchical clustering. After principal component analysis (PCA), the distribution patterns of the subjects were identified and the differences in the syndrome distribution were compared. Results: The frequency of OSD in spleen deficiency, phlegm (PH), and blood stasis (BS) was lower than in food retention (FR), liver qi stagnation (LS), and yang deficiency. We excluded 13 questions with low variance, 7 of which were related to BS. Filtered questions were clustered into 3 groups by K-means clustering: Cluster 1 (17 questions) was mainly related to PH and BS syndromes; Cluster 2 (11 questions) was related to swelling and indigestion; Cluster 3 (11 questions) was related to overeating or emotional symptoms. After PCA, significantly different patterns of subjects were observed in the FR, LS, and other obesity syndromes. The questions that mainly affected the FR distribution concerned digestive symptoms, and emotional symptoms mainly affected the distribution of LS subjects. Other obesity syndromes were partially affected by both digestive and emotional symptoms, as well as by symptoms related to poor circulation. Conclusions: In-depth data mining analysis identified questions of relatively low importance and the potential to simplify OSD types.
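The first filtering step in the Methods, variance threshold feature selection, can be sketched as below. This is an assumption-laden illustration: the abstract does not report the threshold or the response coding, so the threshold of 0.2, the question names, and the responses are all hypothetical.

```python
from statistics import pvariance


def variance_threshold(columns, threshold):
    """Keep the questions whose population variance exceeds the threshold.

    columns: dict mapping a question name to its list of numeric responses.
    Returns the retained question names in their original order.
    """
    return [name for name, responses in columns.items()
            if pvariance(responses) > threshold]


# Hypothetical responses of five subjects to three questionnaire items.
responses = {
    "q1": [1, 1, 1, 1, 2],   # almost everyone answers the same way
    "q2": [1, 3, 5, 2, 4],   # answers spread across the scale
    "q3": [3, 3, 3, 3, 3],   # constant, carries no information
}

kept = variance_threshold(responses, threshold=0.2)
print(kept)
```

Questions that survive this filter would then be passed to the clustering step; low-variance items are dropped because they barely distinguish one subject from another.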

Evaluation of Clustered Building Solid Model Automatic Generation Technique and Model Editing Function Based on Point Cloud Data (포인트 클라우드 데이터 기반 군집형 건물 솔리드 모델 자동 생성 기법과 모델 편집 기능 평가)

  • Kim, Han-gyeol;Lim, Pyung-Chae;Hwang, Yunhyuk;Kim, Dong Ha;Kim, Taejung;Rhee, Sooahm
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.6_1
    • /
    • pp.1527-1543
    • /
    • 2021
  • In this paper, we explore the applicability and utility of a technology that automatically generates clustered solid building models from point clouds by applying it to various data. To improve models whose quality is insufficient because of the limitations of automatic building modeling, we developed building shape modification and texture correction technology and confirmed the results through experiments. To explore the applicability of automatic building model generation, we experimented with UAV-based point cloud and LiDAR (Light Detection and Ranging) data and applied the building shape modification and texture correction technology to the automatically generated building models. Experiments were then performed to improve the quality of the models. These experiments confirmed the applicability of the point cloud-based automatic clustered solid building model generation technology and the effectiveness of the model quality improvement technology. Compared with existing building modeling technology, our technology greatly reduces costs such as manpower and time and is expected to have strengths in the management of modeling results.

Performance Enhancement of a DVA-tree by the Independent Vector Approximation (독립적인 벡터 근사에 의한 분산 벡터 근사 트리의 성능 강화)

  • Choi, Hyun-Hwa;Lee, Kyu-Chul
    • The KIPS Transactions:PartD
    • /
    • v.19D no.2
    • /
    • pp.151-160
    • /
    • 2012
  • Most distributed high-dimensional indexing structures provide reasonable search performance, especially when the dataset is uniformly distributed. However, when the dataset is clustered or skewed, search performance gradually degrades compared with a uniformly distributed dataset. We propose a method for improving the k-nearest neighbor search performance of the distributed vector approximation-tree on strongly clustered or skewed datasets. The basic idea is to compute the volumes of the leaf nodes on the top-tree of a distributed vector approximation-tree and to assign a different number of bits to each of them in order to preserve the discriminating power of the vector approximation. In other words, more bits are assigned to high-density clusters. We conducted experiments comparing the search performance with the distributed hybrid spill-tree and the distributed vector approximation-tree on synthetic and real datasets. The experimental results show that the proposed scheme provides consistent results, with significant performance improvements over the distributed vector approximation-tree for strongly clustered or skewed datasets.