• Title/Summary/Keyword: population based data


Two Phase Hierarchical Clustering Algorithm for Group Formation in Data Mining

  • 황인수
    • Korean Management Science Review / v.19 no.1 / pp.189-196 / 2002
  • Data clustering is often one of the first steps in data mining analysis. It identifies groups of related objects that can be used as a starting point for exploring further relationships. This technique supports the development of population segmentation models, such as demographic-based customer segmentation. This paper presents a two-phase hierarchical clustering algorithm for group formation. Applications of the algorithm to product-customer group formation in customer relationship management are also discussed. In computer simulations, the proposed algorithm outperforms the single-link method and k-means clustering.

Generalization of modified systematic sampling and regression estimation for population with a linear trend

  • Kim, Hyuk-Joo;Kim, Jeong-Hyeon
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1103-1118 / 2009
  • When estimating the mean or total of a finite population, the numbering of the population units matters. In this paper, we propose two methods for estimating the mean or total of a population having a linear trend, for the case when the reciprocal of the sampling fraction is an even number and the sample size is an odd number. The first method draws a sample using a generalization of Singh et al.'s (1968) modified systematic sampling and uses interpolation in determining the estimator. The second method selects a sample by modified systematic sampling and estimates the population parameters by regression estimation. The proposed methods are compared with existing methods under the criterion of the expected mean square error based on Cochran's (1946) infinite superpopulation model. The two proposed methods are also compared with each other.


Estimation of effective population size using single-nucleotide polymorphism (SNP) data in Jeju horse

  • Do, Kyoung-Tag;Lee, Joon-Ho;Lee, Hak-Kyo;Kim, Jun;Park, Kyung-Do
    • Journal of Animal Science and Technology / v.56 no.8 / pp.28.1-28.6 / 2014
  • This study estimated the effective population size ($N_e$) using SNP data from 240 Jeju horses that had raced at the Jeju racing park. Of the 61,746 genotyped autosomal SNPs, 17,320 (28.1%) were excluded during quality control (missing genotype rate of >10%, minor allele frequency of <0.05 and Hardy-Weinberg equilibrium test P-value of < $10^{-6}$). SNPs on the X and Y chromosomes and genotyped individuals with a missing genotype rate over 10% were also excluded, leaving 44,426 (71.9%) SNPs for the analysis. Linkage disequilibrium (LD), measured as the squared correlation coefficient ($r^2$) between SNP pairs, was calculated for each allele, and the effective population size was determined from the $r^2$ values. The polymorphism information content (PIC) and expected heterozygosity (HE) were 0.27 and 0.34, respectively. LD declined most rapidly over the first 1 Mb; beyond that, $r^2$ decreased more slowly with increasing distance and was constant after 2 Mb, and the decline was almost linear with log-transformed distance. The average $r^2$ between adjacent SNP pairs ranged from 0.20 to 0.31 across chromosomes, with an overall average of 0.26, while the overall average $r^2$ between all SNP pairs was 0.02. $N_e$ showed an initial decreasing pattern, and the estimates were close to 41 at 1 to 5 generations ago. The effective population size estimated in this study (41 head) seems large considering the Jeju horse population size (about 2,000 head), but it should be interpreted with caution because of the technical limitations of the methods and the sample size.
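
    The abstract does not state which relation was used to convert $r^2$ into an effective population size. A minimal sketch, assuming the commonly used approximation of Sved (1971), $E[r^2] = 1/(1 + 4 N_e c)$, with $c$ the recombination distance in Morgans and an estimate at distance $c$ taken to reflect roughly $1/(2c)$ generations ago. The function names and the input values in the usage note are illustrative assumptions, not taken from the paper.

```python
def ne_from_r2(r2: float, c_morgans: float) -> float:
    """Effective population size from mean r^2 at recombination distance c.

    Assumes Sved's (1971) approximation E[r^2] = 1 / (1 + 4 * Ne * c),
    rearranged to Ne = (1 / r^2 - 1) / (4 * c). No sample-size or
    mutation correction is applied (a simplification of this sketch).
    """
    if not 0.0 < r2 < 1.0:
        raise ValueError("r2 must lie strictly between 0 and 1")
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return (1.0 / r2 - 1.0) / (4.0 * c_morgans)


def generations_ago(c_morgans: float) -> float:
    """Approximate number of generations in the past that LD at distance c reflects."""
    if c_morgans <= 0.0:
        raise ValueError("c_morgans must be positive")
    return 1.0 / (2.0 * c_morgans)
```

    For example, with the made-up inputs `ne_from_r2(0.2, 0.01)` the sketch gives an $N_e$ of about 100, attributed to about 50 generations ago by `generations_ago(0.01)`.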

A Strategy for Multi-target Paths Coverage by Improving Individual Information Sharing

  • Qian, Zhongsheng;Hong, Dafei;Zhao, Chang;Zhu, Jie;Zhu, Zhanggeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5464-5488 / 2019
  • The multi-population genetic algorithm has become a top choice for many test engineers in multi-target paths coverage. An information sharing strategy can improve the efficiency with which a multi-population genetic algorithm generates multi-target test data; however, several aspects that affect how effectively the target path set is covered still leave room for improvement. Therefore, a multi-target paths coverage strategy is proposed that improves the multi-population genetic algorithm through individual information sharing among populations. It has three main aspects. First, the behavior of a sub-population covering its corresponding target path is improved so that, after covering the current target path, it continues trying to cover other sub-paths, taking full advantage of population resources. Second, the initialized populations are prioritized according to the matching process, so that sub-populations with a better path coverage rate are executed first. Third, for hard-to-cover paths, the chromosome features of individuals that can cover those paths are extracted from the generated data and used to screen for individuals that can cover them. In the experiments, several benchmark programs were used to verify the accuracy of the method from different aspects and to compare it with similar methods. The experimental results show that the approach takes less time to cover target paths than similar ones and achieves a more efficient test case generation process. Finally, a plug-in prototype implementing the proposed approach is presented.

Factors Affecting Dental Utilization and Dental Expenses in the Economically Active Population: Based on the 2010~2014 Korea Health Panel Data

  • Lee, Jin-Ha;Ahn, Eunsuk
    • Journal of dental hygiene science / v.19 no.1 / pp.23-30 / 2019
  • Background: The health of the economically active population contributes to increased corporate productivity by reducing the productivity loss caused by disease and increasing job efficiency, which in turn is a national benefit. Since the economically active population is a concept encompassing workers and a source of economic development for a country, that population's health should be treated as important not only from a personal standpoint but also at a national level. Methods: Data on 11,007 economically active adults aged 20 years and older, including the number of dental visits and dental medical expenses, were analyzed from the five-year Korea Health Panel Study (2010 to 2014). Results: "Gender," "education level," "age," "duty category," "income level," "employment type," "national health insurance," and "chronic disease status" of the economically active population were associated with the number of dental visits and dental medical expenses. The number of dental visits increased with higher education levels (p<0.001) and with older age (p<0.001). Dental medical expenses were 91,806 Korean won (KRW) higher for "white-collar workers" than for "blue-collar workers" (p<0.03), and 127,674 KRW higher for "regular workers" than for "atypical workers" (p<0.02). Conclusion: To improve policies that enhance the efficient distribution of health and medical resources across the dental health sector, the various factors behind oral health disorders arising from income inequality among classes by employment type should be identified, in order to find ways to reduce the health gap among social classes.

Comparative assessment of the effective population size and linkage disequilibrium of Karan Fries cattle revealed viable population dynamics

  • Shivam Bhardwaj;Oshin Togla;Shabahat Mumtaz;Nistha Yadav;Jigyasha Tiwari;Lal Muansangi;Satish Kumar Illa;Yaser Mushtaq Wani;Sabyasachi Mukherjee;Anupama Mukherjee
    • Animal Bioscience / v.37 no.5 / pp.795-806 / 2024
  • Objective: Karan Fries (KF), a high-producing composite cattle breed, was developed by crossing indicine Tharparkar cows with taurine bulls (Holstein Friesian, Brown Swiss, and Jersey) to increase milk yield across India. This composite cattle population must maintain sufficient genetic diversity for long-term development and breed improvement in the coming years. The level of linkage disequilibrium (LD) measures the influence of population genetic forces on the genomic structure and provides insights into the evolutionary history of populations, while the decay of LD is important in understanding the limits of genome-wide association studies for a population. Effective population size (Ne), estimated genomically from LD accumulated over previous generations, is a valuable tool for evaluating genetic diversity and the level of inbreeding. The present study was undertaken to understand KF population dynamics through the estimation of Ne and LD for the long-term sustainability of the breed. Methods: The present study included 96 KF samples genotyped using the Illumina HDBovine array to estimate the effective population size and examine the LD pattern. Genotype data were also obtained for other crossbreds (Santa Gertrudis, Brangus, and Beefmaster) and Holstein Friesian cattle for comparison purposes. Results: The average LD between single nucleotide polymorphisms (SNPs) was $r^2$ = 0.13 in the present study. LD decay ($r^2$ = 0.2) was observed at 40 kb inter-marker distance, indicating that a panel with 62,765 SNPs was sufficient for genomic breeding value estimation in KF cattle. The pedigree-based Ne of KF was determined to be 78, while the Ne estimates obtained using LD-based methods were 52 (SNeP) and 219 (genetic optimization for Ne estimation), respectively. Conclusion: KF cattle have an Ne exceeding the FAO's minimum recommended level of 50, which is desirable. The study also revealed significant population dynamics of KF cattle and improved our understanding of how to devise suitable breeding strategies for long-term sustainable development.

Analysis of Relationship between the Spatial Characteristics of the Elderly Population Distribution and Heat Wave based on GIS - focused on Changwon City -

  • SONG, Bong-Geun;PARK, Kyung-Hun;KIM, Gyeong-Ah;KIM, Seoung-Hyeon;Park, Geon-Ung;MUN, Han-Sol
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.3 / pp.68-84 / 2020
  • This study analyzed the relationship between spatial characteristics and heat waves in the areas where the elderly population is distributed in Changwon, Gyeongsangnam-do. The analysis used Statistics Census data, the Ministry of Environment land cover map, Landsat 8 surface temperature, and the Meteorological Agency's heat wave days data. The spatial characteristics of the elderly population distribution were classified into 5 types through K-means cluster analysis considering the land use types. By spatial type, the elderly population was higher in the urbanized type (cluster-3), but the proportion of the elderly population was higher in the agricultural and forest area types (cluster-1, cluster-2). Surface temperature was highest in the urban area, but heat wave days were highest in the rural area. Analyzing the heat wave characteristics by spatial type of the elderly population distribution area, cluster-2, with the largest area in agricultural areas, was highest at 15.95 days, and cluster-3, with a large area in urbanized types, was the lowest at 9.41 days and 9.18 days. In other words, the elderly population living in rural areas is more exposed to heat waves than the elderly population living in urban areas, and the damage is expected to increase. The results of this study could be used as basic data for preparing policy measures for the effective management of vulnerable areas and the prevention of damage in summer.

A Comparison of Systematic Sampling Designs for Forest Inventory

  • Yim, Jong Su;Kleinn, Christoph;Kim, Sung Ho;Jeong, Jin-Hyun;Shin, Man Yong
    • Journal of Korean Society of Forest Science / v.98 no.2 / pp.133-141 / 2009
  • This study was conducted to support the selection of a statistically efficient sampling design for forest resources assessments in South Korea. To this end, different systematic sampling designs were simulated and compared on an artificial forest population built from field sample data and satellite data in Yang-Pyeong County, Korea. Using the k-NN technique, two thematic maps (growing stock and forest cover type per pixel unit) were generated across the test area; field data (n=191) and Landsat ETM+ were used as source data. Four sampling designs (systematic sampling, systematic sampling for post-stratification, systematic cluster sampling, and stratified systematic sampling) were considered as candidate optimum designs. Error variance was computed by Monte Carlo simulation (k=1,000), and sampling error and relative efficiency were then compared. When the objective of an inventory was to obtain estimates for the entire population, systematic cluster sampling was superior to the other sampling designs. When the objective was to obtain estimates for each sub-population, post-stratification gave better estimates. For efficient stratification, this procedure requires clear definitions of the strata of interest per field observation unit.
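
    The Monte Carlo comparison described in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: a synthetic one-dimensional population with a linear trend stands in for the k-NN forest maps, simple random sampling is used as a baseline (it is not one of the four designs in the paper), and all function names and parameter values are hypothetical.

```python
import random
import statistics


def build_population(size, slope, noise_sd, rng):
    """Synthetic 1-D population: linear trend plus Gaussian noise (illustrative only)."""
    return [slope * i + rng.gauss(0.0, noise_sd) for i in range(size)]


def systematic_sample(population, n, rng):
    """Take every k-th unit after a random start, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]


def simple_random_sample(population, n, rng):
    """Baseline design: n units drawn without replacement."""
    return rng.sample(population, n)


def mc_mean_squared_error(population, sampler, n, reps, rng):
    """Monte Carlo estimate of the mean squared error of the sample mean."""
    true_mean = statistics.fmean(population)
    squared_errors = [
        (statistics.fmean(sampler(population, n, rng)) - true_mean) ** 2
        for _ in range(reps)
    ]
    return statistics.fmean(squared_errors)


def relative_efficiency(mse_reference, mse_candidate):
    """Values above 1 mean the candidate design beats the reference design."""
    return mse_reference / mse_candidate
```

    A typical use would build one population, call `mc_mean_squared_error` once per design with the same number of replicates (for example 1,000, matching the k reported above), and pass the two results to `relative_efficiency`.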

A Basic Study on The Architectural Characteristic of the Group Home in Japan

  • Yang, Yoon-Sil;Kim, Tae-Il
    • Korean Institute of Interior Design Journal / v.23 no.4 / pp.248-256 / 2014
  • According to data from Statistics Korea and the Ministry of Health and Welfare, as the elderly population increases, the number of elderly people with dementia continues to increase, and its growth rate is expected to accelerate. In particular, the Dementia Management Act has been in effect since February 2012, and active efforts have been made toward a dementia management policy. The purpose of this study is to establish standards for building plans based on the appropriate scale and spatial configuration in planning facilities for the elderly with dementia. Specifically, basic data were collected by request for a total of 103 cases drawn from a database of group homes surveyed and managed by the Japan Association of Group Homes. The information collected includes the management body operating the facility, the scale of the facility, the number of units, and the configuration of personal living space; the collected survey data and drawings were statistically processed and analyzed using SPSS WIN 20.0. The results are summarized as follows. First, most group homes are small, 1-2 story homes; there are approximately one or two units per home, and each unit consists of nine rooms. Second, many group homes with a building area of $300m^2$ have a U-shaped arrangement, which is advantageous for extension and facilities maintenance. In conclusion, this study provides fundamental data that can be used to establish standards for facilities for the elderly with dementia, whose population continues to increase. Further study is necessary to establish design conditions suitable for Korea.

The Forecasting about the Numbers of the Third Graders in a High-school until 2022 Year in Daegu City

  • Kim, Jong-Tae
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.933-942 / 2005
  • Recently, the decrease in the number of third-year high-school students has seriously affected the admission quotas of colleges and universities. The purpose of this paper is to forecast the number of high-school graduates in Daegu city through 2022, based on the resident registration population. Taking 2004 as the base period, most colleges and universities in Daegu city would have to reduce their admission quotas by 37.5% by 2022 to match the number of third-year high-school students.
