• Title/Summary/Keyword: Number of Clusters

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • The growing demand for big data analysis has recently been driving the vigorous development of related technologies and tools. In addition, advances in IT and the rising penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis are continuously increasing. This suggests that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually falling, data analysis technology is spreading, and big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and a lot of attention is focused on text data in particular. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques utilized for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document collection. Thus, the entire collection must be analyzed at once to identify the topic of each document. This makes the analysis time-consuming when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially with the number of documents analyzed. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by applying topic modeling to each unit in turn. This method makes topic modeling on a large number of documents possible with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without first being combined. However, despite its many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire collection is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such an approach needs to be established; that is, taking the global topics as the ideal answer, the deviation of the local topics from the global topics needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other work on topic modeling. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to those of topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.
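
The mapping and verification steps described in this abstract can be illustrated with a small sketch. The abstract does not say how topics are compared, so the cosine similarity over topic-term weights used here is an assumption, as are all function names and the toy data; the topic modeling step itself is taken as already done.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_local_to_rgs(local_topics, rgs_topics):
    """Map each local topic id to its most similar RGS topic id.

    Assumption: similarity is cosine over topic-term weights; the
    abstract does not specify the mapping criterion.
    """
    return {
        lid: max(rgs_topics, key=lambda rid: cosine(terms, rgs_topics[rid]))
        for lid, terms in local_topics.items()
    }


def agreement(global_labels, local_labels, mapping):
    """Share of documents whose mapped local topic equals their global topic."""
    same = sum(
        1 for doc, g in global_labels.items()
        if mapping.get(local_labels[doc]) == g
    )
    return same / len(global_labels)


# Hypothetical toy data: topic -> {term: weight}
rgs_topics = {
    "G0": {"election": 0.6, "vote": 0.4},
    "G1": {"match": 0.7, "goal": 0.3},
}
local_topics = {
    "L0": {"goal": 0.5, "match": 0.5},
    "L1": {"vote": 0.8, "election": 0.2},
}
mapping = map_local_to_rgs(local_topics, rgs_topics)

# Hypothetical per-document topic assignments from the two analyses
global_labels = {"d1": "G0", "d2": "G1", "d3": "G1"}
local_labels = {"d1": "L1", "d2": "L0", "d3": "L1"}
score = agreement(global_labels, local_labels, mapping)
```

Here `score` is the fraction of documents placed in the same topic by both analyses, which is one simple way to express the accuracy check the abstract describes.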

Association of osteoarthritis and bone mineral density in women -The health and nutritional examination survey in Kuri- (여성의 골관절염과 골밀도간의 관련성 분석 -구리시민 건강.영양진단 조사결과를 바탕으로-)

  • Sheen, Seung-Soo;Lee, Soon-Young;Min, Byung-Hyun;Suh, Il
    • Journal of Preventive Medicine and Public Health / v.30 no.4 s.59 / pp.669-685 / 1997
  • Previous studies reporting an inverse relationship between osteoarthritis and osteoporosis suggest the existence of possible pathophysiologic mechanisms linking them. To examine the hypothesis that 'bone mineral densities of women with osteoarthritis are significantly higher than those of women without osteoarthritis in Korea', subjects were sampled from the health and nutritional examination survey in Kuri city. Samples were selected through a multi-stage sampling frame using established clusters in Kuri city. The survey was conducted from August 18 to September 10, 1997. Of the selected sample population (1,656 people), the response rate was 52.4 percent (348 men and 519 women). The 420 women who underwent BMD measurement, radiologic examination, and anthropometric examination were selected for the analysis. The analytic results are as follows. 1. General characteristics: Mean BMD was $0.493\ \mathrm{g/cm^2}$, mean age was 43.0, and mean BMI was $23.9\ \mathrm{kg/m^2}$. A total of 106 women had experienced menopause and 19 had undergone hysterectomy. There were no cases of osteoarthritis of the hip, 64 cases of osteoarthritis of the knee, and 2 cases of osteoarthritis of the hand. 2. Univariate analysis results: Mean BMD of women with osteoarthritis of the knee was significantly lower than that of women without osteoarthritis of the knee (0.4269 vs. $0.5057\ \mathrm{g/cm^2}$). However, there were too few cases of osteoarthritis of the hip and hand for comparative analyses of BMD to be conducted. There were significant differences in BMD among the pre-menopause group (0.5204), post-menopause group (0.4206), and hysterectomy group (0.4881). Additionally, there were significant differences in BMD among the diabetes group (0.4297), impaired glucose tolerance group (0.4874), and normal group (0.5057). Furthermore, age, parity, BMI, and bioimpedance were significantly related to BMD.
3. Multivariate analysis results: To examine the relationship between osteoarthritis and BMD while controlling for the effects of the other variables that were significant in the univariate analyses, multiple linear regression analysis was performed. Osteoarthritis of the knee was no longer a significant predictor of BMD, whereas age and menopause had significant negative relationships with BMD. Diabetes, parity, BMI, and bioimpedance did not have significant relationships with BMD. After stratification of subjects according to menopause status, multiple linear regression analyses were performed for each stratum. Age in the post-menopause group, and age and osteoarthritis of the knee in the hysterectomy group, showed significant negative relationships with BMD. The results did not support the findings of many previous studies conducted with white men and women. Further studies of biological plausibility in Korean women are recommended. A longitudinal study to verify the relationship between osteoarthritis and BMD would also be valuable.

Comparative Molecular Phylogenetic Relationships in Different Strains of Pleurotus spp. (느타리속 버섯 계통의 분자생물학적 유연관계의 비교연구)

  • Cho, Hae-Jin;Lee, Jae-Seong;Yoon, Ki-Nam;Alam, Nuhu;Lee, Kyung-Lim;Shim, Mi-Ja;Lee, Min-Woong;Cheong, Jong-Chun;Shin, Pyung-Gyun;Yoo, Young-Bok;Lee, U-Youn;Lee, Tae-Soo
    • The Korean Journal of Mycology / v.38 no.2 / pp.112-119 / 2010
  • Pleurotus spp. have been used for edible and medicinal purposes in Asian countries for a long time. The fruiting bodies of Pleurotus ostreatus, Pleurotus citrinopileatus and Pleurotus salmoneostramineus contain many substances that are physiologically beneficial to human health. Therefore, it is necessary to study the genetic diversity of Pleurotus mushroom cultivars commercially cultivated in Korea. Eleven strains of Pleurotus spp. were collected from different geographical regions in South-East Asia, and the ITS regions of rDNA and RAPD of genomic DNA were analyzed. The sizes of the ITS1 and ITS2 regions of rDNA from the different strains varied from 167 to 254 bp and from 156 to 213 bp, respectively. The sequence of ITS1 was more variable than that of ITS2, and the 5.8S sequences were identical. A phylogenetic tree based on the ITS region sequences indicated that the selected strains could be classified into 4 clusters. The eleven Pleurotus strains were also analyzed by RAPD with 20 arbitrary primers. Ten of these primers efficiently amplified the genomic DNA. The number of amplified bands varied with the primers and strains, with polymorphic fragments ranging from 0.1 to 2.0 kb. The results revealed that the genetic diversity of the selected strains of P. ostreatus, P. citrinopileatus and P. salmoneostramineus is low.

THE INFLUENCE OF pH AND LACTIC ACID CONCENTRATION ON THE FORMATION OF ARTIFICIAL ROOT CARIES IN ACID BUFFER SOLUTION (산 완충용액의 pH 및 유산의 농도가 인공치근우식의 형성에 미치는 영향)

  • Oh, Hyun-Suk;Roh, Byoung-Duck;Lee, Chan-Young
    • Restorative Dentistry and Endodontics / v.32 no.1 / pp.47-60 / 2007
  • The purpose of this study was to compare and evaluate the effect of pH and lactic acid concentration on the progression of artificial root caries lesions using a polarizing microscope, and to evaluate the morphological changes of hydroxyapatite crystals in the demineralized area and investigate the process of demineralization using a scanning electron microscope. Artificial root caries lesions were created by dividing specimens into 3 pH groups (pH 4.3, 5.0, 5.5), and each pH group was divided into 3 lactic acid concentration groups (25 mM, 50 mM, 100 mM). Each group was immersed in acid buffer solution for 5 days and examined. The results were as follows: 1. Under the polarizing microscope, the depth of the lesion was affected more by the lactic acid concentration than by the pH. 2. Under the scanning electron microscope, dissolution of hydroxyapatite crystals increased as the lactic acid concentration increased and the pH decreased. 3. Demineralized hydroxyapatite crystals showed peripheral dissolution and decreased size and number within clusters of hydroxyapatite crystals, along with widening of intercluster and intercrystal spaces, as the pH decreased and the lactic acid concentration increased. 4. Under scanning electron microscope evaluation of the surface zone, clusters of hydroxyapatite crystals were dissolved, and dissolution and reattachment of crystals on the surface of collagen fibrils were observed as the lactic acid concentration increased. 5. Under the scanning electron microscope, demineralization of dentin occurred not only on its own but also simultaneously with remineralization. In conclusion, the study showed that pH and lactic acid concentration influenced the rate of progression of the lesion in artificial root caries. The demineralization process progressed from the surface of the clusters of hydroxyapatite crystals, and the morphology of the hydroxyapatite crystals changed from a round or elliptical shape into an irregular shape as time elapsed.

The Effect of Surface Defects on the Cyclic Fatigue Fracture of HEROShaper Ni-Ti rotary files in a Dynamic Model: A Fractographic Analysis (Fractographic 분석을 통한 HEROShaper 니켈티타늄 전동 파일의 피로파절에 미치는 표면결함의 역할)

  • Lee, Jung-Kyu;Kim, Eui-Sung;Kang, Myoung-Whai;Kum, Kee-Yeon
    • Restorative Dentistry and Endodontics / v.32 no.2 / pp.130-137 / 2007
  • This in vitro study examined the effect of surface defects on cutting blades on the extent of the cyclic fatigue fracture of HEROShaper Ni-Ti rotary files using fractographic analysis of the fractured surfaces. A total of 45 HEROShaper (MicroMega) Ni-Ti rotary files with a #30/.04 taper were divided into three groups of 15 each. Group 1 contained new HEROShapers without any surface defects. Group 2 contained HEROShapers with manufacturing defects such as metal rollover and machining marks. Group 3 contained HEROShapers that had been clinically used for the canal preparation of 4-6 molars. A fatigue-testing device was designed to allow cyclic tension and compressive stress on the tip of the instrument whilst maintaining conditions similar to those experienced in a clinic. The fatigue fracture time was measured using a computer connected to the system. Statistical analysis was performed using Tukey's test. Scanning electron microscopy (SEM) was used for fractographic analysis of the fractured surfaces. The fatigue fracture time differed significantly between groups 1 and 2 and between groups 1 and 3 (p<0.05), but there was no significant difference between groups 2 and 3 (p>0.05). Low-magnification SEM views showed brittle fracture as the main initial failure mode. At higher magnification, the brittle fracture region showed clusters of fatigue striations and a large number of secondary cracks. These fractures typically led to a central region of catastrophic ductile failure. Qualitatively, the ductile fracture region was characterized by the formation of microvoids and dimpling. The fractured surfaces of the HEROShapers in groups 2 and 3 were always associated with pre-existing surface defects. Typically, the fractured surface in the brittle fracture region showed evidence of cleavage (transgranular) facets across the grains, as well as intergranular facets along the grain boundaries.
These results show that surface defects on the cutting blades of Ni-Ti rotary files might be the preferred sites for the origin of fatigue fracture under experimental conditions. Furthermore, this work demonstrates the utility of fractography in evaluating the failure of Ni-Ti rotary files.

Estimating genetic diversity and population structure of 22 chicken breeds in Asia using microsatellite markers

  • Roh, Hee-Jong;Kim, Seung-Chang;Cho, Chang-Yeon;Lee, Jinwook;Jeon, Dayeon;Kim, Dong-kyo;Kim, Kwan-Woo;Afrin, Fahmida;Ko, Yeoung-Gyu;Lee, Jun-Heon;Batsaikhan, Solongo;Susanti, Triana;Hegay, Sergey;Kongvongxay, Siton;Gorkhali, Neena Amatya;Thi, Lan Anh Nguyen;Thao, Trinh Thi Thu;Manikku, Lakmalie
    • Asian-Australasian Journal of Animal Sciences / v.33 no.12 / pp.1896-1904 / 2020
  • Objective: Estimating the genetic diversity and structures, both within and among chicken breeds, is critical for the identification and conservation of valuable genetic resources. In chickens, microsatellite (MS) marker polymorphisms have previously been widely used to evaluate these distinctions. Our objective was to analyze the genetic diversity and relationships among 22 chicken breeds in Asia based on allelic frequencies. Methods: We used 469 genomic DNA samples from 22 chicken breeds from eight Asian countries (South Korea, KNG, KNB, KNR, KNW, KNY, KNO; Laos, LYO, LCH, LBB, LOU; Indonesia, INK, INS, ING; Vietnam, VTN, VNH; Mongolia, MGN; Kyrgyzstan, KGPS; Nepal, NPS; Sri Lanka, SBC) and three imported breeds (RIR, Rhode Island Red; WLG, White Leghorn; CON, Cornish). Their genetic diversity and phylogenetic relationships were analyzed using 20 MS markers. Results: In total, 193 alleles were observed across all 20 MS markers, and the number of alleles ranged from 3 (MCW0103) to 20 (LEI0192) with a mean of 9.7 overall. The NPS breed had the highest expected heterozygosity (Hexp, 0.718±0.027) and polymorphism information content (PIC, 0.663±0.030). Additionally, the observed heterozygosity (Hobs) was highest in LCH (0.690±0.039), whereas WLG showed the lowest Hexp (0.372±0.055), Hobs (0.384±0.019), and PIC (0.325±0.049). Nei's DA genetic distance was the closest between VTN and VNH (0.086), and farthest between KNG and MGN (0.503). Principal coordinate analysis showed similar results to the phylogenetic analysis, and three axes explained 56.2% of the variance (axis 1, 19.17%; 2, 18.92%; 3, 18.11%). STRUCTURE analysis revealed that the 22 chicken breeds should be divided into 20 clusters, based on the highest ΔK value (46.92). Conclusion: This study provides a basis for future genetic variation studies and the development of conservation strategies for 22 chicken breeds in Asia.
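
The expected heterozygosity (Hexp) and polymorphism information content (PIC) reported in this abstract are conventionally computed from allele frequencies at each marker. The sketch below uses the textbook single-locus definitions; whether the study applied a sample-size correction or other variant is not stated in the abstract, so the formulas and function names here are assumptions for illustration.

```python
def expected_heterozygosity(freqs):
    """Hexp = 1 - sum(p_i^2) for the allele frequencies at one locus.

    Assumption: no small-sample correction is applied.
    """
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    n = len(freqs)
    cross = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return expected_heterozygosity(freqs) - cross


def mean_over_loci(metric, loci):
    """Average a per-locus metric over a list of allele-frequency lists."""
    return sum(metric(f) for f in loci) / len(loci)


# Hypothetical allele frequencies for two markers (each list sums to 1)
loci = [[0.5, 0.5], [0.25, 0.25, 0.5]]
mean_hexp = mean_over_loci(expected_heterozygosity, loci)
mean_pic = mean_over_loci(pic, loci)
```

PIC is never larger than Hexp for the same locus, which matches the pattern of the paired values reported for NPS and WLG.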

Analysis of Genetic Characteristics and Probability of Individual Discrimination in Korean Indigenous Chicken Brands by Microsatellite Marker (MS 마커를 이용한 토종닭 브랜드의 유전적 특성 및 개체 식별력 분석)

  • Suh, Sangwon;Cho, Chang-Yeon;Kim, Jae-Hwan;Choi, Seong-Bok;Kim, Young-Sin;Kim, Hyun;Seong, Hwan-Hoo;Lim, Hyun-Tae;Cho, Jae-Hyeon;Ko, Yeoung-Gyu
    • Journal of Animal Science and Technology / v.55 no.3 / pp.185-194 / 2013
  • Microsatellite (MS) markers have been a useful genetic tool in studies of diversity, relationships and individual discrimination in livestock. The level of genetic diversity and the relationships among two Korean indigenous chicken brand populations (Woorimatdag: WR, Hanhyup3: HH) and two pure populations (White Leghorn: WL, Rhode Island Red: RIR) were analyzed based on 26 MS markers. A total of 191 distinct alleles were observed across the four chicken populations, and 47 (24.6%) of these alleles were unique to a single population. The mean $H_{Exp}$ and PIC were estimated as 0.667 and 0.630, respectively. Nei's $D_A$ genetic distance and factorial correspondence analysis (FCA) showed that the four populations represented four distinct groups. However, the genetic distances between each Korean indigenous chicken brand (WR, HH) and the pure populations (WL, RIR) were threefold that between WR and HH. For the STRUCTURE analyses, the most appropriate number of clusters for modeling the data was determined to be three. The expected probabilities of identity among genotypes of random individuals (PI) were calculated as $1.17 \times 10^{-49}$ for all 26 markers, and as $1.14 \times 10^{-15}$ and $7.33 \times 10^{-20}$ for the 9 and 12 markers with the highest PI values, respectively. The results indicate that a traceability system for brand chickens employing the 9 to 12 markers with the highest PI values might be applicable to individual identification of Korean indigenous chicken brands.
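
To show why the probability of identity (PI) shrinks so quickly as markers are combined, the sketch below uses the commonly cited single-locus formula for unrelated individuals and multiplies it across loci. The abstract does not state which PI estimator the authors used, so this formula, the independence of loci, and the toy frequencies are all assumptions.

```python
from math import prod


def locus_pi(freqs):
    """Probability that two random unrelated individuals share a genotype.

    Assumed form: sum(p_i^4) + sum over i<j of (2 * p_i * p_j)^2.
    """
    n = len(freqs)
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2.0 * freqs[i] * freqs[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return homozygous + heterozygous


def multilocus_pi(loci):
    """Combined PI across loci, assuming the loci are independent."""
    return prod(locus_pi(f) for f in loci)


# Hypothetical allele frequencies for two markers
loci = [[0.5, 0.5], [0.25, 0.25, 0.5]]
combined = multilocus_pi(loci)
```

Because each per-locus PI is below 1, every additional marker multiplies the combined value down, which is why a panel of 9 to 12 informative markers can already reach very small probabilities.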

Comparison of the Plant Characteristics and Nutritional Components between GM and Non-GM Chinese Cabbages Grown in the Central and Northern Parts of Korea (중·북부지역에서 재배된 GM 배추와 Non-GM 배추간의 식물체 특성 및 영양 성분 비교 분석)

  • Cho, Dong-Wook;Oh, Jin-Pyo;Park, Kuen-Woo;Lee, Dong-Jin;Chung, Kyu-Hwan
    • Horticultural Science & Technology / v.28 no.5 / pp.836-844 / 2010
  • This study was carried out to investigate the plant characteristics and nutritional components of genetically modified (GM) Chinese cabbage and its control line grown in the central and northern parts of Korea in order to establish an evaluation protocol and standard assessment. The GM and non-GM Chinese cabbages were planted at normal and dense planting densities at two locations in the spring and fall of 2008 and 2009. Statistical analysis of plant characteristics and nutritional components showed few significant differences between GM and non-GM Chinese cabbage. Only a few differences in plant characteristics were found between the dense and normal plantings. In the dense planting, there was no significant difference between GM and non-GM Chinese cabbages except for three of the 18 plant traits: leaf shape, hairiness and midrib length. On the other hand, nine plant traits, including leaf length, leaf width, leaf color, leaf shape, fresh weight of the above-ground part, number of leaves, midrib length, midrib width and root diameter, were slightly different between GM and non-GM Chinese cabbage in the normal planting. For leaf length, midrib length, midrib width and fresh weight of the above-ground part, there were significant differences not only between the two lines but also between the two locations. In the nutritional component analysis, only five fatty acids were identified in the Chinese cabbage: palmitic acid, oleic acid, stearic acid, linoleic acid and linolenic acid. With the exception of linoleic acid, the contents of the four other fatty acids per gram of dried sample were slightly higher in the GM line than in the non-GM line. However, there were no significant differences in total fatty acid content, either between the GM and non-GM Chinese cabbage lines or between the northern and central cultivation areas, in the normal and dense plantings.
According to the composition of inorganic elements identified in the samples from both lines, there were six macro-elements (N, P, Ca, K, Mg and Na) and four micro-elements (Cu, Fe, Mn and Zn). Principal component analysis (PCA) showed no distinct clusters separating GM Chinese cabbage from the control line, but did show clusters separating the two regions.

Effect of Day/Night Temperatures during Seedling Culture on the Growth and Nodes of Early Flower Cluster Set of 'Seokwang' Tomato (Lycopersicum esculentum Mill.) (육묘시의 주야간 기온이 서광 토마토의 생육 및 초기 착화 절위에 미치는 영향)

  • 김오임;정병룡
    • Journal of Bio-Environment Control / v.8 no.2 / pp.75-82 / 1999
  • This study was carried out to examine the effect of day/night temperatures during seedling culture on the vegetative and reproductive growth of Lycopersicum esculentum ‘Seokwang’. The study consisted of two culture stages: plug seedling production in a growth chamber and hydroponic culture of the plants in a glasshouse. Experiments were replicated over time. The germinated seedlings were raised for 33 days (experiment 1) and 35 days (experiment 2) in 4 growth chambers, each with day/night temperatures of either $25/25^{\circ}\mathrm{C}$, $16/16^{\circ}\mathrm{C}$, $16/25^{\circ}\mathrm{C}$ or $25/16^{\circ}\mathrm{C}$. Cool-white fluorescent lamps provided $140\ \mu\mathrm{mol \cdot m^{-2} \cdot s^{-1}}$ of light for 12 h each day. In the second experiment, all chambers were supplied with $1000\ \mu\mathrm{mol \cdot mol^{-1}}$ $\mathrm{CO_2}$ during the photoperiod and had an air velocity of $0.3\ \mathrm{m \cdot s^{-1}}$ and a relative humidity of 80%. The plug seedlings were transplanted to rockwool slabs in a glasshouse and grown hydroponically, using the same nutrient solutions as for seedling culture, for 37 days (experiment 1) and 35 days (experiment 2). Plant height was affected more by mean daily temperature than by the interaction of day and night temperatures. Plant height was greatest in the $16/16^{\circ}\mathrm{C}$ treatment. Leaf count was not affected by day and night temperatures, and the chlorophyll concentration was highest in the $16/25^{\circ}\mathrm{C}$ treatment. Fresh and dry weights of the stem tended to be greater in treatments with constant day and night temperatures. The node numbers at which the first and second flower clusters were set were significantly higher in the $25/25^{\circ}\mathrm{C}$ treatment than in the other treatments. Days to flowering of the first flower on the first flower cluster were greatest in the $25/25^{\circ}\mathrm{C}$ treatment and least in the $16/25^{\circ}\mathrm{C}$ treatment.
Vegetative and reproductive growth, such as height, fresh and dry weights, days to flowering, and the nodes at which the 1st and 2nd flower clusters were set, were affected by day/night temperatures.

Genetic Differences and Variation in Two Largehead Hairtail (Trichiurus lepturus) Populations Determined by RAPD-PCR Analysis (RAPD-PCR 분석에 의해 결정된 갈치 (Trichiurus lepturus) 2 집단의 유전적 차이와 변이)

  • Park, Chang-Yi;Yoon, Jong-Man
    • Korean Journal of Ichthyology / v.17 no.3 / pp.173-186 / 2005
  • Genomic DNA was isolated from two geographic populations of largehead hairtail (Trichiurus lepturus) in Korea and the Atlantic Ocean. The eight arbitrarily selected primers were found to generate common, polymorphic, and specific fragments. The complexity of the banding patterns varied dramatically between primers from the two locations. The size of the DNA fragments also varied widely, from 150 bp (base pairs) to 3,000 bp. Here, 947 fragments were identified in the largehead hairtail population from Korea and 642 in the population from the Atlantic Ocean, including 148 specific fragments (15.6%) in the Korean population and 61 (9.5%) in the Atlantic population. In the Korean population, 638 common fragments, with an average of 79.8 per primer, were observed; 429 common fragments, with an average of 53.6 per primer, were identified in the Atlantic population. The numbers of polymorphic fragments in the largehead hairtail populations from Korea and the Atlantic Ocean were 76 and 27, respectively. Based on the average bandsharing values of all samples, the similarity matrix ranged from 0.784 to 0.922 in the Korean population and from 0.833 to 0.990 in the Atlantic population. The bandsharing value of individuals within the Atlantic population was much higher than that within the Korean population. The dendrogram obtained with the eight primers indicated two genetic clusters: cluster 1 (KOREAN 01~KOREAN 11) and cluster 2 (ATLANTIC 12~ATLANTIC 22). Individual KOREAN no. 10 was genetically most closely related to KOREAN no. 11 in the Korean population (genetic distance = 0.038). Ultimately, individual KOREAN no. 01 of the Korean population was most distantly related to ATLANTIC no. 16 of the Atlantic population (genetic distance = 0.708).
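
The bandsharing values mentioned in this abstract measure how many RAPD fragments two individuals have in common. A minimal sketch follows, assuming the widely used definition BS = 2 × (shared bands) / (bands in A + bands in B); the abstract does not give the formula, and the fragment sizes below are invented for illustration.

```python
def bandsharing(bands_a, bands_b):
    """Bandsharing value between two individuals' band sets.

    Assumed definition: 2 * shared / (count_a + count_b), so identical
    profiles score 1.0 and profiles with no common band score 0.0.
    """
    total = len(bands_a) + len(bands_b)
    if total == 0:
        return 0.0
    return 2.0 * len(bands_a & bands_b) / total


def similarity_matrix(profiles):
    """Pairwise bandsharing values for a dict of name -> set of band sizes."""
    names = sorted(profiles)
    return {
        (a, b): bandsharing(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical fragment sizes (bp) scored for three individuals
profiles = {
    "KOREAN_01": {150, 300, 500, 1200},
    "KOREAN_02": {150, 300, 800},
    "ATLANTIC_12": {2000, 3000},
}
matrix = similarity_matrix(profiles)
```

A dendrogram like the one described in the abstract would typically be built from a distance derived from such a matrix, although the clustering method used in the study is not stated here.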