• Title/Summary/Keyword: Combined dataset

Phrase-based Topic and Sentiment Detection and Tracking Model using Incremental HDP

  • Chen, YongHeng;Lin, YaoJin;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.12 / pp.5905-5926 / 2017
  • Sentiments can profoundly affect individual behavior and decision-making. Given the ever-increasing amount of review information available online, an effective sentiment model is needed to detect and organize that information, improve understanding, and present it to consumers more constructively. This study develops a unified phrase-based topic and sentiment detection model, combined with a tracking model that uses incremental hierarchical Dirichlet allocation (PTSM_IHDP), to discover the evolutionary trend of topic-based sentiments in online reviews. First, the PTSM_IHDP model assumes that each review document is composed of a series of independent phrases, each of which carries both topic and sentiment information. Second, it relies on an improved time-dependent non-parametric Bayesian model, integrating incremental hierarchical Dirichlet allocation, to estimate the optimal number of topics by incrementally building an up-to-date model. We evaluated the model on a collected dataset and compared its results with the predictions of traditional models. The results demonstrate the effectiveness and advantages of our model compared with several state-of-the-art methods.
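
The abstract above does not spell out the model, so the following is only a minimal sketch of the general idea behind non-parametric Bayesian clustering: the number of groups is not fixed in advance but grows with the data. It simulates a Chinese restaurant process, a standard construction for Dirichlet-process models, and is not the PTSM_IHDP model itself. The function name, the concentration parameter `alpha`, and the toy sizes are assumptions made for illustration.

```python
import random

def chinese_restaurant_process(n_items, alpha, seed=0):
    """Assign n_items to clusters one at a time; return the cluster sizes.

    Hypothetical helper for illustration only (not from the paper).
    """
    rng = random.Random(seed)
    sizes = []
    for i in range(n_items):
        # i items are already placed. A new cluster is opened with
        # probability alpha / (i + alpha); the first item always opens one.
        if i == 0 or rng.random() < alpha / (i + alpha):
            sizes.append(1)
        else:
            # Otherwise the item joins an existing cluster with
            # probability proportional to that cluster's current size.
            k = rng.choices(range(len(sizes)), weights=sizes)[0]
            sizes[k] += 1
    return sizes

sizes = chinese_restaurant_process(n_items=500, alpha=2.0)
print(len(sizes), "clusters for", sum(sizes), "items")
```

Larger values of `alpha` tend to produce more clusters, which is how such models let the data suggest a number of topics rather than requiring it as an input.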

Morphological and genetic diversity of Euglena deses group (Euglenophyceae) with emphasis on cryptic species

  • Kim, Jong Im;Linton, Eric W.;Shin, Woongghi
    • ALGAE / v.31 no.3 / pp.219-230 / 2016
  • The Euglena deses group comprises common freshwater species: E. adhaerens, E. carterae, E. deses, E. mutabilis, and E. satelles. These species are characterized by elongated, cylindrical, worm-like cell bodies and numerous discoid chloroplasts with a naked pyrenoid. To understand the cryptic diversity, species delimitation, and phylogenetic relationships among members of the group, we analyzed morphological data (light and scanning electron microscopy) and molecular data (nuclear small subunit [SSU] and large subunit [LSU] rDNAs and plastid SSU and LSU rDNAs). Bayesian and maximum likelihood analyses based on the combined four-gene dataset resulted in a tree consisting of two major clades within the group. The first clade was composed of two subclades: the E. mutabilis subclade, and the E. satelles, E. carterae, and E. adhaerens subclade. The E. mutabilis subclade was characterized by a lateral canal opening at the anterior end and a single pellicular stria, whereas the E. satelles, E. carterae, and E. adhaerens subclade was characterized by an apical canal opening at the anterior end of the cell and double pellicular striae. The second clade consisted of 20 strains of E. deses, characterized by a subapical canal opening at the anterior end and double pellicular striae, although these strains showed cell size variation and high genetic diversity. Species boundaries were tested using a Bayesian multi-locus species delimitation method, resulting in the recognition of five cryptic species within the E. deses clade.
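
A "combined dataset" in this sense is usually built by joining per-gene alignments end to end, one row per strain. The sketch below illustrates that step under stated assumptions: the function name, the locus labels, and the toy sequences are all made up, and padding a missing locus with gap characters is one common convention rather than something the abstract specifies.

```python
def concatenate_alignments(loci):
    """Join per-locus alignments into one combined alignment.

    `loci` maps a locus name to {strain_name: aligned_sequence}.
    Strains missing from a locus are padded with '-' so that every
    row of the combined alignment has the same length.
    Returns (combined, partitions), where partitions records the
    1-based column range that each locus occupies.
    """
    strains = sorted({name for aln in loci.values() for name in aln})
    combined = {name: "" for name in strains}
    partitions = {}
    start = 1
    for locus, aln in loci.items():
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{locus}: sequences are not aligned to equal length")
        width = lengths.pop()
        for name in strains:
            combined[name] += aln.get(name, "-" * width)
        partitions[locus] = (start, start + width - 1)
        start += width
    return combined, partitions

# Toy input with invented sequences; real alignments are far longer.
loci = {
    "nuclear_SSU": {"strain_A": "ACGT-A", "strain_B": "ACGTTA"},
    "plastid_LSU": {"strain_A": "GGC", "strain_B": "GGT", "strain_C": "GAT"},
}
combined, partitions = concatenate_alignments(loci)
for name, seq in combined.items():
    print(name, seq)
print(partitions)
```

Keeping the partition ranges alongside the joined sequences allows each gene to be given its own substitution model in a later phylogenetic analysis.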

Identification and Characterization of Macrophomina phaseolina Causing Leaf Blight on White Spider Lilies (Crinum asiaticum and Hymenocallis littoralis) in Malaysia

  • Huda-Shakirah, Abd Rahim;Kee, Yee Jia;Hafifi, Abu Bakar Mohd;Azni, Nurul Nadiah Mohamad;Zakaria, Latiffah;Mohd, Masratul Hawa
    • Mycobiology / v.47 no.4 / pp.408-414 / 2019
  • Crinum asiaticum and Hymenocallis littoralis, commonly known as spider lilies, are bulbous, perennial, herbaceous plants that are widely planted in Malaysia as ornamentals. During 2015-2016, symptoms of leaf blight were observed on both hosts at several locations in Penang. The symptoms appeared as irregular brown to reddish lesions surrounded by yellow halos. As the disease progressed, the infected leaves became blighted, dried, and fell off, with black microsclerotia and pycnidia present on the lesions. The present study was conducted to identify the causal pathogen of leaf blight on C. asiaticum and H. littoralis. Based on morphological characteristics and DNA sequences of the internal transcribed spacer (ITS) region and the translation elongation factor 1-alpha (TEF1-α) gene, the causal pathogen was identified as Macrophomina phaseolina. Phylogenetic analysis of the combined ITS and TEF1-α dataset grouped the studied isolates with other isolates of M. phaseolina from GenBank, with 96% bootstrap support. Pathogenicity tests confirmed the role of the fungus in causing leaf blight on both hosts.
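
A bootstrap value such as the 96% above comes from resampling alignment columns with replacement and counting how often a grouping reappears. The sketch below shows the resampling step and, as a loudly simplified stand-in for tree inference, asks whether a query isolate's single nearest neighbour by Hamming distance is a chosen reference. The nearest-neighbour proxy, the function names, and the toy sequences are assumptions for illustration and are not the procedure used in the study.

```python
import random

def resample_columns(alignment, rng):
    """Draw alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    width = len(alignment[names[0]])
    picks = [rng.randrange(width) for _ in range(width)]
    return {n: "".join(alignment[n][i] for i in picks) for n in names}

def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbour_support(alignment, query, target, replicates=200, seed=0):
    """Fraction of replicates in which `target` is the unique closest
    sequence to `query`. A proxy for clade support, not a tree method."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(replicates):
        rep = resample_columns(alignment, rng)
        others = [n for n in rep if n != query]
        best = min(hamming(rep[query], rep[n]) for n in others)
        closest = [n for n in others if hamming(rep[query], rep[n]) == best]
        if closest == [target]:
            hits += 1
    return hits / replicates

# Invented toy sequences; names are placeholders.
toy = {
    "isolate_1": "ACGTACGTAC",
    "reference_same": "ACGTACGTAT",
    "reference_other": "TCGAACTTGC",
}
support = nearest_neighbour_support(toy, "isolate_1", "reference_same")
print(f"support: {support:.2f}")
```

In a real analysis each replicate would be passed to a tree-building program, and support would be the share of replicate trees that contain the clade of interest.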

Characterization and Pathogenicity of New Record of Anthracnose on Various Chili Varieties Caused by Colletotrichum scovillei in Korea

  • Oo, May Moe;Lim, GiTaek;Jang, Hyun A;Oh, Sang-Keun
    • Mycobiology / v.45 no.3 / pp.184-191 / 2017
  • Anthracnose caused by Colletotrichum species is a well-known major plant disease that primarily causes fruit rot in pepper and reduces its marketability. Thirty-five isolates of Colletotrichum were obtained from chili fruits showing anthracnose symptoms in Chungcheongnam-do and Chungcheongbuk-do, South Korea. These 35 isolates were characterized by their morphology and by nucleotide sequence data of the internal transcribed spacer, glyceraldehyde-3-phosphate dehydrogenase, and β-tubulin. The combined dataset identified all 35 isolates as C. scovillei, and the morphological characteristics correlated directly with the nucleotide sequence data. Notably, this is the first record of anthracnose caused by C. scovillei on pepper in Korea. Forty cultivars were used to investigate pathogenicity and to identify possible sources of resistance. The results reveal that all chili cultivars used in this study are susceptible to C. scovillei.

Characterization and Pathogenicity of Alternaria burnsii from Seeds of Cucurbita maxima (Cucurbitaceae) in Bangladesh

  • Paul, Narayan Chandra;Deng, Jian Xin;Lee, Hyang Burm;Yu, Seung-Hun
    • Mycobiology / v.43 no.4 / pp.384-391 / 2015
  • During a survey of endophytic fungi from Bangladesh pumpkin seeds in 2011~2012, two strains (CNU111042 and CNU111043) with similar colony characteristics were isolated and characterized by their morphology and by molecular phylogenetic analysis of the internal transcribed spacer, glyceraldehyde-3-phosphate dehydrogenase (gpd), and Alternaria allergen a1 (Alt a1) sequences. Phylogenetic analysis of all three sequences and their combined dataset revealed that the fungus formed a subclade within the A. alternata clade, matching A. burnsii and differing from other closely related Alternaria species, such as A. longipes, A. tomato, and A. tomaticola. Long ellipsoid, obclavate, or ovoid beakless conidia that are shorter and thinner (16~60[90] × 6.5~14[~16] μm) distinguish this fungus from other related species. These isolates showed more transverse septation (2~11) and less longitudinal septation (0~3) than other related species. Moreover, the isolates did not produce any diffusible pigment on media. Therefore, our results reveal that the newly recorded fungus from a new host, Cucurbita maxima, is Alternaria burnsii Uppal, Patel & Kamat.

A Clustering Algorithm for Sequence Data Using Rough Set Theory

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.13 no.2 / pp.113-119 / 2008
  • The World Wide Web is a dynamic collection of pages that includes a huge number of hyperlinks and huge volumes of usage information. The resulting growth in online information, combined with the almost unstructured nature of web data, necessitates the development of powerful web data mining tools. Recently, a number of approaches have been developed for dealing with specific aspects of web usage mining for the purpose of automatically discovering user profiles. We analyze sequence data, such as web logs, protein sequences, and retail transactions. In our approach, we propose a clustering algorithm for sequence data using rough set theory. We present a simple example and experimental results using a splice dataset and synthetic datasets.

Joint Identification of Multiple Genetic Variants of Obesity in a Korean Genome-wide Association Study

  • Oh, So-Hee;Cho, Seo-Ae;Park, Tae-Sung
    • Genomics & Informatics / v.8 no.3 / pp.142-149 / 2010
  • In recent years, genome-wide association (GWA) studies have successfully led to many discoveries of genetic variants affecting common complex traits, including height, blood pressure, and diabetes. Although GWA studies have made much progress in finding single nucleotide polymorphisms (SNPs) associated with many complex traits, such SNPs have been shown to explain only a very small proportion of the underlying genetic variance of complex traits. This is partly due to the fact that most current GWA studies have relied on single-marker approaches that identify genetic factors individually and are limited in considering the joint effects of multiple genetic factors on complex traits. Joint identification of multiple genetic factors would be more powerful and provide better prediction of complex traits, since it uses combined information across variants. Recently, a new statistical method for joint identification of genetic variants for common complex traits via elastic-net regularization was proposed. In this study, we applied this joint identification approach to a large-scale GWA dataset (i.e., 8842 samples and 327,872 SNPs) to identify genetic variants of obesity in the Korean population. In addition, to test the biological significance of the jointly identified SNPs, gene ontology and pathway enrichment analyses were conducted.
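
To make the elastic-net idea concrete, here is a minimal sketch of coordinate descent for a linear model with a combined L1 and L2 penalty, fitted to all predictors jointly. It is not the method or software used in the study. It assumes the objective (1/2n)·||y − Xβ||² + λ(α·||β||₁ + (1 − α)/2·||β||²), no intercept, and pre-centered columns; the function names, the penalty settings, and the toy data are invented for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for the elastic net (hypothetical helper).

    X is a list of rows, y a list of responses. Returns the coefficients.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in X]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero column carries no information
            # Correlation of column j with the residual that excludes
            # column j's own current contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1 - alpha))
            delta = new - beta[j]
            if delta:
                residual = [r - c * delta for c, r in zip(col, residual)]
                beta[j] = new
    return beta

# Toy data: two centered, mutually orthogonal predictors; only the
# first one actually drives the response.
X = [[-1, 1], [-1, -1], [0, 1], [0, -1], [1, 1], [1, -1]]
y = [-2, -2, 0, 0, 2, 2]
beta = elastic_net(X, y)
print(beta)
```

The L1 part of the penalty sets the coefficient of the uninformative predictor to exactly zero, while the L2 part shrinks the informative one slightly below its true value of 2. That combination of selection and shrinkage is what makes the approach usable when predictors greatly outnumber samples.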

First Report of Diaporthe tectonae Isolated from Soil in Korea

  • Park, Sangkyu;Lee, Seung-Yeol;Lee, Jae-Jin;Back, Chang-Gi;Lee, Hyang Burm;Jung, Hee-Young
    • The Korean Journal of Mycology / v.45 no.1 / pp.83-89 / 2017
  • Diaporthe tectonae, a fungal species previously unrecorded in Korea, was isolated from soil in Jeonju, Korea. The isolate was characterized morphologically, and a phylogenetic analysis using a combined dataset of internal transcribed spacer, β-tubulin, and elongation factor 1-α sequences indicated its similarity to previously reported D. tectonae strains. To our knowledge, this is the first report of D. tectonae in Korea.

First Report of Dieback Caused by Lasiodiplodia theobromae in Strawberry Plants in Korea

  • Nam, Myeong Hyeon;Park, Myung Soo;Kim, Hyun Sook;Kim, Tae il;Lee, Eun Mo;Park, Jong Dae;Kim, Hong Gi
    • Mycobiology / v.44 no.4 / pp.319-324 / 2016
  • Dieback in strawberry (Seolhyang cultivar) was first observed during the nursery season (June to September) in the Nonsan area of Korea in 2012 and 2013. Initial disease symptoms included dieback on runners and black rot on roots, followed by wilting and eventually blackened, necrotic discoloration in the crowns of daughter plants. A fungus isolated from the diseased roots, runners, and crowns was close to Lasiodiplodia theobromae based on morphological characteristics. Analysis of a combined dataset assembled from sequences of the internal transcribed spacer and translation elongation factor 1-alpha genes grouped nine fungal isolates with the type strain of L. theobromae. The isolates showed strong pathogenicity on strawberry cultivars Kumhyang, Seolhyang, and Akihimae, fulfilling Koch's postulates. Based on these results, the pathogen responsible for dieback on strawberry plants in Korea was identified as L. theobromae.

Discovering Gene-Environment Interactions in the Post-Genomic Era

  • Naidoo, Nirinjini;Chia, Kee-Seng
    • Journal of Preventive Medicine and Public Health / v.42 no.6 / pp.356-359 / 2009
  • In the more than 100 genome-wide association studies (GWAS) conducted in the past 5 years, more than 250 genetic loci contributing to more than 40 common diseases and traits have been identified. Whilst many genes have been linked to a trait, both their individual and combined effects are small and unable to explain earlier estimates of heritability. Given the rapid changes in disease incidence that cannot be accounted for by changes in diagnostic practices, there is a need for well-characterized exposure information in addition to genomic data for the study of gene-environment interactions. The case-control and cohort study designs are most suited for studying associations between risk factors and occurrence of an outcome. However, the case-control study design is subject to several biases, and hence the prospective cohort study design is the preferred choice for investigating gene-environment interactions. A major limitation of the prospective cohort study design is the long duration of follow-up needed to accumulate adequate outcome data. The GWAS paradigm is a timely reminder for traditional epidemiologists, who often perform one- or few-at-a-time hypothesis-testing studies, with the main hallmarks of GWAS being the agnostic approach and the massive dataset derived through large-scale international collaborations.