Search | Korea Science

Classification and Analysis of Data Mining Algorithms (데이터마이닝 알고리즘의 분류 및 분석)

Lee, Jung-Won;Kim, Ho-Sook;Choi, Ji-Young;Kim, Hyon-Hee;Yong, Hwan-Seung;Lee, Sang-Ho;Park, Seung-Soo
- Journal of KIISE:Databases / v.28 no.3 / pp.279-300 / 2001
Data mining plays an important role in the knowledge discovery process, and existing algorithms are usually selected to suit the specific purpose of the mining task. Data mining techniques are currently being actively applied to statistics, business, electronic commerce, biology, and medicine, and numerous algorithms are being researched and developed for these applications. In the long run, however, only a few algorithms that are well suited to specific applications and perform well on large databases will survive, so it is reasonable to focus future effort on those selected algorithms. This paper classifies about 30 existing algorithms into 7 categories: association rule, clustering, neural network, decision tree, genetic algorithm, memory-based reasoning, and Bayesian network. We first analyze the systematic hierarchy and characteristics of the algorithms, then present 14 criteria for classifying them together with the classification results based on these criteria. Finally, we propose the best algorithms among comparable algorithms with different features and performance. The results of this paper can serve as a guideline for data mining research as well as for field applications of data mining.

Microbial Forensics: Comparison of MLVA Results According to NGS Methods, and Forensic DNA Analysis Using MLVA (미생물법의학: 차세대염기서열분석 방법에 따른 MLVA 결과 비교 및 이를 활용한 DNA 감식)

Hyeongseok Yun;Seungho Lee;Seunghyun Lim;Daesang Lee;Sehun Gu;Jungeun Kim;Juhwan Jeong;Seongjoo Kim;Gyeunghaeng Hur;Donghyun Song
- Journal of the Korea Institute of Military Science and Technology / v.27 no.4 / pp.507-515 / 2024
Microbial forensics is a scientific discipline for analyzing evidence related to biological crimes by identifying the origin of microorganisms. Multiple locus variable number tandem repeat analysis (MLVA) is a microbiological analysis method used to specify subtypes within a species based on the number of tandem repeats in the genome, and advances in next generation sequencing (NGS) technology have enabled in silico analysis of full-length whole genome sequences. In this paper, we analyzed unknown samples provided by the Robert Koch Institute (RKI) through the external quality assessment exercise (EQAE) project of the United Nations Secretary-General's Mechanism (UNSGM), in which we officially participated in 2023. We confirmed that the 3 unknown samples were B. anthracis through nucleic acid isolation and genetic sequence analysis. MLVA results for 32 loci of B. anthracis were analyzed using genome sequences obtained from NGS (NextSeq and MinION) and Sanger sequencing. MLVA typing using the short-read NGS platform (NextSeq) showed a high probability of assembly errors when the size of the tandem repeats was greater than 200 bp, while the long-read NGS platform (MinION) showed higher accuracy than NextSeq, although insertions and deletions were observed. We also showed that hybrid assembly can correct most indel errors caused by MinION. Based on the MLVA results, genetic identification was performed by comparison with the 2,975 published MLVA databases of B. anthracis, and the MLVA results of 10 strains were identical to those of the 3 unknown samples. Whole genome alignment of the 10 strains and the 3 unknown samples identified all samples as B. anthracis strain A4564, which is associated with injectional anthrax isolates in heroin users.
https://doi.org/10.9766/KIMST.2024.27.4.507
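
In silico MLVA typing, as described in the abstract above, comes down to counting tandem-repeat copies at each locus of an assembled sequence. The following is a minimal Python sketch of that counting step only. It assumes the repeat unit of each locus is known and that copies match exactly; the function names, locus names and sequences are hypothetical illustrations, not the authors' pipeline and not real B. anthracis loci.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of back-to-back exact copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    sequence, unit = sequence.upper(), unit.upper()
    best = 0
    for start in range(len(sequence) - len(unit) + 1):
        run, pos = 0, start
        # Walk forward one unit at a time while exact copies continue.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


def mlva_profile(loci: dict[str, tuple[str, str]]) -> dict[str, int]:
    """Map each locus name to its tandem-repeat copy number."""
    return {name: count_tandem_repeats(seq, unit) for name, (seq, unit) in loci.items()}


# Hypothetical toy loci: (locus sequence, repeat unit). Not real B. anthracis data.
toy_loci = {
    "locus_A": ("GGCT" + "ACGTTA" * 4 + "TTGC", "ACGTTA"),
    "locus_B": ("CCAT" + "GAT" * 7 + "CGGA", "GAT"),
}
profile = mlva_profile(toy_loci)
print(profile)
```

Exact matching is the simplest possible choice here. A single insertion or deletion inside a repeat array, of the kind the abstract reports for MinION reads, would split the run and lower the count, which is one way to see why indel correction matters for MLVA typing.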

An Updated Pooled Analysis of Glutathione S-transferase Genotype Polymorphisms and Risk of Adult Gliomas

Yao, Lei;Ji, Guixiang;Gu, Aihua;Zhao, Peng;Liu, Ning
- Asian Pacific Journal of Cancer Prevention / v.13 no.1 / pp.157-163 / 2012
Objective: Glutathione S-transferases (GSTs) are multifunctional enzymes that play a crucial role in the detoxification of both the endogenous products of oxidative stress and exogenous carcinogens. Recent studies investigating the association between genetic polymorphisms in GSTs and the risk of adult brain tumors have reported conflicting results. The rationale of this pooled analysis was to determine whether the presence of a GST variant increases adult glioma susceptibility by combining data from multiple studies. Methods: In our meta-analysis, 12 studies were identified by a search of the MEDLINE, HIGHWIRE, SCIENCEDIRECT and EMBASE databases. Of those 12, 11 evaluated GSTM1, nine evaluated GSTT1 and seven evaluated GSTP1 Ile105Val. Between-study heterogeneity was assessed using the $\chi^2$-based Q statistic and the $I^2$ statistic. Crude odds ratios (ORs) with corresponding 95% confidence intervals (CIs) were used to estimate the association between GSTM1, GSTT1 and GSTP1 polymorphisms and the risk of adult gliomas. Results: The quantitative synthesis showed no significant evidence of an association between the presence of a GSTM1, GSTT1 or GSTP1 Ile105Val haplotype polymorphism and the risk of adult gliomas (OR, 1.008, 1.246, 1.061 respectively; 95% CI, 0.901-1.129, 0.963-1.611, 0.653-1.724 respectively). Conclusions: Overall, this study did not suggest any strong relationship between GST variants or related enzyme polymorphisms and an increased risk of adult gliomas. Caveats include the absence of specific raw information on ethnic groups or smoking history for glioma cases in published articles; therefore, well-designed studies with a clear stratified analysis of potential confounding factors are needed to confirm these results.
https://doi.org/10.7314/APJCP.2012.13.1.157
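
The $\chi^2$-based Q statistic and the $I^2$ statistic named in the abstract above can be computed from study-level log odds ratios and their variances. The following is a minimal Python sketch using inverse-variance (fixed-effect) weights and the Woolf variance for each 2x2 table. The function names and the toy tables are hypothetical and are not data from the pooled analysis; the abstract does not say which software or weighting the authors used.

```python
import math


def log_or_and_variance(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Log odds ratio and its Woolf variance for one 2x2 table.

    a, b = cases with / without the variant; c, d = controls with / without it.
    """
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def heterogeneity(log_ors, variances):
    """Return (fixed-effect pooled log OR, Cochran's Q, I^2 in percent)."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, log_ors)) / sum(weights)
    # Q is the weighted sum of squared deviations from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_ors))
    df = len(log_ors) - 1
    # I^2 = max(0, (Q - df) / Q), reported as a percentage.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical study tables (a, b, c, d); not data from the pooled analysis above.
toy_tables = [(30, 70, 25, 75), (45, 55, 40, 60), (20, 80, 30, 70)]
effects = [log_or_and_variance(*t) for t in toy_tables]
pooled, q, i2 = heterogeneity([e[0] for e in effects], [e[1] for e in effects])
print(f"pooled OR = {math.exp(pooled):.3f}, Q = {q:.3f}, I^2 = {i2:.1f}%")
```

Under the null hypothesis of homogeneity, Q is compared against a $\chi^2$ distribution with (number of studies - 1) degrees of freedom, while $I^2$ expresses the share of total variation attributable to between-study heterogeneity.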

Association Between TP53 Arg72Pro Polymorphism and Hepatocellular Carcinoma Risk: A Meta-analysis

Xu, Chang-Tao;Zheng, Fang;Dai, Xin;Du, Ji-Dong;Liu, Hao-Run;Zhao, Li;Li, Wei-Min
- Asian Pacific Journal of Cancer Prevention / v.13 no.9 / pp.4305-4309 / 2012
Background: Previous studies on the association between the TP53 Arg72Pro polymorphism and hepatocellular carcinoma (HCC) risk have reported controversial findings. This study aimed to quantify the strength of the association by meta-analysis. Methods: We searched the PubMed and Wanfang databases for published studies on the association between the TP53 Arg72Pro polymorphism and HCC risk, and used the pooled odds ratio (OR) with its 95% confidence interval (95% CI) to assess the association. Results: Ten studies with a total of 2,026 cases and 2,733 controls were finally included in this meta-analysis. Overall, the TP53 Arg72Pro polymorphism was not associated with HCC risk (all P values greater than [...]). The polymorphism was associated with HCC risk in Caucasians in three genetic models (Pro versus Arg: OR = 1.20, 95% CI 1.03-1.41; ProPro versus ArgArg: OR = 1.74, 95% CI 1.23-2.47; ProPro versus ArgPro/ArgArg: OR = 1.85, 95% CI 1.33-2.57). However, there was no significant association between the TP53 Arg72Pro polymorphism and HCC risk in East Asians (all P values greater than 0.10). No evidence of publication bias was observed. Conclusion: Meta-analysis of the available data suggests an obvious association between the TP53 Arg72Pro polymorphism and HCC risk in Caucasians. However, the TP53 Arg72Pro polymorphism may have a race-specific effect on HCC risk, and further studies are needed to elucidate this possible effect.
https://doi.org/10.7314/APJCP.2012.13.9.4305

Association between the NQO1 C609T Polymorphism with Hepatocellular Carcinoma Risk in the Chinese Population

Zhao, Hong;Zou, Li-Wei;Zheng, Sui-Sheng;Geng, Xiao-Ping
- Asian Pacific Journal of Cancer Prevention / v.16 no.5 / pp.1821-1825 / 2015
Background: Associations between the NQO1 C609T polymorphism and hepatocellular carcinoma (HCC) risk are a subject of debate. We therefore performed the present meta-analysis to evaluate links with HCC susceptibility. Materials and Methods: Several major databases (PubMed, EBSCO), the China National Knowledge Infrastructure (CNKI) and the Wanfang database were searched for eligible studies. Crude odds ratios (ORs) with 95% confidence intervals (CIs) were used to measure the strength of associations. Results: A total of 4 studies including 1,325 patients and 1,367 controls were identified. There was a significant association between the NQO1 C609T polymorphism and HCC for all genetic models (allelic model: OR=1.45, 95%CI=1.23-1.72, p<0.01; additive model: OR=1.96, 95%CI=1.57-2.43, p<0.01; dominant model: OR=1.62, 95%CI=1.38-1.91, p<0.01; and recessive model: OR=1.53, 95%CI=1.26-1.84, p<0.01). On subgroup analysis, similar results were identified in Asians, for whom the combined ORs and 95% CIs were: allelic model: OR=1.50, 95%CI=1.24-1.82, p<0.01; additive model: OR=2.11, 95%CI=1.48-3.01, p<0.01; dominant model: OR=1.69, 95%CI=1.42-2.02, p<0.01; and recessive model: OR=1.59, 95%CI=1.16-2.19, p<0.01. Conclusions: The current meta-analysis suggests that the NQO1 C609T polymorphism could be a risk factor for developing HCC, particularly in the Chinese population.
https://doi.org/10.7314/APJCP.2015.16.5.1821

An Approach for a Substitution Matrix Based on Protein Blocks and Physicochemical Properties of Amino Acids through PCA

You, Youngki;Jang, Inhwan;Lee, Kyungro;Kim, Heonjoo;Lee, Kwanhee
- Interdisciplinary Bio Central / v.6 no.4 / pp.3.1-3.10 / 2014
Amino acid substitution matrices are essential tools for protein sequence analysis, homologous sequence searches in protein databases and multiple sequence alignment. The PAM matrix was the first widely used amino acid substitution matrix, and the BLOSUM series later succeeded it. Most substitution matrices were developed using the statistical frequency of substitution between amino acids in blocks representing groups of protein families or related proteins. However, substitution of amino acids is based on the similarity of the physicochemical properties of each amino acid. In this study, a new approach was used to obtain the major physicochemical properties in multiple sequence alignment. The frequency of amino acid substitution in a multiple sequence alignment database was merged with selected attributes of amino acids from a physicochemical properties database. Principal component analysis of the merged data revealed the major physicochemical properties. Using factor analysis, four principal components were interpreted as flexibility of electronic movement, polarity, negative charge and structural flexibility. BAPS was constructed by applying these four components and was validated for accuracy. When receiver operating characteristic ($\mathrm{ROC}_{50}$) values were compared, BAPS scored slightly lower than BLOSUM and PAM. However, when accuracy was evaluated by comparing multiple sequence alignment results with the structural alignment results of two test data sets with known three-dimensional structures in the homologous structure alignment database, BAPS performed equivalently to or better than prior matrices including the PAM, Gonnet, Identity and Genetic code matrices.
https://doi.org/10.4051/ibc.2014.6.4.0003

Hordein Fingerprinting for Cultivar Discrimination in National List of Barley (Hordein 분석을 통한 보리 국가목록등재품종의 품종식별)

소은희;고은별;최수정;이종호;송인호
- KOREAN JOURNAL OF CROP SCIENCE / v.49 no.3 / pp.256-260 / 2004
A major challenge facing those involved in testing new plant varieties for distinctness, uniformity and stability (DUS) is the need to compare new varieties against all those of common knowledge (reference varieties). One possible approach would be to group new (candidate) varieties and reference varieties using descriptions stored in databases prior to further official testing. This study was carried out to manage a reference variety collection by building a database of hordein profiles. For this purpose, hordein subunits of the 48 National List barley (Hordeum vulgare L.) cultivars were analysed. A total of 22 clearly scorable hordein subunits were identified from the D-subunit to the B-subunit region, and fifteen different hordein polypeptide patterns were obtained. UPGMA cluster analysis was conducted based on the hordein subunit band patterns. The forty-eight cultivars were separated into three groups, with cluster genetic distances ranging from 0.55 to 1.00. Hordein subunits have the potential to select varieties similar to candidate varieties in the management of a reference variety collection, and to play an important complementary role in establishing cultivar distinctness.

Gene Expression Profiling of Eukaryotic Microalga, Haematococcus pluvialis

EOM HYUNSUK;PARK SEUNGHYE;LEE CHOUL-GYUN;JIN EONSEON
- Journal of Microbiology and Biotechnology / v.15 no.5 / pp.1060-1066 / 2005
Under environmental stress, such as strong irradiance or nitrogen deficiency, unicellular green algae of the genus Haematococcus accumulate secondary carotenoids, i.e., astaxanthin, in the cytosol. The induction and regulation of astaxanthin biosynthesis in microalgae has recently received considerable attention owing to the increasing use of secondary carotenoids as a source of pigmentation in fish aquaculture and as a potential free-radical-quenching drug for cancer prevention. Accordingly, this study generated expressed sequence tags (ESTs) from a library constructed from astaxanthin-induced Haematococcus pluvialis. Partial sequences were obtained from the 5' ends of 1,858 individual cDNAs and then grouped into 1,025 non-overlapping sequences, of which 708 were singletons, while the remainder fell into 317 clusters. Approximately 63% of the EST sequences showed similarity to previously described sequences in public databases. The H. pluvialis ESTs contained a relatively high percentage of genes involved in genetic information processing (15%) and metabolism (11%), whereas relatively low percentages of sequences were involved in signal transduction (3%), structure (2%), and environmental information processing (3%). In addition, a relatively large fraction of H. pluvialis sequences was classified as genes involved in photosynthesis (9%) and cellular processes (9%). Based on this EST analysis, the full-length cDNA sequence for superoxide dismutase (SOD) of H. pluvialis was cloned, and the expression of this gene was investigated. The abundance of SOD changed substantially in response to different culture conditions, indicating possible regulation of this gene in H. pluvialis.

Thoroughbred Horse Single Nucleotide Polymorphism and Expression Database: HSDB

Lee, Joon-Ho;Lee, Taeheon;Lee, Hak-Kyo;Cho, Byung-Wook;Shin, Dong-Hyun;Do, Kyoung-Tag;Sung, Samsun;Kwak, Woori;Kim, Hyeon Jeong;Kim, Heebal;Cho, Seoae;Park, Kyung-Do
- Asian-Australasian Journal of Animal Sciences / v.27 no.9 / pp.1236-1243 / 2014
Genetics is important for the breeding and selection of horses, but well-established horse-related browsers and databases are lacking. To better understand horses, more variants and other integrated information are needed. We therefore constructed a horse genomic variants database that includes expression and other information. The Horse Single Nucleotide Polymorphism and Expression Database (HSDB) (http://snugenome2.snu.ac.kr/HSDB) provides the number of unexplored genomic variants still remaining to be identified in the horse genome, including rare variants, using population genome sequences of eighteen horses and RNA-seq of four horses. The identified single nucleotide polymorphisms (SNPs) were confirmed by comparing them with SNP chip data and RNA-seq variants, which showed concordance levels of 99.02% and 96.6%, respectively. Moreover, the database provides the genomic variants with their corresponding transcriptional profiles from the same individuals to help understand the functional aspects of these variants. The database will contribute to genetic improvement and breeding strategies for Thoroughbreds.
https://doi.org/10.5713/ajas.2013.13694

Association of the PSCA rs2294008 C>T Polymorphism with Gastric Cancer Risk: Evidence from a Meta-Analysis

Zhang, Qing-Hui;Yao, Yong-Liang;Gu, Tao;Gu, Jin-Hua;Chen, Ling;Liu, Yun
- Asian Pacific Journal of Cancer Prevention / v.13 no.6 / pp.2867-2871 / 2012
Background: Multiple studies have reported associations between the PSCA rs2294008 C>T polymorphism and gastric cancer (GC), but findings on susceptibility have been inconsistent. Therefore, we estimated the relationship between the rs2294008 C>T polymorphism and GC by meta-analysis. Methods: The PubMed, Embase and Web of Science databases were searched, and nine independent case-control studies were included in this meta-analysis. Crude ORs with 95% CIs were extracted according to the Mantel-Haenszel method and pooled to assess the strength of the association. Results: The PSCA rs2294008 C>T polymorphism was significantly correlated with GC risk when all studies were pooled into the meta-analysis. Further subgroup analysis showed the polymorphism to be linked with diffuse and noncardia GC in the allele contrast model, homozygote codominant model, dominant model, and recessive model. However, no connection was apparent for intestinal and cardia GC. In the analysis stratified by ethnicity, significant associations were observed in Asians for the recessive model. Interestingly, the relationship was particularly significant in the Chinese population. Conclusions: Our findings suggest that the PSCA rs2294008 C>T polymorphism is a risk factor for GC, especially for diffuse and noncardia GC and in the Chinese population.
https://doi.org/10.7314/APJCP.2012.13.6.2867
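
The Mantel-Haenszel pooling mentioned in the abstract above combines per-study 2x2 tables into a single odds ratio. The following is a minimal Python sketch of the standard Mantel-Haenszel estimator. The function name and the toy tables are hypothetical and are not data from the nine included studies; confidence intervals are left out to keep the sketch short.

```python
def mantel_haenszel_or(tables):
    """Pooled odds ratio over 2x2 tables given as (a, b, c, d).

    a, b = cases with / without the variant; c, d = controls with / without it.
    Each study contributes a*d/n to the numerator and b*c/n to the denominator,
    where n is that study's total sample size.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    if denominator == 0:
        raise ZeroDivisionError("pooled odds ratio is undefined when every b*c is zero")
    return numerator / denominator


# Hypothetical case-control tables; not data from the meta-analysis above.
toy_tables = [(60, 40, 45, 55), (120, 80, 100, 100), (35, 65, 30, 70)]
pooled_or = mantel_haenszel_or(toy_tables)
print(f"Mantel-Haenszel pooled OR = {pooled_or:.3f}")
```

Because each study is weighted by b*c/n, larger studies dominate the pooled estimate, and a study with an empty cell still contributes without needing a continuity correction.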

