• Title/Summary/Keyword: Information Enrichment


A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.;Lee Ki H.;Chung Sung S.
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.2
    • /
    • pp.509-520
    • /
    • 2005
  • Data fusion is the process of combining data and information from different sources so that the useful information they contain can be exploited more effectively. In this paper, we propose a data fusion algorithm that uses k-means clustering for data enrichment, with the aim of improving data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed technique with existing ones shows that the clustering-based data fusion technique yields a low MSE for continuous fusion variables.
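
The abstract above does not spell out the algorithm, so the following is only a minimal sketch of one way clustering-based statistical matching could work. It assumes a donor file with common variables X and a fusion variable Y, and a recipient file with X only. The function names (`nearest`, `kmeans`, `fuse`), the cluster-mean imputation rule, and the toy values are all hypothetical and are not taken from the paper.

```python
def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])),
    )


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centroids start at the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Recompute labels so they match the final centroids.
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def fuse(donor_x, donor_y, recipient_x, k):
    """Impute Y for each recipient as the mean Y of its nearest donor cluster.

    Assumption: cluster-mean imputation stands in for whatever matching
    rule the paper actually uses.
    """
    centroids, labels = kmeans(donor_x, k)
    overall = sum(donor_y) / len(donor_y)
    cluster_mean = {}
    for c in range(k):
        ys = [y for y, lab in zip(donor_y, labels) if lab == c]
        cluster_mean[c] = sum(ys) / len(ys) if ys else overall
    return [cluster_mean[nearest(x, centroids)] for x in recipient_x]


# Toy values invented purely for illustration.
donor_x = [(1.0,), (9.0,), (1.2,), (8.8,)]
donor_y = [10.0, 50.0, 12.0, 54.0]
recipient_x = [(1.1,), (9.5,)]
fused = fuse(donor_x, donor_y, recipient_x, k=2)
```

The MSE reported in the abstract would then be computed between such imputed values and held-out true values of the fusion variable.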

Identifying Statistically Significant Gene-Sets by Gene Set Enrichment Analysis Using Fisher Criterion (Fisher Criterion을 이용한 Gene Set Enrichment Analysis 기반 유의 유전자 집합의 검출 방법 연구)

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.45 no.4
    • /
    • pp.19-26
    • /
    • 2008
  • Gene set enrichment analysis (GSEA) is a computational method that identifies gene sets showing statistically significant differences between two groups of microarray expression profiles and, at the same time, uncovers their biological meaning by employing gene annotation databases such as Cytogenetic Band, KEGG pathways, and Gene Ontology. In GSEA, all the genes in a given dataset are first ordered by the signal-to-noise ratio between the groups, and further analyses then proceed from this ranking. Despite its impressive results in several previous studies, gene ranking by the signal-to-noise ratio makes it difficult to consider highly up-regulated and highly down-regulated genes at the same time as candidate significant genes, even though both may reflect situations arising in metabolic and signaling pathways. To deal with this problem, we investigate GSEA with the Fisher criterion for gene ranking and evaluate its effect in leukemia-related pathway analyses.
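
The ranking problem described in this abstract can be illustrated with a small sketch. It assumes the commonly used definitions, namely a signed signal-to-noise ratio (mean_a - mean_b) / (sd_a + sd_b) and an unsigned Fisher criterion (mean_a - mean_b)^2 / (var_a + var_b). The abstract does not give the exact formulas used in the paper, and the gene names and expression values below are invented for illustration.

```python
from statistics import mean, stdev


def signal_to_noise(a, b):
    """Signed score: positive when group a is higher, negative when lower."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def fisher_criterion(a, b):
    """Unsigned score: large for strong differences in either direction."""
    return (mean(a) - mean(b)) ** 2 / (stdev(a) ** 2 + stdev(b) ** 2)


def rank_genes(expr, score):
    """Gene names sorted by descending score (ties keep input order)."""
    return sorted(expr, key=lambda g: score(*expr[g]), reverse=True)


# Hypothetical genes: (expression in group A, expression in group B).
toy = {
    "up": ([9.0, 10.0, 11.0], [1.0, 2.0, 3.0]),
    "flat": ([5.0, 6.0, 7.0], [5.0, 6.0, 7.0]),
    "down": ([1.0, 2.0, 3.0], [9.0, 10.0, 11.0]),
}

snr_order = rank_genes(toy, signal_to_noise)
fisher_order = rank_genes(toy, fisher_criterion)
```

With the signed score, the up-regulated and down-regulated genes land at opposite ends of the list with the unchanged gene between them. With the Fisher criterion, both regulated genes rank ahead of the unchanged gene, which is the behavior the abstract is after.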

Spam Mail Filtering System using Ontology and Semantic Enrichment (온톨로지와 Semantic Enrichment를 이용한 스팸 메일 필터링 시스템)

  • 김현준;김흥남;정재은;조근식
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2004.04b
    • /
    • pp.553-555
    • /
    • 2004
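  • With the recent rapid growth of the Internet, electronic mail (e-mail) has become an essential medium of communication. However, because e-mail is convenient and costs nothing to send, an enormous volume of spam mail pours in every day, and a variety of studies have been proposed to address the problem. In particular, the Bayesian classifier, which is widely used in document classification, is the most commonly applied approach and shows relatively good precision and recall. It nevertheless has several problems: first, the user must sufficiently train it on spam and non-spam mail in advance; second, filtering takes computation time; and third, accurate filtering is difficult when the body of the mail being filtered is short. This paper addresses the last of these problems, that is, the case in which accurate classification is impossible because too few characteristic words are available for computation, and proposes a spam mail filtering system that uses an ontology and a semantic enrichment technique. In experiments, the proposed system improved on a classification system based on the Bayesian classifier by 4.1% in precision, 10.5% in recall, and 7.64% in F-measure.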


Uranium Enrichment Comparison of UO2 Pellet with Alpha Spectrometry and TIMS

  • Song, Ji-Yeon;Seo, Hana;Kim, Sung-Hwan;Choi, Jung-Youn
    • Journal of Radiation Protection and Research
    • /
    • v.43 no.3
    • /
    • pp.120-123
    • /
    • 2018
  • Background: Analysis of the enrichment of $UO_2$ is important for verifying the information declared by license-holders, and redundant methods are required to guarantee the analysis result. The Korea Institute of Nuclear Nonproliferation and Control (KINAC) has analyzed enrichment with alpha spectrometry and consigned Thermal Ionization Mass Spectrometry (TIMS) analysis to the Korea Basic Science Institute (KBSI). This article evaluates the similarity of the results from the two methods and derives a correlation equation, so that alpha spectrometry results can be compared with those measured by TIMS at KBSI. Materials and Methods: Few certified materials are available for uranium enrichment values. Therefore, 34 uranium pellets covering a wide range of uranium enrichment, from 0.21 to 4.69 wt%, were analyzed by alpha spectrometry and by TIMS. Results and Discussion: Each instrument shows its own tendency in the analyzed enrichment. On average, the enrichment evaluated with alpha spectrometry was 17% higher than that with TIMS. Regression equations were also derived for the case in which the similarity between the results of the two methods is lower than predicted. Two experiments were designed to compare the effect of the number of samples. The $R^2$ was 0.9977 with 34 pellets, which shows that the equation is appropriate for predicting TIMS enrichment values from alpha spectrometry values. The $R^2$ was 0.9858 with four pellets measured ten times. The $R^2$ decreased as the number of samples increased; the discrepancy between the lowest and highest enrichment seems to be one of the reasons. Conclusion: KINAC expects the first equation, obtained with 34 samples, to be useful for predicting the result of TIMS, the redundant method, from alpha spectrometry. Extra samples need to be collected if the enrichment value analyzed by TIMS is lower than the value predicted by the equation. Further study will follow on the impact of the peak counts for each uranium isotope, the sample amount, and the number of experiments once TIMS is established at KINAC by the end of 2018.

Incremental Enrichment of Ontologies through Feature-based Pattern Variations (자질별 관계 패턴의 다변화를 통한 온톨로지 확장)

  • Lee, Sheen-Mok;Chang, Du-Seong;Shin, Ji-Ae
    • The KIPS Transactions:PartB
    • /
    • v.15B no.4
    • /
    • pp.365-374
    • /
    • 2008
  • In this paper, we propose a model that enriches an ontology by incrementally extending its relations through variations of patterns. To generalize the initial patterns, combinations of features are considered as candidate patterns. The candidate patterns are used to extract relations from Wikipedia, and they are then sorted according to a reliability measure based on corpus frequency. The selected patterns are used to extract relations, while the extracted relations are in turn used to extend the patterns of the relation. By varying patterns during the incremental enrichment process, the range of pattern selection is broadened and refined, which can increase the coverage and accuracy of the extracted relations. In experiments with single-feature pattern models, we observe that lexical, headword, and hypernym features provide reliable information, while POS and syntactic features provide general information that is useful for enriching relations. Based on observations of which feature types are appropriate for each syntactic unit type, we propose a pattern model based on the composition of features as our ongoing work.

GraPT: Genomic InteRpreter about Predictive Toxicology

  • Woo Jung-Hoon;Park Yu-Rang;Jung Yong;Kim Ji-Hun;Kim Ju-Han
    • Genomics & Informatics
    • /
    • v.4 no.3
    • /
    • pp.129-132
    • /
    • 2006
  • Toxicogenomics has recently emerged in the field of toxicology, and the DNA microarray technique has become a common strategy in predictive toxicology, which studies the molecular mechanisms caused by exposure to chemical or environmental stress. Although microarray experiments offer extensive genomic information to researchers, the high dimensionality of the data often makes it hard to extract meaningful results. We therefore developed a toxicant enrichment analysis similar to the common enrichment approach. We also developed a web-based system, graPT, to support prediction of the toxic endpoints of an experimental chemical.

A Study on the Data Fusion Method using Decision Rule for Data Enrichment (의사결정 규칙을 이용한 데이터 통합에 관한 연구)

  • Kim S.Y.;Chung S.S.
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.2
    • /
    • pp.291-303
    • /
    • 2006
  • Data mining is the task of extracting information from existing data files, so one of the most important factors in the data mining process is the quality of the data used. In this paper, we propose a data fusion technique using decision rules for data enrichment, a phase that improves data quality in the KDD process. Simulations were performed to compare the proposed data fusion technique with existing techniques. The results show that our decision-rule data fusion technique is characterized by a low MSE or misclassification rate in the fusion variables.

Experimental and Numerical Investigations on Detailed Methane Reaction Mechanisms in Oxygen Enriched Conditions (산소부화조건의 메탄 상세반응기구에 대한 실험 및 수치해석 연구)

  • Han, Ji-Woong;Lee, Chang-Eon
    • Transactions of the Korean Society of Mechanical Engineers B
    • /
    • v.28 no.2
    • /
    • pp.207-214
    • /
    • 2004
  • The burning velocities of conventional and oxygen-enriched methane flames at various equivalence ratios were determined experimentally. The validity of existing reaction mechanisms was examined for oxygen-enriched flames on the basis of the experimental results. A modified reaction mechanism is suggested that predicts the burning velocity of oxygen-enriched flames as well as methane-air flames. A complementary study of the reaction mechanisms shows the following results. The present experimental data were found to be more reliable than existing data under oxygen-enriched conditions. Some modification of the existing reaction mechanisms is necessary, since the discrepancy between measurements and predictions increases with the oxygen enrichment ratio. A sensitivity analysis was performed to identify the reactions that dominantly affect the burning velocity at various oxygen enrichment and equivalence ratios. A modified GRI 3.0 reaction mechanism based on our experimental results is suggested, in which the reaction rate coefficients of (R38) H+O$_2$<=>O+OH in the GRI 3.0 mechanism were corrected on the basis of the sensitivity analysis results. This mechanism showed good agreement in predicting the burning velocity and the number density of NO in oxygen-enriched flames, and it should provide appropriate reaction information for oxygen-enriched flames at this stage.

A Study on the Data Fusion for Data Enrichment (데이터 보강을 위한 데이터 통합기법에 관한 연구)

  • 정성석;김순영;김현진
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.3
    • /
    • pp.605-617
    • /
    • 2004
  • One of the most important factors in the data mining process is the quality of the data used. When mining is performed on data of excellent quality, the potential value of data mining can be improved. In this paper, we propose a data fusion technique for data enrichment, a phase that can improve data quality in the KDD process. We add the k-NN technique to the regression technique to improve the performance of the fusion technique by reducing the loss of information. Simulations were performed to compare the proposed data fusion technique with the regression technique. The results show that the newly proposed data fusion technique is characterized by a low MSE in continuous fusion variables.