• Title/Summary/Keyword: Data Normalization

A Bayesian Validation Method for Classification of Microarray Expression Data (마이크로어레이 발현 데이터 분류를 위한 베이지안 검증 기법)

  • Park, Su-Young;Jung, Jong-Pil;Jung, Chai-Yeoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.11 / pp.2039-2044 / 2006
  • Since bio-information now exceeds the processing capability of the human brain, data mining and artificial intelligence techniques are needed to handle information in this field. Many studies use DNA microarray technology, which can obtain information from thousands of genes at once, to develop new methods for analyzing and predicting diseases. Discovering the mechanisms of unknown genes with these methods is expected to lead to new drugs and treatments. In this paper, we tested the classification accuracy of a Bayesian method on microarray data after dividing the data into two classes, in order to compare the performance of normalization methods; normalization acts as a feature-abstraction step that reduces or removes the noise introduced into microarray experiments by various factors. We show that Lowess normalization improves classification performance to 95.89%.
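
As an illustration of the kind of pipeline this abstract describes, the sketch below applies Lowess (intensity-dependent) normalization to two-channel log-ratios and then trains a Gaussian naive Bayes classifier; the toy data, array shapes, and two-class labels are assumptions, not the authors' dataset or exact procedure.

```python
# Illustrative sketch: Lowess normalization of microarray log-ratios followed
# by a simple Bayesian (Gaussian naive Bayes) classifier. Toy data only.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

def lowess_normalize(red, green, frac=0.3):
    """Remove intensity-dependent bias from log-ratios M = log2(R/G)."""
    M = np.log2(red) - np.log2(green)           # log-ratio per gene
    A = 0.5 * (np.log2(red) + np.log2(green))   # average log-intensity
    trend = lowess(M, A, frac=frac, return_sorted=False)
    return M - trend                             # normalized log-ratios

# Toy data: 60 samples x 500 genes of raw two-channel intensities.
rng = np.random.default_rng(0)
red = rng.lognormal(8, 1, size=(60, 500))
green = rng.lognormal(8, 1, size=(60, 500))
X = np.vstack([lowess_normalize(r, g) for r, g in zip(red, green)])
y = rng.integers(0, 2, size=60)                  # two disease classes

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GaussianNB().fit(X_tr, y_tr)
print("classification accuracy:", clf.score(X_te, y_te))
```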

Comparison of Machine Learning-Based Radioisotope Identifiers for Plastic Scintillation Detector

  • Jeon, Byoungil;Kim, Jongyul;Yu, Yonggyun;Moon, Myungkook
    • Journal of Radiation Protection and Research / v.46 no.4 / pp.204-212 / 2021
  • Background: Identification of radioisotopes with plastic scintillation detectors is challenging because their spectra have poor energy resolution and lack photopeaks. To overcome this weakness, many researchers have conducted radioisotope identification studies using machine learning algorithms; however, the effect of data normalization on radioisotope identification has not yet been addressed. Furthermore, studies on machine learning-based radioisotope identifiers for plastic scintillation detectors are limited. Materials and Methods: In this study, machine learning-based radioisotope identifiers were implemented, and their performances under different data normalization methods were compared. Eight classes of radioisotopes, consisting of combinations of 22Na, 60Co, and 137Cs plus the background, were defined. The training set was generated by random sampling from probability density functions acquired by experiments and simulations, and the test set was acquired by experiments. A support vector machine (SVM), an artificial neural network (ANN), and a convolutional neural network (CNN) were implemented as radioisotope identifiers with six data normalization methods and trained on the generated training set. Results and Discussion: The implemented identifiers were evaluated on test sets acquired by experiments with and without gain shifts to confirm their robustness against the gain-shift effect. Among the three machine learning-based radioisotope identifiers, prediction accuracy followed the order SVM > ANN > CNN, while training time followed the opposite order, with the SVM training fastest. Conclusion: The prediction accuracy for the combined test sets was highest with the SVM. The CNN exhibited the smallest variation in prediction accuracy across classes, even though it had the lowest prediction accuracy for the combined test sets among the three identifiers. The SVM exhibited the highest prediction accuracy for the combined test sets, and its training time was the shortest among the three identifiers.
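
For illustration, the sketch below shows two simple spectrum normalizations (unit area and unit maximum) feeding an SVM identifier; the channel count, the toy Poisson spectra, and the eight random labels are assumptions, and the paper's six normalization methods may differ from these.

```python
# Illustrative sketch: two possible spectrum normalizations and an SVM
# identifier for gamma-ray spectra. Shapes and labels are toy assumptions.
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def normalize_by_area(spectra):
    return spectra / spectra.sum(axis=1, keepdims=True)   # unit total counts

def normalize_by_max(spectra):
    return spectra / spectra.max(axis=1, keepdims=True)   # unit peak height

rng = np.random.default_rng(1)
X_train = rng.poisson(5.0, size=(4000, 1024)).astype(float)  # sampled spectra
y_train = rng.integers(0, 8, size=4000)                      # 8 source classes
X_test = rng.poisson(5.0, size=(800, 1024)).astype(float)
y_test = rng.integers(0, 8, size=800)

clf = SVC(kernel="rbf", C=10.0)
clf.fit(normalize_by_area(X_train), y_train)
pred = clf.predict(normalize_by_area(X_test))
print("SVM accuracy with area normalization:", accuracy_score(y_test, pred))
```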

Study on Data Normalization and Representation for Quantitative Analysis of EEG Signals (뇌파 신호의 정량적 분석을 위한 데이터 정규화 및 표현기법 연구)

  • Hwang, Taehun;Kim, Jin Heon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.9 no.6 / pp.729-738 / 2019
  • Recent work aims to improve the quality of virtual reality content based on quantitative analysis of emotion, combining the fields of emotion recognition and virtual reality. Emotions are analyzed from participants' vital signs. Much research has been done on signal analysis, but the methodology for quantifying emotion has not been fully discussed. In this paper, we propose a normalization-function design and a representation method to quantify emotion across various bio-signals. A brute-force algorithm is used to find the optimal parameters of the normalization function, and the confidence score of the parameters found is improved using the true and false scores defined in this paper. As a result, the determination of the parameters of the bio-signal normalization function, which previously depended on the experimenter's experience, can be automated, and emotion can be analyzed quantitatively on this basis.
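
A minimal sketch of the brute-force idea follows, assuming a clipped min-max normalization form and a hypothetical separation-based confidence score in place of the paper's true/false-score criterion.

```python
# Illustrative sketch only: brute-force (grid) search for the parameters of a
# clipped min-max normalization of an EEG band-power signal. The normalization
# form and the score below are hypothetical stand-ins for the paper's criterion.
import numpy as np

def normalize(x, lo, hi):
    """Map x into [0, 1] between the chosen clipping bounds."""
    return np.clip((x - lo) / (hi - lo), 0.0, 1.0)

def confidence_score(x_norm, labels):
    """Hypothetical criterion: separation between two labeled epoch groups."""
    return abs(x_norm[labels == 1].mean() - x_norm[labels == 0].mean())

rng = np.random.default_rng(2)
signal = rng.normal(10, 3, 600)                 # e.g., alpha-band power per epoch
labels = (rng.random(600) > 0.5).astype(int)

best = None
for lo in np.linspace(signal.min(), signal.mean(), 20):      # brute force over
    for hi in np.linspace(signal.mean(), signal.max(), 20):  # a parameter grid
        score = confidence_score(normalize(signal, lo, hi), labels)
        if best is None or score > best[0]:
            best = (score, lo, hi)
print("best score %.3f with lo=%.2f, hi=%.2f" % best)
```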

Theoretical Investigation of Metal Artifact Reduction Based on Sinogram Normalization in Computed Tomography (컴퓨터 단층영상에서 사이노그램 정규화를 이용한 금속 영상왜곡 저감 방법의 이론적 고찰)

  • Jeon, Hosang;Youn, Hanbean;Nam, Jiho;Kim, Ho Kyung
    • Progress in Medical Physics / v.24 no.4 / pp.303-314 / 2013
  • The image quality of computed tomography (CT) is very vulnerable to metal artifacts. Recently, the thickness and background normalization techniques have been introduced. Since they provide flat sinograms, it is easy to determine metal traces, and a simple linear interpolation is enough to recover the missing data in the sinograms. In this study, we developed a theory describing the two normalization methods and compared them with respect to various sizes and numbers of metal inserts using simple numerical simulations. The developed theory showed that background normalization provides flatter sinograms than thickness normalization, which was validated by the simulation results. Numerical simulations with respect to various sizes and numbers of metal inserts showed that background normalization was better than thickness normalization for metal artifact correction. Although residual artifacts remained, we showed that background normalization without the segmentation procedure was better than thickness normalization for metal artifact correction. Since background normalization without the segmentation procedure is simple and does not require any user intervention, it can readily be installed in conventional CT systems.
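
A simplified numpy sketch of normalization-based interpolation for a single sinogram row is given below; the smooth prior, the metal mask, and the one-dimensional setting are assumptions for illustration, whereas real implementations operate on full sinograms.

```python
# Simplified sketch for one projection row: divide the measured data by a prior
# (e.g., forward-projected thickness or background), fill the metal trace by
# linear interpolation in the flattened domain, then denormalize.
import numpy as np

def normalized_interpolation(sino_row, prior_row, metal_mask):
    prior_row = np.where(prior_row > 0, prior_row, 1e-6)   # avoid division by 0
    flat = sino_row / prior_row                # normalization -> flat profile
    ok = ~metal_mask
    idx = np.arange(sino_row.size)
    flat_filled = flat.copy()
    flat_filled[metal_mask] = np.interp(idx[metal_mask], idx[ok], flat[ok])
    return flat_filled * prior_row             # denormalization

# Toy example: a smooth background with a corrupted metal trace in the middle.
detector = np.arange(200)
prior = 1.0 + np.exp(-((detector - 100) / 40.0) ** 2)       # prior sinogram row
measured = prior * 1.05                                      # measured data
mask = (detector > 90) & (detector < 110)                    # metal trace
measured[mask] += 5.0                                        # metal corruption
corrected = normalized_interpolation(measured, prior, mask)
print("max residual inside trace:", np.abs(corrected - prior * 1.05)[mask].max())
```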

Negative Side Effects of Denormalization-Oriented Data Modeling in Enterprise-Wide Database Design (기업 전사 자료 설계에서 역정규화 중심 데이터 모델링의 부작용)

  • Rhee, Hae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.6 s.312 / pp.17-25 / 2006
  • As the information systems to be computerized are scaled up significantly, data modeling issues are once again considered crucial, as they were in the early 1980s, under the terms data governance, data architecture, and data quality. Unfortunately, resorting merely to heuristic, field-based approaches with little firm theoretical foundation regarding data design criteria quite often leads to major failures in the efficacy of data modeling. In this paper, we compare a normalization-centered data modeling approach, known in the literature as the Non-Stop (NS) Data Modeling methodology, with Information Engineering (IE), in which denormalization is on many occasions supported and even recommended as a mandatory part of modeling. Quantitative analyses reveal that the NS methodology outperforms the IE methodology in terms of efficiency indices such as adequacy of entity judgment, the existence of data circulation paths that confirm the balance of the data design, and the ratio of unnecessary data-attribute replication.

Data Cleaning and Integration of Multi-year Dietary Survey in the Korea National Health and Nutrition Examination Survey (KNHANES) using Database Normalization Theory (데이터베이스 정규화 이론을 이용한 국민건강영양조사 중 다년도 식이조사 자료 정제 및 통합)

  • Kwon, Namji;Suh, Jihye;Lee, Hunjoo
    • Journal of Environmental Health Sciences / v.43 no.4 / pp.298-306 / 2017
  • Objectives: Since 1998, the Korea National Health and Nutrition Examination Survey (KNHANES) has been conducted in order to investigate the health and nutritional status of Koreans. The individual food intake data in the KNHANES have also been utilized as a source dataset for risk assessment of chemicals via food. To improve the reliability of intake estimation and prevent missing data for less frequently reported foods, the structure of the integrated long-standing dataset is critical. However, it is difficult to merge multi-year survey datasets because of ineffective cleaning processes for handling the extensive numbers of codes for each food item and changes in dietary habits over time. Therefore, this study aims at 1) cleaning abnormal data, 2) generating integrated long-standing raw data, and 3) contributing to the production of consistent dietary exposure factors. Methods: The codebooks, the guideline book, and raw intake data from KNHANES V and VI were used for analysis. Violations of the primary-key constraint and of the first through third normal forms (1NF-3NF) of relational database theory were tested for the codebook and the structure of the raw data, respectively. Afterwards, the cleaning process was executed on the raw data using the integrated codes. Results: Duplicated key records and abnormalities in the table structures were observed. After adjusting the data with the method suggested above, the codes were corrected and integrated codes were newly created. Finally, we were able to clean the raw data provided by respondents to the KNHANES survey. Conclusion: The results of this study will contribute to the integration of multi-year datasets and help improve the data production system by clarifying, testing, and verifying the primary key, the integrity of the codes, and the primitive data structure according to database normalization theory in national health data.
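
The pandas sketch below illustrates the kind of primary-key and code-integrity checks described above; the column names (ID, EXMDAY, FOOD_CODE, FOOD_NAME) are hypothetical and do not match the actual KNHANES variable names.

```python
# Hedged sketch of primary-key and code-integrity checks with pandas.
# All column names and values are toy placeholders.
import pandas as pd

def check_primary_key(df, key_cols):
    """Report rows violating the primary-key (uniqueness) constraint."""
    dup = df[df.duplicated(subset=key_cols, keep=False)]
    print(f"{len(dup)} rows violate the key {key_cols}")
    return dup

# Toy intake table with one duplicated key record, as found in the cleaning step.
intake = pd.DataFrame({"ID": [1, 1, 2], "EXMDAY": [1, 1, 1],
                       "FOOD_CODE": ["A01", "A01", "B02"],
                       "INTAKE_G": [50, 50, 30]})
check_primary_key(intake, ["ID", "EXMDAY", "FOOD_CODE"])

# Code-integrity check across survey cycles: one food code must map to exactly
# one food name (a functional dependency in 2NF/3NF terms).
codebook = pd.DataFrame({"FOOD_CODE": ["A01", "A01", "B02"],
                         "FOOD_NAME": ["rice", "cooked rice", "kimchi"]})
conflicts = codebook.groupby("FOOD_CODE")["FOOD_NAME"].nunique()
print(conflicts[conflicts > 1])   # codes whose meaning differs between cycles
```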

An Improved Image Classification Using Batch Normalization and CNN (배치 정규화와 CNN을 이용한 개선된 영상분류 방법)

  • Ji, Myunggeun;Chun, Junchul;Kim, Namgi
    • Journal of Internet Computing and Services / v.19 no.3 / pp.35-42 / 2018
  • Deep learning is known to provide high accuracy among the various methods for image classification. In this paper, we propose a method of enhancing the accuracy of image classification by adding a batch normalization layer to an existing deep CNN (convolutional neural network). Batch normalization computes the mean and variance of each mini-batch and uses them to normalize the inputs of each layer, reducing the shift in their distributions. To demonstrate the advantage of the proposed method, accuracy and mAP were measured in image classification experiments on five image datasets: SHREC13, MNIST, SVHN, CIFAR-10, and CIFAR-100. The experimental results show that the CNN with batch normalization achieves better classification accuracy and mAP than the conventional CNN.
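
A minimal PyTorch sketch of inserting batch-normalization layers into a small CNN follows; the layer sizes, the 32x32 input, and the 10-class output are illustrative choices rather than the paper's exact architecture.

```python
# Minimal sketch: a small CNN with BatchNorm2d layers after each convolution.
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1),
            nn.BatchNorm2d(32),          # normalize each mini-batch per channel
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 8 * 8, num_classes)

    def forward(self, x):                # expects 3x32x32 inputs (e.g. CIFAR-10)
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SmallCNN()
logits = model(torch.randn(4, 3, 32, 32))   # one mini-batch of 4 images
print(logits.shape)                          # torch.Size([4, 10])
```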

Performance Improvements for Silence Feature Normalization Method by Using Filter Bank Energy Subtraction (필터 뱅크 에너지 차감을 이용한 묵음 특징 정규화 방법의 성능 향상)

  • Shen, Guanghu;Choi, Sook-Nam;Chung, Hyun-Yeol
    • The Journal of Korean Institute of Communications and Information Sciences / v.35 no.7C / pp.604-610 / 2010
  • In this paper we propose FSFN (filter-bank sub-band energy subtraction based CLSFN) to improve the recognition performance of the existing CLSFN (cepstral distance and log-energy based silence feature normalization). The proposed FSFN reduces the energy of noise components in the filter-bank sub-band domain when extracting features from speech data. This yields enhanced cepstral features and thus improves the accuracy of speech/silence classification based on them, so improved performance over the existing CLSFN can be expected. Experimental results on the Aurora 2.0 DB showed that the proposed FSFN improves the averaged word accuracy by 2% compared with the conventional CLSFN method, and that FSFN combined with CMVN (cepstral mean and variance normalization) gives the best recognition performance of all the methods compared.
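
The sketch below implements only CMVN, the standard normalization the abstract combines with FSFN; the frame count and 13-coefficient MFCC shape are assumptions, and the FSFN filter-bank energy subtraction itself is not reproduced here.

```python
# Sketch of CMVN (cepstral mean and variance normalization) over one utterance.
import numpy as np

def cmvn(cepstra, eps=1e-8):
    """Normalize each cepstral dimension to zero mean and unit variance."""
    mean = cepstra.mean(axis=0, keepdims=True)
    std = cepstra.std(axis=0, keepdims=True)
    return (cepstra - mean) / (std + eps)

utterance = np.random.randn(300, 13) * 4.0 + 1.5   # 300 frames of 13 MFCCs
normalized = cmvn(utterance)
print(normalized.mean(axis=0).round(6))             # ~0 for every coefficient
print(normalized.std(axis=0).round(6))              # ~1 for every coefficient
```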

Effect of Normalization on Detection of Differentially-Expressed Genes with Moderate Effects

  • Cho, Seo-Ae;Lee, Eun-Jee;Kim, Young-Chul;Park, Tae-Sung
    • Genomics & Informatics
    • /
    • v.5 no.3
    • /
    • pp.118-123
    • /
    • 2007
  • The existing literature offers little guidance on how to decide which method to use when analyzing one-channel microarray measurements with large, grouped samples. Most previous methods have focused on two-channel data; therefore, they cannot easily be applied to one-channel microarray data. Thus, a more reliable way is needed to determine an appropriate combination of individual basic processing steps for a given dataset in order to improve the validity of one-channel expression data analysis. We address key issues in evaluating the effectiveness of the basic statistical processing steps of microarray data that can affect the final outcome of gene expression analysis, without focusing on the underlying biological interpretation of the data.
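
As an example of one common basic processing step for one-channel arrays, the sketch below implements quantile normalization; this particular step is an illustrative choice and is not prescribed by the paper.

```python
# Sketch of quantile normalization: force every array (column) to share the
# same intensity distribution before differential-expression analysis.
import numpy as np

def quantile_normalize(X):
    ranks = X.argsort(axis=0).argsort(axis=0)          # rank of each gene per array
    mean_of_sorted = np.sort(X, axis=0).mean(axis=1)   # reference distribution
    return mean_of_sorted[ranks]

rng = np.random.default_rng(3)
X = rng.lognormal(6, 1, size=(1000, 12))               # 1000 genes x 12 arrays
Xn = quantile_normalize(X)
print(np.allclose(Xn.mean(axis=0), Xn.mean(axis=0)[0]))  # identical column means
```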

A Correction Approach to Bidirectional Effects of EO-1 Hyperion Data for Forest Classification

  • Park, Seung-Hwan;Kim, Choen
    • Proceedings of the KSRS Conference / 2003.11a / pp.1470-1472 / 2003
  • Hyperion, the hyperspectral sensor carried on NASA's EO-1 satellite, provides 224 bands at 10 nm intervals over 360-2580 nm and can be used for more subtle discrimination of forest cover. In this study, a Hyperion image is used to investigate the effects of topography on the classification of forest cover and to assess whether topographic correction improves the discrimination of species units for practical forest mapping. A publicly available digital elevation model (DEM) at a scale of 1:25,000 is used to model the radiance variation over forest, considering the MSR (mean spectral ratio) on opposing aspects. The Hyperion data are corrected on a pixel-by-pixel basis to normalize the scene to a uniform solar illumination and viewing geometry. As a result, this approach to topographic-effect normalization of hyperspectral data can effectively reduce the variation in detected radiance due to changes in forest illumination and improve the classification of forest cover.
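
For context, the sketch below shows a standard DEM-driven cosine correction, a common baseline for this kind of illumination normalization; it is not the MSR-based approach used in the paper, and the scene, slope, and sun geometry are synthetic.

```python
# Illustrative baseline: cosine topographic correction of one band using
# DEM-derived slope and aspect. Not the paper's MSR method; toy data only.
import numpy as np

def cosine_correction(radiance, slope, aspect, sun_zenith, sun_azimuth):
    """Normalize per-pixel radiance to a uniform solar illumination geometry."""
    # Local solar incidence angle from slope/aspect (all angles in radians).
    cos_i = (np.cos(sun_zenith) * np.cos(slope) +
             np.sin(sun_zenith) * np.sin(slope) * np.cos(sun_azimuth - aspect))
    cos_i = np.clip(cos_i, 0.1, 1.0)          # avoid blow-ups on shaded slopes
    return radiance * np.cos(sun_zenith) / cos_i

# Toy 100x100 scene: one band plus DEM-derived slope and aspect.
rng = np.random.default_rng(4)
band = rng.uniform(50, 200, (100, 100))
slope = np.radians(rng.uniform(0, 40, (100, 100)))
aspect = np.radians(rng.uniform(0, 360, (100, 100)))
corrected = cosine_correction(band, slope, aspect,
                              sun_zenith=np.radians(35),
                              sun_azimuth=np.radians(150))
print(corrected.mean())
```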
