• Title/Summary/Keyword: t-Nearest Neighbor

Identification of Differentially Expressed Genes Using Tests Based on Multiple Imputations

  • Kim, Sang Cheol;Yu, Donghyeon
    • Quantitative Bio-Science / v.36 no.1 / pp.23-31 / 2017
  • Datasets from DNA microarray experiments, which take the form of large matrices of gene expression levels, often have missing values. However, existing statistical methods, including principal component analysis (PCA) and Hotelling's t-test, are not directly applicable to datasets with missing values because they generally assume that the observed dataset is complete. Many methods have been proposed in the literature to impute the missing values in the observed data. Troyanskaya et al. [1] study k-nearest neighbor (kNN) imputation, Kim et al. [2] propose the local least squares (LLS) method, and Rubin [3] proposes multiple imputation (MI) for missing values. To identify differentially expressed genes, we propose a new testing procedure for observed data with missing values. The proposed procedure uses Stouffer's z-scores to combine the test results of the individual imputed samples, which are dependent on each other. We show numerically, by comparing ROC curves, that the proposed test procedure based on MI performs better than existing test procedures based on single imputation (SI). We apply the proposed method to a public microarray dataset.
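
As a rough illustration of the combination step, the sketch below implements the classical unweighted Stouffer z-score for one-sided p-values. It assumes independent tests, whereas the abstract notes that results from imputed samples are dependent; how the authors adjust for that dependence is not stated here, so this is not their procedure. The function name and the example p-values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def stouffer_z(p_values):
    """Combine one-sided p-values with the unweighted Stouffer z-score.

    Assumption: the tests are independent. The abstract's procedure
    handles dependent imputed samples, which this sketch does not.
    """
    norm = NormalDist()
    # Convert each p-value (strictly between 0 and 1) to a z-score.
    z_scores = [norm.inv_cdf(1.0 - p) for p in p_values]
    # Under independence the scaled sum is standard normal under H0.
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    p_combined = 1.0 - norm.cdf(z_combined)
    return z_combined, p_combined


# Hypothetical p-values for one gene, one per imputed dataset.
z, p = stouffer_z([0.04, 0.03, 0.06, 0.05, 0.02])
```

With positively dependent inputs, this independent-case formula tends to overstate significance, which is presumably why a dependence-aware variant is needed.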

Automatic Detection and Characterization of Cracked Constituent Particles/Inclusions in Al-Alloys under Uniaxial Tensile Loading (인장하중에 의한 Al 합금내 크랙형성 복합상의 자동검출 및 정량분석)

  • Lee, Soon Gi;Jang, Sung Ho;Kim, Yong Chan
    • Korean Journal of Metals and Materials / v.47 no.1 / pp.7-12 / 2009
  • Detailed quantitative microstructural data on the cracking of coarse constituent particles in 7075 (T651) series wrought Al-alloys, where the particle cracks are generated by monotonic loading, have been obtained using a novel digital image processing technique. Microstructural parameters such as number density, volume fraction, size distribution, first nearest neighbor distribution, and two-point correlation function have been quantitatively characterized using the developed technique; such data are very useful for verifying and studying theoretical models of damage evolution and fracture in Al-alloys. The data suggest useful relationships for damage modeling, such as a linear relationship between particle cracking and strain under uniaxial tensile loading, with larger particles cracking preferentially.
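
One of the listed parameters, the first nearest neighbor distribution, can be sketched as follows. This assumes each particle has already been reduced to a 2D centroid by some segmentation step; the function name and the brute-force search are illustrative choices, not the paper's image processing technique.

```python
from math import dist, inf


def first_nearest_neighbor_distances(centroids):
    """Return, for each centroid, the distance to its closest other centroid.

    Brute-force O(n^2) search; the input is a list of (x, y) tuples
    assumed to come from an earlier particle-segmentation step.
    """
    distances = []
    for i, p in enumerate(centroids):
        nearest = inf
        for j, q in enumerate(centroids):
            if i != j:
                nearest = min(nearest, dist(p, q))
        distances.append(nearest)
    return distances


# Hypothetical centroids in pixel units.
nn = first_nearest_neighbor_distances([(0, 0), (3, 4), (3, 5)])
```

A histogram of the returned distances gives the empirical first nearest neighbor distribution.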

Efficient Multi-scalable Network for Single Image Super Resolution

  • Alao, Honnang;Kim, Jin-Sung;Kim, Tae Sung;Lee, Kyujoong
    • Journal of Multimedia Information System / v.8 no.2 / pp.101-110 / 2021
  • In computer vision, single-image super resolution has been an area of research for a significant period. Traditional techniques for image restoration involve interpolation-based methods such as nearest-neighbor, bilinear, and bicubic interpolation. Although implementations of convolutional neural networks have provided outstanding results in recent years, efficiency and single-model multi-scalability have remained challenges. Furthermore, previous works have not placed enough emphasis on real-number scalability. Interpolation-based techniques, however, have no limit in terms of scalability, as they are able to upscale images to any desired size. In this paper, we propose a convolutional neural network that possesses the advantages of the interpolation-based techniques and is also efficient, making it suitable for practical implementations. It consists of convolutional layers applied in the low-resolution space, post-up-sampling along the end hidden layers, and additional layers in the high-resolution space. Up-sampling is applied to a multi-channel feature map via bicubic interpolation using a single model. Experiments on architectural structure, layer reduction, and real-number scale training are carried out, and the results show that the network is efficient among models based on multi-scale learning (including scale multi-path learning).
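
The point that interpolation can upscale by any real-number factor can be seen in a minimal nearest-neighbor resize. The sketch assumes a grayscale image stored as a list of rows; it shows only the traditional baseline mentioned in the abstract, not the proposed network, and the function name is hypothetical.

```python
def nearest_neighbor_resize(image, scale):
    """Resize a 2D list of pixel values by an arbitrary positive scale.

    Each output pixel copies the source pixel whose index is the
    output index mapped back to source coordinates and truncated.
    """
    src_h, src_w = len(image), len(image[0])
    dst_h = max(1, round(src_h * scale))
    dst_w = max(1, round(src_w * scale))
    resized = []
    for y in range(dst_h):
        src_y = min(src_h - 1, int(y * src_h / dst_h))
        row = []
        for x in range(dst_w):
            src_x = min(src_w - 1, int(x * src_w / dst_w))
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized


# A non-integer factor works the same way as an integer one.
upscaled = nearest_neighbor_resize([[1, 2], [3, 4]], 1.5)
```

Because the mapping is defined for any output size, no separate model or code path is needed per scale, which is the property the abstract aims to keep.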

Automated detection of panic disorder based on multimodal physiological signals using machine learning

  • Eun Hye Jang;Kwan Woo Choi;Ah Young Kim;Han Young Yu;Hong Jin Jeon;Sangwon Byun
    • ETRI Journal / v.45 no.1 / pp.105-118 / 2023
  • We tested the feasibility of automated discrimination of patients with panic disorder (PD) from healthy controls (HCs) based on multimodal physiological responses using machine learning. Electrocardiogram (ECG), electrodermal activity (EDA), respiration (RESP), and peripheral temperature (PT) of the participants were measured during three experimental phases: rest, stress, and recovery. Eleven physiological features were extracted from each phase and used as input data. Logistic regression (LoR), k-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and multilayer perceptron (MLP) algorithms were implemented with nested cross-validation. Linear regression analysis showed that ECG and PT features obtained in the stress and recovery phases were significant predictors of PD. We achieved the highest accuracy (75.61%) with MLP using all 33 features. With the exception of MLP, applying the significant predictors led to a higher accuracy than using 24 ECG features. These results suggest that combining multimodal physiological signals measured during various states of autonomic arousal has the potential to differentiate patients with PD from HCs.
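
Of the classifiers listed, k-nearest neighbor is the simplest to sketch. The example below uses majority voting over Euclidean distance. The two-dimensional feature vectors are made up for illustration; the study itself uses physiological features and nested cross-validation, neither of which is reproduced here.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k closest training points."""
    # Sort training pairs by Euclidean distance to the query point.
    neighbors = sorted(zip(train_x, train_y), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled feature vectors with PD/HC labels.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["HC", "HC", "HC", "PD", "PD", "PD"]
label = knn_predict(train_x, train_y, (0.2, 0.1), k=3)
```

In practice the features would need scaling before distances are computed, since physiological signals are measured in different units.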

Evaluation of the Optimum Interpolation for Creating Hydraulic Model from Close Range Digital Photogrammetry (근접수치사진측량으로 수리모형해석에 적용 시 최적보간법 평가)

  • Choi Hyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.23 no.3 / pp.251-260 / 2005
  • The development of CCD has contributed to great advances in mapping technology and has benefited the photogrammetry research community. The purpose of this paper is to find the best interpolation method for creating a terrain model from close-range digital photogrammetry. A t-test was conducted to analyze the similarity of the hydraulic model obtained with close-range digital photogrammetry and with trigonometric leveling. In addition, several interpolation methods, including inverse distance, kriging, nearest neighbor, and TIN, were applied to the hydraulic model, and their results were compared to find the optimum interpolation for displaying the actual terrain as a digital elevation model from close-range digital photogrammetry. The results revealed that kriging and TIN interpolation were efficient methods for judging the hazard interpolation law by analyzing the geometric similarity of the hydraulic model for hydraulic model application.
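
Inverse distance weighting, one of the compared methods, can be sketched in a few lines. The power parameter of 2 and the sample points are assumptions for illustration; the abstract does not state the settings used in the study.

```python
from math import dist


def idw_interpolate(samples, query, power=2.0):
    """Estimate elevation at `query` from (x, y, z) samples by inverse distance weighting."""
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, z in samples:
        d = dist((x, y), query)
        if d == 0.0:
            # The query coincides with a sample point: return its value directly.
            return z
        w = 1.0 / d ** power
        weighted_sum += w * z
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical surveyed points: (x, y, elevation).
elevation = idw_interpolate([(0, 0, 10.0), (2, 0, 20.0)], (1, 0))
```

Nearest neighbor interpolation is the limiting case in which only the single closest sample contributes.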

Machine learning in survival analysis (생존분석에서의 기계학습)

  • Baik, Jaiwook
    • Industry Promotion Research / v.7 no.1 / pp.1-8 / 2022
  • We investigated various types of machine learning methods that can be applied to censored data. Exploratory data analysis reveals the distribution of each feature and the relationships among features. Next, a classification problem is set up in which the dependent variable is death_event and the remaining features are independent variables. After applying various machine learning methods to the data, we found that, as in many other reports from the artificial intelligence field, random forest performs better than logistic regression. However, artificial neural networks and gradient boosting, which have performed well recently, do not perform as expected due to the lack of data. Finally, the Kaplan-Meier estimator and the Cox proportional hazards model are employed to explore the relationship of the dependent variable $(t_i, \delta_i)$ with the independent variables. In addition, random forest, which is used in machine learning, is applied to survival analysis with censored data.
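
For readers unfamiliar with censored data, the following sketch computes the Kaplan-Meier survival estimate from pairs of observed time and event indicator (1 for an observed death, 0 for censoring). The data and the function name are hypothetical, and the Cox model mentioned in the abstract is not covered.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    `events[i]` is 1 if the event was observed at `times[i]`, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set.
        at_risk -= removed
    return curve


# Hypothetical follow-up times with event indicators.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
```

Censored subjects still count in the risk set up to their censoring time, which is what distinguishes this estimate from a naive proportion of survivors.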

Effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment

  • Kim, Byung-Soo;Rha, Sun-Young
    • Bioinformatics and Biosystems / v.1 no.1 / pp.67-72 / 2006
  • The aim of this paper is to discuss the effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment in the context of a one-sample problem. We conducted a cDNA microarray experiment to detect differentially expressed genes for the metastasis of colorectal cancer, based on twenty patients who underwent liver resection due to liver metastasis from colorectal cancer. Total RNAs from metastatic liver tumor and adjacent normal liver tissue from a single patient were labeled with Cy5 and Cy3, respectively, and competitively hybridized to a cDNA microarray with 7775 human genes. We used $M=\log_2(R/G)$ for the signal evaluation, where R and G denote the fluorescent intensities of the Cy5 and Cy3 dyes, respectively. The statistical problem comprises a one-sample test of E(M)=0 for each gene and involves multiple tests. The twenty cDNA microarray data sets would form a matrix of dimension 7775 by 20 if there were no missing values. However, missing values occur for various reasons. For each gene, the non-missing proportion (NMP) was defined as the proportion of non-missing values out of twenty. In detecting differentially expressed (DE) genes, we used the genes whose NMP is greater than or equal to 0.4 and then sequentially increased the NMP threshold by 0.1 to investigate its effect on the detection of DE genes. For each fixed NMP, we imputed the missing values with the K-nearest neighbor method (K=10) and applied the nonparametric t-test of Dudoit et al. (2002), SAM by Tusher et al. (2001), and the empirical Bayes procedure of Lönnstedt and Speed (2002) to determine the effect of missing values on the final outcome. These three procedures yielded substantially concordant results in detecting DE genes. Of these three procedures, we used SAM to explore the acceptable NMP level. The results showed that the optimum NMP for this data set was 80%. It is more desirable to find the optimum level of NMP for each data set by applying the method described in this note, when the plot of (NMP, number of overlapping genes) shows a turning point.
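
The two quantities defined in the abstract, the log ratio M and the NMP, can be computed as in the sketch below. Missing values are represented as `None`, and the gene names and intensities are invented for illustration; the imputation and testing steps are not shown.

```python
from math import log2


def log_ratio(red, green):
    """M = log2(R/G) for one spot, with R and G the Cy5 and Cy3 intensities."""
    return log2(red / green)


def non_missing_proportion(row):
    """Proportion of non-missing values for one gene (None marks a missing value)."""
    return sum(value is not None for value in row) / len(row)


def filter_by_nmp(matrix, threshold):
    """Keep genes whose NMP is greater than or equal to the threshold."""
    return {
        gene: row
        for gene, row in matrix.items()
        if non_missing_proportion(row) >= threshold
    }


# Hypothetical M values for two genes across five arrays.
matrix = {
    "geneA": [0.5, None, 1.2, 0.9, 1.1],
    "geneB": [None, None, None, 0.3, None],
}
kept = filter_by_nmp(matrix, 0.4)
```

Raising the threshold in steps of 0.1, as the abstract describes, amounts to calling `filter_by_nmp` repeatedly and comparing the resulting gene lists.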


Enhanced Image Magnification Using Edge Information (에지정보를 이용한 개선된 영상확대기법)

  • Je, Sung-Kwan;Cho, Jae-Hyun;Cha, Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.12 / pp.2343-2348 / 2006
  • Image magnification is among the basic image processing operations. The most commonly used techniques for image magnification are based on interpolation methods (such as nearest neighbor, bilinear, and cubic interpolation). However, the magnified images produced by these techniques often exhibit undesirable artifacts such as 'blocking' and 'blurring', or the techniques take too much processing time because of the several processing steps involved. In this paper, we propose an image magnification method that uses sub-band information of the input image, such as edge information, to enhance magnification. To detect the edge information, we use the whole image rather than a pixel's neighborhood, so that the blocking phenomenon does not occur. We then emphasize the edge information to remove the blurring phenomenon associated with edge information. The proposed method also improves on the processing time of traditional image magnification methods. Experimental results show that the proposed method overcomes the blocking and blurring drawbacks of image magnification and achieves a higher PSNR and correlation than the traditional methods.
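
PSNR, the quality metric cited in the results, follows directly from the mean squared error between a reference image and a magnified one. The sketch assumes two equally sized grayscale images stored as lists of rows with an 8-bit peak value of 255; the function name is hypothetical.

```python
from math import log10


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0.0:
        # Identical images: PSNR is unbounded.
        return float("inf")
    return 10.0 * log10(max_value ** 2 / mse)


# Hypothetical 2x2 images differing by a small error per pixel.
score = psnr([[10, 20], [30, 40]], [[12, 18], [33, 40]])
```

A higher value means the magnified image is closer to the reference.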

Design and Implementation of a Trajectory-based Index Structure for Moving Objects on a Spatial Network (공간 네트워크상의 이동객체를 위한 궤적기반 색인구조의 설계 및 구현)

  • Um, Jung-Ho;Chang, Jae-Woo
    • Journal of KIISE:Databases / v.35 no.2 / pp.169-181 / 2008
  • Because moving objects usually move on spatial networks, efficient trajectory index structures are required to achieve good retrieval performance on their trajectories. However, there has been little research on trajectory index structures for spatial networks; existing examples include the FNR-tree and the MON-tree. Because the FNR-tree and MON-tree store data by the unit of a moving object's segment, they cannot support the whole trajectory of a moving object. In this paper, we propose an efficient trajectory index structure for moving objects, named Trajectory of Moving objects on Network Tree (TMN Tree). To this end, we divide moving object data into spatial and temporal attributes and preserve the moving objects' trajectories. We then design an index structure that supports not only range queries but also trajectory queries. In addition, we classify user queries into spatio-temporal area-based trajectory queries, similar-trajectory queries, and k-nearest neighbor queries, and propose query processing algorithms to support them. Finally, we show that our trajectory index structure outperforms existing tree structures such as the FNR-tree and MON-tree.
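
To show what a k-nearest neighbor query means on a spatial network, where distance is measured along edges rather than in a straight line, the sketch below runs a plain Dijkstra expansion from the query node. It does not use the TMN Tree or any index, and it assumes objects sit exactly on network nodes; the graph, node names, and function name are hypothetical.

```python
import heapq


def network_knn(graph, source, object_nodes, k):
    """Return up to k (node, network_distance) pairs nearest to `source`.

    `graph` maps each node to a list of (neighbor, edge_length) pairs;
    `object_nodes` is the set of nodes where an object is located.
    """
    best = {source: 0.0}
    heap = [(0.0, source)]
    visited = set()
    result = []
    while heap and len(result) < k:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue
        visited.add(node)
        # Nodes are settled in increasing distance, so hits come out sorted.
        if node in object_nodes:
            result.append((node, d))
        for neighbor, length in graph.get(node, []):
            nd = d + length
            if nd < best.get(neighbor, float("inf")):
                best[neighbor] = nd
                heapq.heappush(heap, (nd, neighbor))
    return result


# Hypothetical undirected road network stored as adjacency lists.
graph = {
    "A": [("B", 1), ("D", 5)],
    "B": [("A", 1), ("C", 2)],
    "C": [("B", 2), ("D", 1)],
    "D": [("A", 5), ("C", 1)],
}
nearest = network_knn(graph, "A", {"C", "D"}, k=2)
```

An index structure aims to avoid expanding the whole network for each query, which this index-free version cannot do.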

Optimal supervised LSA method using selective feature dimension reduction (선택적 자질 차원 축소를 이용한 최적의 지도적 LSA 방법)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.1 / pp.47-60 / 2010
  • Most research on classification has used kNN (k-Nearest Neighbor) and SVM (Support Vector Machine), which are known as learning-based models, and the Bayesian classifier and NNA (Neural Network Algorithm), which are known as statistics-based methods. However, these methods face limitations of space and time when classifying the very large number of web pages on today's Internet. Moreover, most classification studies use a uni-gram feature representation, which does not represent the real meaning of words well. Korean web page classification poses additional problems because Korean words often have multiple meanings (polysemy). For these reasons, LSA (Latent Semantic Analysis) has been proposed for classification in such environments (large data sets and polysemous words). LSA uses SVD (Singular Value Decomposition), which decomposes the original term-document matrix into three matrices and reduces their dimensions. Through SVD, it is possible to create a new low-level semantic space for representing vectors, which makes classification efficient and allows the latent meaning of words or documents (or web pages) to be analyzed. Although LSA is good at classification, it has some drawbacks. When SVD reduces the dimensions of the matrix and creates the new semantic space, it considers which dimensions represent the vectors well, not which dimensions discriminate between them well. This is why LSA does not improve classification performance as much as expected. In this paper, we propose a new LSA method that selects the optimal dimensions for both discriminating and representing vectors, minimizing these drawbacks and improving performance. The proposed method shows better and more stable performance than other LSA methods in low-dimensional spaces. In addition, we obtain further improvement in classification by creating and selecting features through stopword removal and by statistically weighting them with specific values.
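
For reference, the standard rank-k truncation that plain LSA performs can be written as follows. The symbols (a term-document matrix A with m terms and n documents, singular values in decreasing order) are conventional notation assumed here, not taken from the paper, and the paper's selective choice of dimensions is not shown.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
A \in \mathbb{R}^{m \times n},\quad
U_k \in \mathbb{R}^{m \times k},\quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{n \times k},\quad
\sigma_1 \ge \dots \ge \sigma_k > 0

% Folding a new document vector d (length m) into the k-dimensional space:
\hat{d} = \Sigma_k^{-1} U_k^{\top} d
```

Keeping the k largest singular values gives the best rank-k approximation of A in the least-squares sense, which is a representation criterion; it says nothing about class separation, and that is the gap the abstract describes.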
