• Title/Summary/Keyword: Dimensionality

A Node2Vec-Based Gene Expression Image Representation Method for Effectively Predicting Cancer Prognosis (암 예후를 효과적으로 예측하기 위한 Node2Vec 기반의 유전자 발현량 이미지 표현기법)

  • Choi, Jonghwan;Park, Sanghyun
    • KIPS Transactions on Software and Data Engineering / v.8 no.10 / pp.397-402 / 2019
  • Accurately predicting cancer prognosis so that patients can receive appropriate treatment strategies is one of the critical challenges in bioinformatics. Many studies have proposed machine learning models that predict patient outcomes from gene expression data. Gene expression data is high-dimensional numerical data covering about 17,000 genes, so earlier studies used feature selection or dimensionality reduction to improve the performance of prognostic prediction models. These approaches, however, make it difficult for the predictive models to capture biological interactions between the selected genes, because the feature selection and model training stages are performed independently. In this paper, we propose a novel two-dimensional image formatting approach for gene expression data that achieves feature selection and prognostic prediction effectively. Node2Vec is used to integrate a biological interaction network with gene expression data, and a convolutional neural network learns from the integrated two-dimensional gene expression images and predicts cancer prognosis. We evaluated the proposed model through double cross-validation and confirmed that its prognostic prediction accuracy is superior to that of traditional machine learning models based on raw gene expression data. Because the proposed approach can improve prediction models without the information loss caused by feature selection steps, we expect it to contribute to the development of personalized medicine.

Line-Segment Feature Analysis Algorithm for Handwritten-Digits Data Reduction (필기체 숫자 데이터 차원 감소를 위한 선분 특징 분석 알고리즘)

  • Kim, Chang-Min;Lee, Woo-Beom
    • KIPS Transactions on Software and Data Engineering / v.10 no.4 / pp.125-132 / 2021
  • As the layers of an artificial neural network (NN) deepen and the dimension of the input data increases, learning and recognition require a large number of arithmetic operations performed at high speed. This study therefore proposes a data dimensionality reduction method that reduces the dimension of the NN's input data. The proposed Line-segment Feature Analysis (LFA) algorithm applies a gradient-based edge detection algorithm using median filters to analyze the line-segment features of the objects in an image. For the extracted edge image, the eigenvalues corresponding to eight kinds of line segments are calculated using 3×3 or 5×5 detection filters whose coefficient values include [0, 1, 2, 4, 8, 16, 32, 64, and 128]. Two one-dimensional data vectors of size 256 are produced by accumulating identical response values from the eigenvalues calculated with each detection filter, and the two data elements are added up. Two LFA256 data are merged to produce 512-sized LFA512 data. In a comparative experiment on handwritten digit recognition using the PCA technique and the AlexNet model, LFA256 and LFA512 showed recognition performance of 98.7% and 99%, respectively.
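The accumulation step in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the 3×3 weight layout below (centre 0, neighbours weighted by powers of two), the function name `lfa_histogram`, and the toy edge image are all assumptions made for illustration, and the paper's median-filter edge detection is not reproduced. With such weights every window of a binary edge image maps to a value in 0..255, and counting identical responses yields a 256-element vector.

```python
# Hypothetical 3x3 detection filter: centre weight 0, the eight
# neighbours weighted by distinct powers of two (assumed layout).
WEIGHTS = [
    [1, 2, 4],
    [128, 0, 8],
    [64, 32, 16],
]


def lfa_histogram(edge):
    """Count identical filter responses over a binary edge image.

    `edge` is a list of rows containing 0/1 values. Each interior pixel
    yields a response in 0..255, so the result has 256 bins.
    """
    height, width = len(edge), len(edge[0])
    hist = [0] * 256
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            response = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    response += WEIGHTS[dr + 1][dc + 1] * edge[r + dr][c + dc]
            hist[response] += 1
    return hist


# Toy edge image containing a single horizontal line segment.
edge = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
hist = lfa_histogram(edge)
```

In this toy case only three bins are non-zero, one for each vertical position of the line inside the window. Concatenating two such 256-element vectors would give a 512-element vector, which is one plausible reading of how LFA512 is formed.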

Comparative analysis of Machine-Learning Based Models for Metal Surface Defect Detection (머신러닝 기반 금속외관 결함 검출 비교 분석)

  • Lee, Se-Hun;Kang, Seong-Hwan;Shin, Yo-Seob;Choi, Oh-Kyu;Kim, Sijong;Kang, Jae-Mo
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.6 / pp.834-841 / 2022
  • Recently, applying artificial intelligence technologies in various fields of production has drawn an upsurge of research interest due to the growing demand for smart factory and artificial intelligence technologies. Considerable effort is being made to introduce artificial intelligence algorithms into the defect detection task. In particular, detecting defects on metal surfaces attracts more research interest than detecting them on other materials (wood, plastics, fibers, etc.). In this paper, we compare and analyze the speed and performance of defect classification by combining machine learning techniques (Support Vector Machine, Softmax Regression, Decision Tree) with dimensionality reduction algorithms (Principal Component Analysis, AutoEncoders) and two convolutional neural networks (proposed method, ResNet). To validate and compare the performance and speed of the algorithms, we adopted two datasets, (i) a public dataset and (ii) an actual dataset, and determined the most efficient algorithm on the basis of the results.

Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer / v.22 no.2 / pp.120-134 / 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT on distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.
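For readers unfamiliar with 5×5 nested cross-validation, the index bookkeeping can be sketched as follows. This is a generic illustration, not the authors' code: the helper names `k_fold_indices` and `nested_cv_splits`, the seed, and the unstratified shuffling are assumptions, and the random survival forest itself is not modelled. The outer folds estimate prediction error, while the inner folds, drawn only from each outer training set, would be used for model selection.

```python
import random


def k_fold_indices(indices, k, rng):
    """Yield (train, test) index lists for a shuffled k-fold split."""
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def nested_cv_splits(n_samples, outer_k=5, inner_k=5, seed=0):
    """Return a list of (outer_train, outer_test, inner_splits) tuples."""
    rng = random.Random(seed)
    splits = []
    all_indices = list(range(n_samples))
    for outer_train, outer_test in k_fold_indices(all_indices, outer_k, rng):
        # Inner folds are built from the outer training set only, so the
        # outer test set never influences model selection.
        inner = list(k_fold_indices(outer_train, inner_k, rng))
        splits.append((outer_train, outer_test, inner))
    return splits


# 289 matches the number of patients reported in the abstract.
splits = nested_cv_splits(289)
```

A real survival analysis would typically also stratify the folds by event status; that refinement is omitted here for brevity.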

Estimation of Spatial Distribution Using the Gaussian Mixture Model with Multivariate Geoscience Data (다변량 지구과학 데이터와 가우시안 혼합 모델을 이용한 공간 분포 추정)

  • Kim, Ho-Rim;Yu, Soonyoung;Yun, Seong-Taek;Kim, Kyoung-Ho;Lee, Goon-Taek;Lee, Jeong-Ho;Heo, Chul-Ho;Ryu, Dong-Woo
    • Economic and Environmental Geology / v.55 no.4 / pp.353-366 / 2022
  • Spatial estimation of geoscience data (geo-data) is challenging due to spatial heterogeneity, data scarcity, and high dimensionality. A novel spatial estimation method is needed to consider the characteristics of geo-data. In this study, we proposed the application of Gaussian Mixture Model (GMM) among machine learning algorithms with multivariate data for robust spatial predictions. The performance of the proposed approach was tested through soil chemical concentration data from a former smelting area. The concentrations of As and Pb determined by ex-situ ICP-AES were the primary variables to be interpolated, while the other metal concentrations by ICP-AES and all data determined by in-situ portable X-ray fluorescence (PXRF) were used as auxiliary variables in GMM and ordinary cokriging (OCK). Among the multidimensional auxiliary variables, important variables were selected using a variable selection method based on the random forest. The results of GMM with important multivariate auxiliary data decreased the root mean-squared error (RMSE) down to 0.11 for As and 0.33 for Pb and increased the correlations (r) up to 0.31 for As and 0.46 for Pb compared to those from ordinary kriging and OCK using univariate or bivariate data. The use of GMM improved the performance of spatial interpretation of anthropogenic metals in soil. The multivariate spatial approach can be applied to understand complex and heterogeneous geological and geochemical features.
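The two evaluation metrics named in this abstract, RMSE and the correlation coefficient r, follow standard definitions and can be computed as in the sketch below. The function names and the toy numbers are illustrative assumptions and are unrelated to the soil data in the study.

```python
import math


def rmse(observed, predicted):
    """Root mean-squared error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Toy values only: one prediction is off by 2, so the mean squared error is 1.
error = rmse([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
# Perfectly linear toy data gives a correlation of 1.
corr = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A lower RMSE and a higher r both indicate closer agreement between interpolated and measured values, which is how the abstract compares GMM with ordinary kriging and OCK.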

Dimensionality of emotion suppression and psychosocial adaptation: Based on the cognitive process model of emotion processing (정서 처리의 인지 평가모델을 기반으로 한 정서 억제의 차원성과 심리 사회적 적응)

  • Woo, Sungbum
    • Korean Journal of Culture and Social Issue / v.27 no.4 / pp.475-503 / 2021
  • The purpose of this study is to clarify the constructs of emotion suppression and to aid understanding of its multidimensional nature by classifying suppression constructs according to the KMW model. The study also examined gender differences in emotion suppression. For this purpose, 657 adult male and female subjects were evaluated on attitude toward emotions and difficulty in emotional regulation, as well as on depression, state anger, and daily stress scales. An exploratory factor analysis of the scales related to emotion suppression found that the emotion suppression factors corresponding to each stage of the KMW model were 'distraction against emotional information', 'difficulty in understanding and interpretation of emotions', 'emotion control beliefs', and 'vulnerability on emotional expression beliefs'. Next, the study participants were classified by a cluster analysis based on each emotion suppression factor. Four clusters were extracted and named 'emotional control belief cluster', 'emotional expression cluster', 'emotional attention failure cluster', and 'general emotional suppression cluster'. An examination of the mean differences in male depression, depression, state anger, and daily stress for each group found significant differences in all dependent variables. An examination of whether the frequency of emotional suppression clusters differs by gender found that the frequency of emotional suppression clusters was high in men and the ratio of emotional expression clusters was high in women. Finally, the study analyzed whether there was a gender difference in the effect of the emotional suppression cluster on psychosocial adaptation, and discussed the implications based on the results.

Primary School Spatial Characteristics and Architectural Design Methods based on Prospect and Refuge Concept (조망과 은신개념으로서의 초등학교 공간특성과 건축설계 방법연구)

  • Shim, Eun-Ju
    • The Journal of Sustainable Design and Educational Environment Research / v.22 no.1 / pp.1-12 / 2023
  • With the possibility of preventing crime through environmental design, CPTED guidelines have been introduced and applied to various places. However, although guidelines may be a useful design tool, there are also limitations to referencing them in the early conceptual phase of school architecture. Therefore, the purpose of this study is to examine the concept of "prospect and refuge", which serves as the basis of CPTED, and to derive architectural characteristics and application methods based on the concept. For the case study, this research selected six small to medium-scale elementary schools with outstanding creative ideas built within the last 10 years. The results showed that the spatial characteristics of "prospect" can be achieved by organizing the three-dimensionality of the space, vista prospect, and design attention on circulation areas. The concept of "refuge" was realized through the segmentation of the mass and spatial enclosure. Although the subjects had different social conditions and educational directions in Korea, this study may be used as a theoretical framework for designing a safe school environment.

Study on Dimension Reduction algorithm for unsupervised clustering of the DMR's RF-fingerprinting features (무선단말기 RF-fingerprinting 특징의 비지도 클러스터링을 위한 차원축소 알고리즘 연구)

  • Young-Giu Jung;Hak-Chul Shin;Sun-Phil Nah
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.23 no.3 / pp.83-89 / 2023
  • The clustering technique using RF fingerprints extracts the characteristic signatures of transmitters that are embedded in the transmission waveforms. The output of the RF-Fingerprint feature extraction algorithm for clustering identical DMRs (Digital Mobile Radios) is a high-dimensional feature, typically consisting of 512 or more dimensions. While such high-dimensional features may be effective for classifiers, they are not suitable as inputs for clustering algorithms. Therefore, this paper proposes a dimension reduction algorithm that effectively reduces the dimensionality of the multidimensional RF-Fingerprint features while maintaining the fingerprinting characteristics of the DMRs. Additionally, it proposes a clustering algorithm that can effectively cluster the reduced dimensions. The proposed clustering algorithm reduces the multidimensional RF-Fingerprint features using t-SNE, based on KL divergence, and performs clustering using Density Peaks Clustering (DPC). The performance analysis of the DMR clustering algorithm uses a dataset of 3000 samples collected from 10 Motorola XiR and 10 Wintech N-Series DMRs. The RF-Fingerprinting-based clustering algorithm formed 20 clusters, and all performance metrics, including Homogeneity, Completeness, and V-measure, reached 99.4%.
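The three metrics reported above have standard entropy-based definitions, sketched below in plain Python. This only illustrates how Homogeneity, Completeness, and V-measure are computed from true device labels and cluster assignments. It is not the authors' evaluation code, the function names are assumptions, and the toy labels are unrelated to the DMR dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(labels) in nats."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """Conditional entropy H(target | given) in nats."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(true_labels, cluster_labels):
    """Return (homogeneity, completeness, v_measure)."""
    h_true = entropy(true_labels)
    h_cluster = entropy(cluster_labels)
    # Homogeneity: each cluster contains members of a single class only.
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, cluster_labels) / h_true
    )
    # Completeness: all members of a class fall into the same cluster.
    completeness = 1.0 if h_cluster == 0 else (
        1.0 - conditional_entropy(cluster_labels, true_labels) / h_cluster
    )
    if homogeneity + completeness == 0:
        return homogeneity, completeness, 0.0
    v = 2 * homogeneity * completeness / (homogeneity + completeness)
    return homogeneity, completeness, v


# Toy example: two devices, each split across two clusters.
scores = v_measure([0, 0, 1, 1], [0, 1, 2, 3])
```

In the toy example every cluster is pure, so homogeneity is 1, but each device is split in two, so completeness is 0.5 and the V-measure (their harmonic mean) is about 0.667.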

The Development and Validation of the Silence Motivation Scale (침묵동기 척도 개발 및 타당화)

  • Choi, Myoung Ok;Park Dong gun
    • Korean Journal of Culture and Social Issue / v.23 no.2 / pp.239-270 / 2017
  • This study investigated the nature and dimensionality of the motives that lead employees to remain silent even though they could voice their opinions, and aimed to develop a scale measuring employee silence. Three studies were designed, two of which each comprised two sub-studies, for a total of five studies. Study 1 conducted an open-ended survey, which 104 workers from a variety of work fields answered. Based on the results of the open-ended questions, a scale consisting of 60 items was developed to measure employee silence motivation. Study 2 examined the developed scale with 481 workers from diverse work fields. Exploratory factor and 'intra-ESEM' analyses confirmed the construct of silence motivation as comprising 5 factors (acquiescent, defensive, disengaged, opportunistic, and relational silence), and a 20-item scale was developed to measure the construct (Study 2-1). Furthermore, an 'inter-ESEM' analysis examined the discriminant validity of the scale developed in the current study against general silence behavior and voice behavior, and found that employee silence was distinguished from both (Study 2-2). Study 3 was designed to validate the silence motivation scale developed in Study 1 and Study 2. Based on these results, the implications and limitations of this study, as well as directions for future study, were discussed.

The Workflow for Computational Analysis of Single-cell RNA-sequencing Data (단일 세포 RNA 시퀀싱 데이터에 대한 컴퓨터 분석의 작업과정)

  • Sung-Hun WOO;Byung Chul JUNG
    • Korean Journal of Clinical Laboratory Science / v.56 no.1 / pp.10-20 / 2024
  • RNA-sequencing (RNA-seq) is a technique used for providing global patterns of transcriptomes in samples. However, it can only provide the average gene expression across cells and does not address the heterogeneity within the samples. The advances in single-cell RNA sequencing (scRNA-seq) technology have revolutionized our understanding of heterogeneity and the dynamics of gene expression at the single-cell level. For example, scRNA-seq allows us to identify the cell types in complex tissues, which can provide information regarding the alteration of the cell population by perturbations, such as genetic modification. Since its initial introduction, scRNA-seq has rapidly become popular, leading to the development of a huge number of bioinformatic tools. However, the analysis of the big dataset generated from scRNA-seq requires a general understanding of the preprocessing of the dataset and a variety of analytical techniques. Here, we present an overview of the workflow involved in analyzing the scRNA-seq dataset. First, we describe the preprocessing of the dataset, including quality control, normalization, and dimensionality reduction. Then, we introduce the downstream analysis provided with the most commonly used computational packages. This review aims to provide a workflow guideline for new researchers interested in this field.
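The first two preprocessing steps listed in this abstract, quality control and normalization, can be sketched on a toy count matrix as below. This is a simplified illustration rather than the workflow of any particular package: the function name `preprocess`, the `min_genes` threshold, and the scaling target of 10,000 counts per cell are assumptions chosen for the example, and the dimensionality reduction step is omitted.

```python
import math


def preprocess(counts, min_genes=2, target_sum=10_000):
    """Filter low-quality cells, then library-size normalize and log-transform.

    `counts` is a list of cells, each a list of raw counts per gene.
    Cells with fewer than `min_genes` detected genes are dropped.
    """
    kept = []
    for cell in counts:
        detected = sum(1 for c in cell if c > 0)
        total = sum(cell)
        if detected < min_genes or total == 0:
            continue  # quality control: discard this cell
        # Scale each cell to the same total count, then apply log(1 + x).
        kept.append([math.log1p(c * target_sum / total) for c in cell])
    return kept


# Toy matrix: three cells by three genes (made-up counts).
counts = [
    [10, 0, 90],
    [0, 0, 5],   # only one gene detected, removed by the QC filter
    [1, 1, 2],
]
normalized = preprocess(counts)
```

After this step the normalized matrix would usually be passed to a dimensionality reduction method before clustering and other downstream analysis.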