• Title/Summary/Keyword: multivariate data analysis

Gallbladder Carcinoma: Analysis of Prognostic Factors in 132 Cases

  • Wang, Rui-Tao;Xu, Xin-Sen;Liu, Jun;Liu, Chang
    • Asian Pacific Journal of Cancer Prevention / v.13 no.6 / pp.2511-2514 / 2012
  • Objective: To evaluate the prognostic factors of gallbladder carcinoma. Methods: Presentation, operative data, complications, and survival outcome were examined for 132 gallbladder carcinoma patients who underwent gallbladder surgery in our unit during 2002-2007, and follow-up results were obtained from every patient for univariate and multivariate survival analysis. Results: The univariate analysis showed that gallbladder lesion history, tumor cell differentiation, Nevin staging, preoperative lymph node metastasis and the surgical approach significantly correlated with the prognosis of the patients (p<0.05). The results of the multivariate analysis (Cox regression) showed that gallbladder lesion history, Nevin staging and the surgical approach were independent predictors with relative risks of 6.9, 4.4, 2.8, respectively (p=0.002, 0.003, 0.008). Conclusion: Gallbladder lesion history, Nevin staging and the surgical approach are independent prognostic factors for gallbladder carcinoma, a rapidly fatal disease. Therefore, early diagnosis, anti-infective therapy and radical surgery are greatly needed to improve the prognosis of gallbladder carcinoma.

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents / v.8 no.1 / pp.23-29 / 2012
  • Multiple expression levels of genes obtained using time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional gene expression matrices. Among the huge number of genes present in microarrays, only a small number are expected to be relevant to a given task. Hence, discarding the majority of unaffected genes is the crucial goal of gene selection to improve the accuracy of disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features of multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. An unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. The proposed method achieves promising results in terms of the predictive accuracy of clustering compared to existing methods of analysis. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

Differentiation of Roots of Glycyrrhiza Species by 1H Nuclear Magnetic Resonance Spectroscopy and Multivariate Statistical Analysis

  • Yang, Seung-Ok;Hyun, Sun-Hee;Kim, So-Hyun;Kim, Hee-Su;Lee, Jae-Hwi;Whang, Wan-Kyun;Lee, Min-Won;Choi, Hyung-Kyoon
    • Bulletin of the Korean Chemical Society / v.31 no.4 / pp.825-828 / 2010
  • To classify Glycyrrhiza species, samples of different species were analyzed using a $^1H$ NMR-based metabolomics technique. Partial least squares discriminant analysis (PLS-DA) was used for the multivariate statistical analysis of the $^1H$ NMR data sets. There was a clear separation between the Glycyrrhiza species in the PLS-DA derived score plots. The PLS-DA model was validated, and the key metabolites contributing to the separation in the score plots were lactic acid, alanine, arginine, proline, malic acid, asparagine, choline, glycine, glucose, sucrose, 4-hydroxyphenylacetic acid, and formic acid. The compounds present at relatively high levels were glucose and 4-hydroxyphenylacetic acid in G. glabra; lactic acid, alanine, and proline in G. inflata; and arginine, malic acid, and sucrose in G. uralensis. This is the first study to perform global metabolomic profiling and differentiation of Glycyrrhiza species using $^1H$ NMR and multivariate statistical analysis.

The Evaluation of Water Quality in Coastal Sea of Incheon Using a Multivariate Analysis (다변량 해석기법을 이용한 인천연안해역의 수질평가)

  • Kim, Jong-Gu
    • Journal of Environmental Science International / v.15 no.11 / pp.1017-1025 / 2006
  • This study was conducted to evaluate the characteristics of water quality in the coastal sea of Incheon using multivariate analysis. The data were acquired from NFRDI surveys conducted from March 1997 to November 2003. Eleven water quality parameters were determined in each survey. The results are summarized as follows. Water quality in the Incheon coastal sea could be explained up to 64.62% by three factors: loading of fresh water and nutrients from the land (36.98%), seasonal variation (16.19%), and internal metabolism (11.24%). In the time series analysis of factor scores, for factor 1, station 1, which is influenced by the Han River, showed high factor scores, and station 3, located toward the outer sea, showed low factor scores. For factor 2, station 1 showed high variation and station 3 showed low variation. Cluster analysis by station yielded three groups with different water quality characteristics. In particular, station 1, which is affected by the Han River, and station 4, which is affected by a sewage treatment plant, showed water quality characteristics considerably different from those of the other stations. Yearly cluster analysis also yielded three groups, and water quality in 2003 differed from that of other years because of high precipitation. These results suggest that controlling the discharge of fresh water from the Han River and the sewage treatment plant is important for water quality management in the coastal sea of Incheon.

Evaluation of Water Quality Using Multivariate Statistic Analysis in Busan Coastal Area

  • Kim, Sang-Soo;Cho, Jang-Sik
    • Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.531-542 / 2004
  • Principal component analysis and cluster analysis were conducted to comprehensively evaluate the water quality of the Busan coastal area, using data collected seasonally from surface water at 10 stations from 1997 to 2003. The first principal component was interpreted as a factor related to the input of nutrient-rich fresh water, and the second principal component as a factor related to meteorological characteristics. We also found that the water quality at stations 4 and 9 differed from that at the other stations in the Busan coastal area.
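
    The principal component step described in this abstract can be illustrated with a minimal sketch. It assumes standardized variables and uses only NumPy; the function name `pca` and the synthetic station-by-parameter matrix are hypothetical stand-ins, not the study's data or code.

    ```python
    import numpy as np

    def pca(data, n_components=2):
        """PCA on standardized columns via SVD.

        Returns the component scores and the fraction of total variance
        explained by each retained component.
        """
        X = np.asarray(data, dtype=float)
        # Standardize so parameters measured in different units are comparable.
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        scores = U[:, :n_components] * s[:n_components]
        explained = (s ** 2) / np.sum(s ** 2)
        return scores, explained[:n_components]

    # Synthetic stand-in: 10 stations x 4 water quality parameters.
    # Two columns share a common "fresh water input" signal (assumption
    # made only so the first component has something to pick up).
    rng = np.random.default_rng(0)
    freshwater = rng.normal(size=10)
    samples = np.column_stack([
        freshwater + 0.1 * rng.normal(size=10),   # nutrient-like parameter
        -freshwater + 0.1 * rng.normal(size=10),  # salinity-like parameter
        rng.normal(size=10),
        rng.normal(size=10),
    ])

    scores, explained = pca(samples, n_components=2)
    print(scores.shape, explained)
    ```

    Stations with similar scores on the first components could then be grouped by a cluster analysis, which this sketch leaves out.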

A Query Model for Consecutive Analyses of Dynamic Multivariate Graphs (동적 다변량 그래프의 연속적 분석을 위한 질의 모델 설계 및 구현)

  • Bae, Yechan;Ham, Doyoung;Kim, Taeyang;Jeong, Hayjin;Kim, Dongyoon
    • The Journal of Korean Association of Computer Education / v.17 no.6 / pp.103-113 / 2014
  • This study designed and implemented a query model for consecutive analyses of dynamic multivariate graph data. First, the query model consists of two procedures: setting the discriminant function and determining an alteration method. Second, the query model was implemented as a query system that consists of a query panel, a graph visualization panel, and a property panel. A Node-Link Diagram and the Force-Directed Graph Drawing algorithm were used for the visualization of the graph. The results of the queries are visually presented through the graph visualization panel. Finally, this study used worldwide import and export data on small arms to verify the model. The significance of this research is that, by supporting consecutive analyses of dynamic graph data, the model helps overcome the limitations of previous models, which can only perform discrete analysis on dynamic data. This research is expected to contribute to future studies that use dynamic graph models, such as online decision making and complex network analysis.

A Study on Application Range of Continuum Model to Discontinuous Rock mass with Numerical Analysis (불연속지반의 연속체 모델 적용범위에 대한 수치해석적 연구)

  • 이경우;노상림;윤지선
    • Proceedings of the Korean Geotechnical Society Conference / 2002.03a / pp.197-204 / 2002
  • In this study, we performed multivariate analysis on domestic road tunnel data (958 cases) and propose a simple prediction equation for the Q-system. Using the equation, we generated Q-values applicable to numerical analysis and investigated the excavation-induced behavior of the rock mass for varying Q-values with the discontinuum numerical analysis code UDEC. The results indicate that the range of Q-values over which the continuum model can be applied to a discontinuous rock mass is below 0.7, and we verified the applicability of the continuum model in this range with the continuum numerical analysis code FLAC.

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide flexibly scalable computing resources in an Apache Hadoop based cloud environment for the analysis of public medical information big data. The environment can quickly and flexibly extend storage, memory, and other resources as log data accumulates or grows over time. In addition, when real-time analysis of accumulated unstructured log data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools. It therefore provides parallel distributed processing of large amounts of log data quickly and reliably. Frequency analysis and chi-square tests were performed for the big data analysis. In addition, multivariate logistic regression analysis at a significance level of 0.05 and multivariate logistic regression analysis of the significant variables (p<0.05) were performed. Multivariate logistic regression analysis was performed for each model 3.

Relationships Between the Characteristics of Algae Occurrence and Environmental Factors in Lake Juam, Korea (주암호의 조류 발생 특성과 수질요인의 상관성 연구)

  • Seo, Kyungae;Jung, Soojung;Park, Jonghwan;Hwang, Kyoungseop;Lim, Byungjin
    • Journal of Korean Society on Water Environment / v.29 no.3 / pp.317-328 / 2013
  • The purpose of this study was to investigate phytoplankton fluctuations and long-term changes in the water quality of Lake Juam and to evaluate the relationship between phytoplankton patterns and environmental factors. Correlation and factor analyses were employed to identify key environmental factors affecting phytoplankton dynamics. Of 18 parameters, pH, temperature, COD, BOD and T-P were highly correlated with Chl-a. Phytoplankton data showed that cyanobacteria were dominant, accounting for more than 60% of total algal density. Lake Juam was also strongly influenced by the Asian monsoon climate. This study shows the need for multivariate statistical techniques in evaluating the complex Lake Juam data set, with a view to obtaining better information and managing the water source effectively.

Shelf-life prediction of fresh ginseng packaged with plastic films based on a kinetic model and multivariate accelerated shelf-life testing

  • Jong-Jin Park;Jeong-Hee Choi;Kee-Jai Park;Jeong-Seok Cho;Dae-Yong Yun;Jeong-Ho Lim
    • Food Science and Preservation / v.30 no.4 / pp.573-588 / 2023
  • The purpose of this study was to monitor changes in the quality of ginseng and predict its shelf-life. As the storage period of ginseng increased, some quality indicators, such as water-soluble pectin (WSP), CDTA-soluble pectin (CSP), cellulose, weight loss, and microbial growth increased, while others (Na2CO3-soluble pectin/NSP, hemicellulose, starch, and firmness) decreased. Principal component analysis (PCA) was performed using the quality attribute data and the principal component 1 (PC1) scores extracted from the PCA results were applied to the multivariate analysis. The reaction rate at different temperatures and the temperature dependence of the reaction rate were determined using kinetic and Arrhenius models, respectively. Among the kinetic models, zeroth-order models with cellulose and a PC1 score provided an adequate fit for reaction rate estimation. Hence, the prediction model was constructed by applying the cellulose and PC1 scores to the zeroth-order kinetic and Arrhenius models. The prediction model with PC1 score showed higher $R^2$ values (0.877-0.919) than those of cellulose (0.797-0.863), indicating that multivariate analysis using PC1 score is more accurate for the shelf-life prediction of ginseng. The predicted shelf-life using the multivariate accelerated shelf-life test at 5, 20, and 35℃ was 40, 16, and 7 days, respectively.
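
    The combination of a zeroth-order kinetic model and an Arrhenius model mentioned in this abstract can be sketched as follows. This is only an illustration of the general technique: a zeroth-order model treats the quality index as changing linearly in time, $Q(t) = Q_0 + kt$, and the Arrhenius model relates the rate to temperature, $\ln k = \ln A - E_a/(RT)$. All function names, the activation energy, the pre-exponential factor, and the quality limit below are hypothetical, and the data are synthetic rather than taken from the paper.

    ```python
    import numpy as np

    R = 8.314  # gas constant, J/(mol*K)

    def zeroth_order_rate(days, quality):
        """Fit Q(t) = Q0 + k*t by least squares and return (k, Q0)."""
        k, q0 = np.polyfit(days, quality, 1)
        return k, q0

    def fit_arrhenius(temps_c, rates):
        """Fit ln k = ln A - Ea/(R*T) and return (Ea, A)."""
        inv_t = 1.0 / (np.asarray(temps_c, dtype=float) + 273.15)
        slope, intercept = np.polyfit(inv_t, np.log(rates), 1)
        return -slope * R, np.exp(intercept)

    def shelf_life(temp_c, q0, q_limit, ea, a):
        """Days until a zeroth-order index rises from q0 to q_limit."""
        k = a * np.exp(-ea / (R * (temp_c + 273.15)))
        return (q_limit - q0) / k

    # Synthetic data generated from assumed "true" parameters (not the paper's).
    EA_TRUE, A_TRUE, Q0_TRUE = 50_000.0, 1.0e8, 0.0
    temps_c = [5.0, 20.0, 35.0]
    days = np.arange(0, 30, 3, dtype=float)

    rates = []
    for t_c in temps_c:
        k_true = A_TRUE * np.exp(-EA_TRUE / (R * (t_c + 273.15)))
        quality = Q0_TRUE + k_true * days      # noise-free zeroth-order curve
        k_fit, _ = zeroth_order_rate(days, quality)
        rates.append(k_fit)

    ea_fit, a_fit = fit_arrhenius(temps_c, rates)
    # Hypothetical acceptability limit of 1.0 on the quality index.
    lives = [shelf_life(t_c, Q0_TRUE, 1.0, ea_fit, a_fit) for t_c in temps_c]
    print(ea_fit, lives)
    ```

    With real measurements the quality index would be noisy (for example a PC1 score per sampling day), so the fitted rates and the resulting shelf-life estimates carry uncertainty that this sketch does not quantify.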