• Title/Summary/Keyword: Data Set Comparing

Gene Set and Pathway Analysis of Microarray Data (마이크로어레이 데이터의 유전자 집합 및 대사 경로 분석)

  • Kim Seon-Young
    • KOGO NEWS / v.6 no.1 / pp.29-33 / 2006
  • Gene set analysis is a new concept and method for analyzing and interpreting microarray gene expression data; it tries to extract biological meaning from gene expression data at the gene set level rather than at the gene level. Compared with methods that select a few tens or hundreds of genes before gene ontology and pathway analysis, gene set analysis identifies important gene ontology terms and pathways more consistently and performs well even on gene expression data sets with minimal or moderate gene expression changes. Moreover, gene set analysis is useful for comparing multiple gene expression data sets that deal with similar biological questions. This review briefly summarizes the rationale behind gene set analysis and introduces several algorithms and tools now available for it.

Effective and Statistical Quantification Model for Network Data Comparing (통계적 수량화 방법을 이용한 효과적인 네트워크 데이터 비교 방법)

  • Cho, Jae-Ik;Kim, Ho-In;Moon, Jong-Sub
    • Journal of Broadcast Engineering / v.13 no.1 / pp.86-91 / 2008
  • In network data analysis, research on how well sample data reflects the population data is essential. This paper compares and analyzes the well-known MIT Lincoln Lab (MIT/LL) network data, which is composed of standard information collectable from the network, against the KDD CUP 99 data set, which was derived from the MIT/LL data. The protocol information of both data sets was used for the comparison. Correspondence analysis was used for the analysis, SVD for two-dimensional visualization, and weighted Euclidean distance for quantifying the network data.
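
The weighted Euclidean distance named in the abstract above can be sketched as follows. This is only an illustration of the general formula d = sqrt(sum_i w_i (x_i - y_i)^2); the abstract does not say how the weights are chosen, so the function name, the uniform weights, and the protocol shares below are all hypothetical placeholders, not values from the paper.

```python
import math

def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance: sqrt(sum_i w_i * (x_i - y_i)^2)."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Hypothetical protocol shares (e.g. TCP, UDP, ICMP) for two data sets.
# These numbers are made up for illustration only.
profile_a = [0.70, 0.20, 0.10]
profile_b = [0.60, 0.25, 0.15]
uniform_weights = [1.0, 1.0, 1.0]  # assumption: every protocol weighted equally

distance = weighted_euclidean(profile_a, profile_b, uniform_weights)
print(f"distance between profiles: {distance:.4f}")
```

With uniform weights this reduces to the ordinary Euclidean distance; a larger weight on one protocol makes differences in that protocol count for more.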

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping (특징 래핑을 통한 숫자형 특징과 범주형 특징이 혼합된 데이터의 클래스 분류 성능 향상 기법)

  • Lee, Jae-Sung;Kim, Dae-Won
    • Journal of KIISE:Software and Applications / v.36 no.12 / pp.1024-1027 / 2009
  • In this letter, we evaluate classification performance on data with mixed numeric and categorical features in order to compare the efficiency of feature filtering and feature wrapping. Because the data is composed of both numeric and categorical features, the feature selection method was applied after discretizing the numeric features in the given data set. In this study, we choose the feature subset that improves the classification performance of the data set after preprocessing. The experimental results show that the feature wrapping method is more reliable than the feature filtering method in terms of classification accuracy.

TMY2 Weather data for Korea (TMY2 방식에 의한 국내 기상자료 작성 연구)

  • Shin, Kee-Shik;Yoon, Chang-Ryuel;Park, Sang-Dong
    • 한국신재생에너지학회:학술대회논문집 / 2009.06a / pp.243-246 / 2009
  • Many building simulation programs are used to evaluate building energy performance, and their capabilities continue to grow. Despite these increased capabilities, building energy performance evaluations still rely on the same limited set of weather data. This often forces users to find or calculate weather data such as illuminance, solar radiation, and ground temperature from other sources. Proper selection of a weather data set is also considered one of the important factors in a successful building energy simulation. In this paper, we describe TMY2 data, a generalized weather data format, apply it to the Seoul region, and examine how it differs from existing weather data. A database of 23 years of raw weather data has been developed to provide the weather data file for building energy analysis in Seoul.

Design of Cache Memory System for Next Generation CPU (차세대 CPU를 위한 캐시 메모리 시스템 설계)

  • Jo, Ok-Rae;Lee, Jung-Hoon
    • IEMEK Journal of Embedded Systems and Applications / v.11 no.6 / pp.353-359 / 2016
  • In this paper, we propose a high-performance L1 cache structure for high-clock CPUs. The proposed cache memory consists of three parts: a direct-mapped cache to support fast access time, a two-way set-associative buffer to reduce the miss ratio, and a way-select table (WST). The most recently accessed data is stored in the direct-mapped cache. If data evicted from the direct-mapped cache has a high probability of being referenced again, it is stored in the two-way set-associative buffer. For high performance and fast access time, only one of the two ways of the set-associative buffer is accessed, selected by the WST. According to simulation results, access time can be reduced by about 7% and 40% compared with a direct cache and an Intel i7-6700 with twice the space, respectively.

CNN-LSTM based Wind Power Prediction System to Improve Accuracy (정확도 향상을 위한 CNN-LSTM 기반 풍력발전 예측 시스템)

  • Park, Rae-Jin;Kang, Sungwoo;Lee, Jaehyeong;Jung, Seungmin
    • New & Renewable Energy / v.18 no.2 / pp.18-25 / 2022
  • In this study, we propose a wind power prediction system that applies machine learning and data mining. This system increases the utilization rate of new and renewable energy sources. The time-series data set was established by measuring wind speed, wind generation, and environmental factors that influence wind speed. The data set was pre-processed so that it could be applied appropriately to the model. The prediction system applied a CNN (Convolutional Neural Network) to the data mining process and then used an LSTM (Long Short-Term Memory) to learn and make predictions. The accuracy of the proposed system is verified by comparing the predicted data with the actual data, with and without data mining in the prediction model.

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia pacific journal of information systems / v.31 no.2 / pp.185-196 / 2021
  • The paper aims to analyze and predict liquor sales in the state of Iowa by applying machine learning algorithms to build predictive models. We used Azure ML and Spark ML for our predictive analysis, which are a legacy machine learning (ML) system and a Big Data ML system, respectively. We worked on the Iowa liquor sales data set, comprising records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales using both Azure ML and Spark ML. We find that the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time with the sample data set using the legacy Azure ML systems. The Decision Tree Regression model in Spark ML has the highest accuracy with the quickest computing time for the entire data set using the Big Data Spark systems.

Fractal dimension analysis as an easy computational approach to improve breast cancer histopathological diagnosis

  • Lucas Glaucio da Silva;Waleska Rayanne Sizinia da Silva Monteiro;Tiago Medeiros de Aguiar Moreira;Maria Aparecida Esteves Rabelo;Emílio Augusto Campos Pereira de Assis;Gustavo Torres de Souza
    • Applied Microscopy / v.51 / pp.6.1-6.9 / 2021
  • Histopathology is a well-established standard diagnosis employed for the majority of malignancies, including breast cancer. Nevertheless, despite training and standardization, it is considered operator-dependent and errors are still a concern. Fractal dimension analysis is a computational image processing technique that allows assessing the degree of complexity in patterns. We aimed to provide a robust and easily attainable method for introducing computer-assisted techniques to histopathology laboratories. Slides from two databases were used: A) Breast Cancer Histopathological; and B) Grand Challenge on Breast Cancer Histology. Set A contained 2480 images from 24 patients with benign alterations, and 5429 images from 58 patients with breast cancer. Set B comprised 100 images of each type: normal tissue, benign alterations, in situ carcinoma, and invasive carcinoma. All images were analyzed with the FracLac algorithm in the ImageJ computational environment to yield the box count fractal dimension (Db) results. In set A, images at 40x magnification were statistically different (p = 0.0003), whereas images at 400x showed no difference in their means. In set B, the mean Db values showed promising statistical differences when comparing normal and/or benign images to in situ and/or invasive carcinoma (all p < 0.0001). Interestingly, there was no difference when comparing normal tissue to benign alterations. These data corroborate previous work in which fractal analysis allowed differentiating malignancies. Computer-aided diagnosis algorithms may benefit from using Db data; specific Db cut-off values may yield ~ 99% specificity in diagnosing breast cancer. Furthermore, because it allows assessing tissue complexity, this tool may be used to understand the progression of the histological alterations in cancer.
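
To make the box count fractal dimension (Db) in the abstract above more concrete, here is a minimal textbook-style sketch. It is not the FracLac implementation, whose grid placement and regression options are not described in the abstract. It assumes a binary image given as a list of rows of 0/1, and the function names and box sizes are illustrative choices. Db is estimated as the least-squares slope of log N(s) against log(1/s), where N(s) is the number of boxes of side s that contain at least one foreground pixel.

```python
import math

def box_count(pixels, size):
    """Number of size-by-size grid boxes containing at least one foreground pixel."""
    return len({(r // size, c // size) for r, c in pixels})

def box_count_dimension(image, sizes=(1, 2, 4, 8, 16)):
    """Estimate the box-counting dimension of a binary image (rows of 0/1)."""
    pixels = [(r, c)
              for r, row in enumerate(image)
              for c, value in enumerate(row) if value]
    if not pixels:
        raise ValueError("image has no foreground pixels")
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    # Ordinary least-squares slope of ys against xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator

# Sanity checks on shapes with known dimension (synthetic, not histology data).
filled_square = [[1] * 64 for _ in range(64)]
single_line = [[1] * 64] + [[0] * 64 for _ in range(63)]
print(round(box_count_dimension(filled_square), 3))
print(round(box_count_dimension(single_line), 3))
```

A filled square should come out close to 2 and a straight line close to 1; real tissue images would first need to be thresholded to a binary mask, a step this sketch leaves out.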

Off-Design Performance Analysis of a Counterflow-Type Cooling Tower (대향류형 냉각탑의 탈설계 성능해석)

  • 신지영;손영석;한동원
    • Korean Journal of Air-Conditioning and Refrigeration Engineering / v.14 no.3 / pp.191-198 / 2002
  • A cooling tower design procedure was set up using conventional Merkel theory. The design data could differ depending on the characteristic curve that the engineer chose. This shows that consistent and reasonable criteria are required, based on exact information about cooling tower performance. In this study, an off-design performance analysis program for a counterflow-type cooling tower was developed and verified by comparison with experimental data. The off-design performance under various operating conditions was also analyzed.

Analysis and Decision Making Purchase for Cellular Phone Using Kansei Engineering (감성공학을 이용한 핸드폰에 대한 선호도 조사 및 해석)

  • Park, Seong-Wook;Sea, Bo-Hyeok
    • Proceedings of the KIEE Conference / 2002.06a / pp.175-177 / 2002
  • This paper presents a methodology for analyzing individual differences in Kansei evaluation of a set of product samples. The analysis divides subjects into several groups based on each subject's Kansei evaluation data, according to which kinds of Kansei are related to which kinds of design elements. The basic idea is to classify the results of cluster analysis within each individual subject's range. A similarity matrix of subjects is computed by comparing the dendrograms of the subjects. The methodology is applied to evaluation data on cellular phone design.
