• Title/Abstract/Keywords: Data Set Comparing

409 search results (processing time: 0.028 s)

Gene Set and Pathway Analysis of Microarray Data

  • 김선영
    • 유전체소식지 / Vol. 6, No. 1 / pp.29-33 / 2006
  • Gene set analysis is a new concept and method for analyzing and interpreting microarray gene expression data. It tries to extract biological meaning from gene expression data at the gene set level rather than at the gene level. Compared with methods that select a few tens or hundreds of genes before gene ontology and pathway analysis, gene set analysis identifies important gene ontology terms and pathways more consistently and performs well even on gene expression data sets with minimal or moderate expression changes. Moreover, gene set analysis is useful for comparing multiple gene expression data sets that address similar biological questions. This review briefly summarizes the rationale behind gene set analysis and introduces several algorithms and tools now available for it.
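The idea of scoring at the gene set level rather than the gene level can be sketched with one simple set-level statistic: a z-score of the set's mean expression change against the mean and standard deviation of all genes. This is only an illustration under stated assumptions; the review does not specify this formula here, and the function name `gene_set_z` and all gene names and values are hypothetical.

```python
import math
import statistics


def gene_set_z(scores, gene_set):
    """Z-score of a gene set's mean score against the distribution of all genes.

    scores   : dict mapping gene name -> gene-level score (e.g. log fold change)
    gene_set : iterable of gene names; genes missing from `scores` are ignored
    """
    all_values = list(scores.values())
    mu = statistics.mean(all_values)        # mean over all genes
    sigma = statistics.pstdev(all_values)   # population std. dev. over all genes
    members = [scores[g] for g in gene_set if g in scores]
    set_mean = statistics.mean(members)
    # The standard error of a mean of m values is sigma / sqrt(m)
    return (set_mean - mu) * math.sqrt(len(members)) / sigma


# Hypothetical log fold changes: no single gene changes much,
# but the members of the set all shift in the same direction.
scores = {
    "g1": 0.30, "g2": 0.25, "g3": 0.35, "g4": 0.28,
    "g5": -0.10, "g6": 0.05, "g7": -0.20, "g8": 0.00,
    "g9": -0.15, "g10": 0.10,
}
pathway = ["g1", "g2", "g3", "g4"]
print(gene_set_z(scores, pathway))
```

Because the statistic pools modest, consistent shifts across the members of a set, it can flag a set even when no individual gene would pass a gene-level cutoff, which is the behavior the abstract describes.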


Effective and Statistical Quantification Model for Network Data Comparing

  • 조재익;김호인;문종섭
    • 방송공학회논문지 / Vol. 13, No. 1 / pp.86-91 / 2008
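  • In network data analysis, it is essential to study how well an estimated model represents the population. In this paper, using the standard information that can be extracted from network data, we compare and analyze the publicly available MIT Lincoln Lab network data and the modeled KDD CUP 99 data. The comparison uses protocol information, which is standard information contained in both data sets. The analysis applies correspondence analysis, a statistical method, represents the result in a two-dimensional space using SVD, and quantifies the network data using a weighted Euclidean distance.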
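The weighted Euclidean distance step can be sketched as follows. This assumes the weights are the usual chi-square weights of correspondence analysis (the inverse of the average column profile), which the abstract does not state explicitly. The SVD projection is omitted, and the protocol counts are hypothetical, not taken from either data set.

```python
import math


def chi_square_distance(counts_a, counts_b):
    """Weighted Euclidean (chi-square) distance between two count profiles.

    Each count vector is first turned into a profile (relative frequencies).
    Squared profile differences are then weighted by the inverse of the
    average column profile, as in correspondence analysis.
    """
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    grand_total = total_a + total_b
    dist_sq = 0.0
    for a, b in zip(counts_a, counts_b):
        col_mass = (a + b) / grand_total   # average profile for this category
        if col_mass == 0:
            continue                       # category absent from both data sets
        diff = a / total_a - b / total_b
        dist_sq += diff * diff / col_mass
    return math.sqrt(dist_sq)


# Hypothetical packet counts per protocol (TCP, UDP, ICMP)
lincoln_counts = [700, 200, 100]
kdd99_counts = [400, 100, 500]
print(chi_square_distance(lincoln_counts, kdd99_counts))
```

Dividing by the column mass keeps a frequent protocol from dominating the distance simply because its counts are large.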

Improving Classification Performance for Data with Numeric and Categorical Attributes Using Feature Wrapping

  • 이재성;김대원
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 36, No. 12 / pp.1024-1027 / 2009
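  • In this paper, to compare the effectiveness of feature selection techniques on mixed-type data, we measured classification performance after selecting features through feature filtering and through feature wrapping. Because mixed-type data contains both numeric and categorical features, the numeric features can be discretized into categorical features to convert the data into a single type, after which feature selection techniques can be applied. In this study, we preprocessed mixed-type data into single-type data and used widely adopted feature filtering and feature wrapping techniques to select feature subsets that can improve classification performance. Comparing the classification performance obtained with the selected subsets, we confirmed that classification accuracy was higher with the subset selected by feature wrapping than with the subset selected by feature filtering.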
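A minimal sketch of the pipeline described above: discretize the numeric feature, then run a wrapper that greedily adds features as long as a classifier's accuracy improves. The equal-width discretization, the lookup-table classifier, the leave-one-out scoring, and the toy data are all assumptions made for illustration; the abstract does not say which discretizer, classifier, or search strategy the authors used.

```python
from collections import Counter


def discretize(values, bins=3):
    """Equal-width discretization of a numeric column into bin indices."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a lookup classifier on the chosen features.

    Each row is predicted by majority vote among the other rows that share
    its values on `features`; with no match, all other rows vote.
    """
    correct = 0
    for i, row in enumerate(rows):
        key = tuple(row[f] for f in features)
        matches = [labels[j] for j, other in enumerate(rows)
                   if j != i and tuple(other[f] for f in features) == key]
        pool = matches or [labels[j] for j in range(len(rows)) if j != i]
        prediction = Counter(pool).most_common(1)[0][0]
        correct += prediction == labels[i]
    return correct / len(rows)


def forward_wrapper(rows, labels, n_features):
    """Greedy forward selection: add the best feature while accuracy improves."""
    selected, best = [], loo_accuracy(rows, labels, [])
    improved = True
    while improved:
        improved = False
        for f in range(n_features):
            if f in selected:
                continue
            score = loo_accuracy(rows, labels, selected + [f])
            if score > best:
                best, candidate, improved = score, f, True
        if improved:
            selected.append(candidate)
    return selected, best


# Toy mixed-type data: one numeric feature and one categorical feature
numeric = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.9, 5.1]
categorical = ["x", "y", "x", "y", "x", "y", "x", "y"]
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]

rows = list(zip(discretize(numeric, bins=2), categorical))
selected, best = forward_wrapper(rows, labels, n_features=2)
print(selected, best)
```

Unlike a filter, which ranks features by a statistic computed independently of any classifier, the wrapper scores each candidate subset by the accuracy of the classifier itself, at the cost of many more model evaluations.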

TMY2 Weather Data for Korea

  • 신기식;윤창렬;박상동
    • 한국신재생에너지학회:학술대회논문집 / Proceedings of the 한국신재생에너지학회 2009 Spring Conference / pp.243-246 / 2009
  • Many building simulation programs are used to evaluate building energy performance, and their capabilities continue to develop. Despite these increased capabilities, building energy performance evaluation still relies on the same limited set of weather data. This often forces users to obtain or calculate weather data such as illuminance, solar radiation, and ground temperature from other sources. In addition, proper selection of a weather data set is considered one of the important factors in a successful building energy simulation. In this paper, we describe TMY2 data, a generalized weather data format, apply it to the Seoul region, and examine how it differs from existing weather data. A database of 23 years of raw weather data was developed to provide the weather data file for building energy analysis in Seoul.


Design of Cache Memory System for Next Generation CPU

  • 조옥래;이정훈
    • 대한임베디드공학회논문지 / Vol. 11, No. 6 / pp.353-359 / 2016
  • In this paper, we propose a high-performance L1 cache structure for high-clock-speed CPUs. The proposed cache memory consists of three parts: a direct-mapped cache to support fast access time, a two-way set-associative buffer to reduce the miss ratio, and a way-select table. The most recently accessed data is stored in the direct-mapped cache. When data with a high probability of repeated reference is replaced from the direct-mapped cache, it is stored in the two-way set-associative buffer. For high performance and fast access time, only one of the two ways of the set-associative buffer is selectively accessed, based on the way-select table (WST). According to simulation results, access time can be reduced by about 7% and 40% compared with a direct-mapped cache and the Intel i7-6700 with twice the space, respectively.

CNN-LSTM based Wind Power Prediction System to Improve Accuracy

  • 박래진;강성우;이재형;정승민
    • 신재생에너지 / Vol. 18, No. 2 / pp.18-25 / 2022
  • In this study, we propose a wind power generation prediction system that applies machine learning and data mining. The system is intended to increase the utilization rate of new and renewable energy sources. A time-series data set was established by measuring wind speed, wind power generation, and the environmental factors that influence wind speed. The data set was pre-processed so that it could be applied appropriately to the model. The prediction system applies a CNN (Convolutional Neural Network) in the data mining process and then uses an LSTM (Long Short-Term Memory) network to learn and make predictions. The accuracy of the proposed system is verified by comparing the predicted data with the actual data, with and without data mining in the prediction model.

Iowa Liquor Sales Data Predictive Analysis Using Spark

  • Ankita Paul;Shuvadeep Kundu;Jongwook Woo
    • Asia pacific journal of information systems / Vol. 31, No. 2 / pp.185-196 / 2021
  • This paper aims to analyze and predict liquor sales in the state of Iowa by applying machine learning algorithms to build prediction models. We use Azure ML and Spark ML for our predictive analysis, which are a legacy machine learning (ML) system and a Big Data ML system, respectively. We work with the Iowa liquor sales dataset, which comprises records from 2012 to 2019 in 24 columns and approximately 1.8 million rows. We conclude by comparing the models built with different algorithms and their accuracy in predicting sales on both Azure ML and Spark ML. We find that, on the sample data set with the legacy Azure ML system, the Linear Regression model has the highest precision and Decision Forest Regression has the fastest computing time. On the entire data set with the Big Data Spark system, the Decision Tree Regression model in Spark ML has the highest accuracy and the quickest computing time.

Fractal dimension analysis as an easy computational approach to improve breast cancer histopathological diagnosis

  • Lucas Glaucio da Silva;Waleska Rayanne Sizinia da Silva Monteiro;Tiago Medeiros de Aguiar Moreira;Maria Aparecida Esteves Rabelo;Emílio Augusto Campos Pereira de Assis;Gustavo Torres de Souza
    • Applied Microscopy / Vol. 51 / pp.6.1-6.9 / 2021
  • Histopathology is a well-established standard diagnosis employed for the majority of malignancies, including breast cancer. Nevertheless, despite training and standardization, it is considered operator-dependent and errors are still a concern. Fractal dimension analysis is a computational image processing technique that allows assessing the degree of complexity in patterns. We aimed here at providing a robust and easily attainable method for introducing computer-assisted techniques to histopathology laboratories. Slides from two databases were used: A) Breast Cancer Histopathological; and B) Grand Challenge on Breast Cancer Histology. Set A contained 2480 images from 24 patients with benign alterations, and 5429 images from 58 patients with breast cancer. Set B comprised 100 images of each type: normal tissue, benign alterations, in situ carcinoma, and invasive carcinoma. All images were analyzed with the FracLac algorithm in the ImageJ computational environment to yield the box count fractal dimension (Db) results. Images in set A at 40x magnification were statistically different (p = 0.0003), whereas images at 400x did not present differences in their means. In set B, the mean Db values presented promising statistical differences when comparing normal and/or benign images to in situ and/or invasive carcinoma (all p < 0.0001). Interestingly, there was no difference when comparing normal tissue to benign alterations. These data corroborate previous work in which fractal analysis allowed differentiating malignancies. Computer-aided diagnosis algorithms may benefit from using Db data; specific Db cut-off values may yield ~ 99% specificity in diagnosing breast cancer. Furthermore, because it allows assessing tissue complexity, this tool may be used to understand the progression of the histological alterations in cancer.
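The box count fractal dimension mentioned above can be illustrated with a minimal sketch: count the boxes of side s that contain any foreground pixel, then take the least-squares slope of log N(s) against log(1/s). This is not FracLac's implementation, which the abstract does not describe. The function names, the box sizes, and the two synthetic test shapes are assumptions chosen so the result is easy to check.

```python
import math


def box_count(pixels, size):
    """Number of size x size grid boxes containing at least one foreground pixel."""
    return len({(x // size, y // size) for x, y in pixels})


def box_count_dimension(pixels, sizes):
    """Least-squares slope of log N(s) versus log(1/s) over the given box sizes."""
    xs = [math.log(1.0 / s) for s in sizes]
    ys = [math.log(box_count(pixels, s)) for s in sizes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic foreground pixel sets on a 64 x 64 grid
filled = {(x, y) for x in range(64) for y in range(64)}   # solid square
line = {(x, 0) for x in range(64)}                        # straight line
sizes = [1, 2, 4, 8, 16]

print(box_count_dimension(filled, sizes))   # close to 2 for a filled plane region
print(box_count_dimension(line, sizes))     # close to 1 for a straight line
```

A binarized tissue image would fall between these two extremes, with more complex patterns giving higher values.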

Off-Design Performance Analysis of a Counterflow-Type Cooling Tower

  • 신지영;손영석;한동원
    • 설비공학논문집 / Vol. 14, No. 3 / pp.191-198 / 2002
  • A cooling tower design procedure was set up using conventional Merkel theory. The design data could differ depending on the characteristic curve chosen by the engineer. This shows that consistent and reasonable criteria are required, based on exact information about cooling tower performance. In this study, an off-design performance analysis program for a counterflow-type cooling tower was developed and verified by comparison with experimental data. The off-design performance under various operating conditions was also analyzed.
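For reference, conventional Merkel theory is commonly written as the integral below. The notation is an assumption here, since the abstract gives no symbols: $K$ is the mass transfer coefficient, $a$ the interfacial area per unit fill volume, $V$ the fill volume, $L$ the water mass flow rate, $c_{p,w}$ the specific heat of water, $h_{s,w}$ the enthalpy of saturated air at the local water temperature $T_w$, and $h_a$ the enthalpy of the bulk air.

```latex
\frac{K a V}{L}
  = \int_{T_{w,\mathrm{out}}}^{T_{w,\mathrm{in}}}
    \frac{c_{p,w}\,\mathrm{d}T_w}{h_{s,w}(T_w) - h_a}
```

The left-hand side is the tower characteristic that a design matches against the required value, which is why the choice of characteristic curve affects the resulting design data.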

Analysis and Decision Making Purchase for Cellular Phone Using Kansei Engineering

  • 박성욱;서보혁
    • 대한전기학회:학술대회논문집 / Proceedings of the 대한전기학회 2002 Conference, 전문대학교육위원 / pp.175-177 / 2002
  • This paper presents a methodology for analyzing individual differences in Kansei evaluation for a set of product samples. The analysis divides subjects into several groups based on each subject's Kansei evaluation data, according to which kinds of Kansei relate to which kinds of design elements. The basic idea is to classify the results of cluster analysis within each individual subject's range. A similarity matrix of subjects is computed by comparing the dendrograms of the individual subjects. The methodology is applied to the analysis of evaluation data for cellular phone design.
