• Title/Abstract/Keywords: datasets

Search results: 1,978 (processing time: 0.027 s)

Effects of Hyper-parameters and Dataset on CNN Training

  • Nguyen, Huu Nhan;Lee, Chanho
    • Journal of IKEEE (전기전자학회논문지) / Vol. 22, No. 1 / pp. 14-20 / 2018
  • The purpose of training a convolutional neural network (CNN) is to obtain weight factors that give high classification accuracy. The initial values of the hyper-parameters affect the training results, so it is important to train a CNN with a suitable hyper-parameter set: a learning rate, a batch size, a weight-initialization scheme, and an optimizer. Using a proposed VGG-like CNN (chosen because VGG is widely used), we investigate the effect of each single hyper-parameter while the others are held fixed. The goal is to find a hyper-parameter set that gives higher classification accuracy and requires shorter training time. The CNN is trained on four datasets: CIFAR10, CIFAR100, GTSRB, and DSDL-DB. The effects of normalization and data transformation on the datasets are also investigated, and a training scheme using merged datasets is proposed.
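
The abstract describes varying one hyper-parameter while the others stay fixed. The sketch below assumes a greedy, coordinate-wise reading of that procedure: each swept value is kept before the next hyper-parameter is varied. That reading is an assumption, not necessarily the authors' exact protocol. `one_at_a_time_sweep` and `toy_evaluate` are hypothetical names. `toy_evaluate` is a made-up scoring function that stands in for actually training the VGG-like CNN and measuring validation accuracy.

```python
from typing import Any, Callable, Dict, List


def one_at_a_time_sweep(
    baseline: Dict[str, Any],
    grid: Dict[str, List[Any]],
    evaluate: Callable[[Dict[str, Any]], float],
) -> Dict[str, Any]:
    """Sweep one hyper-parameter at a time while the others stay fixed.

    After a hyper-parameter's sweep finishes, its best value is kept and the
    next hyper-parameter is swept (a greedy, coordinate-wise search).
    """
    config = dict(baseline)
    for name, candidates in grid.items():
        best_value, best_score = config[name], evaluate(config)
        for value in candidates:
            score = evaluate({**config, name: value})  # only `name` differs
            if score > best_score:
                best_value, best_score = value, score
        config[name] = best_value
    return config


def toy_evaluate(cfg: Dict[str, Any]) -> float:
    # Hypothetical stand-in for "train the CNN with cfg and return validation accuracy".
    return 1.0 - 10 * abs(cfg["lr"] - 0.01) - abs(cfg["batch_size"] - 64) / 64


best = one_at_a_time_sweep(
    baseline={"lr": 0.1, "batch_size": 256, "optimizer": "sgd"},
    grid={"lr": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128, 256]},
    evaluate=toy_evaluate,
)
print(best)  # {'lr': 0.01, 'batch_size': 64, 'optimizer': 'sgd'}
```

Because each sweep holds the others fixed, the cost grows with the sum of the grid sizes rather than their product. The trade-off is that interactions between hyper-parameters (for example, learning rate and batch size) can be missed.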

Toward Proper 3D-QSAR Datasets for Parameter Evaluation

  • Cho, Seung Joo
    • Journal of Integrative Natural Science (통합자연과학논문집) / Vol. 4, No. 3 / pp. 197-201 / 2011
  • 3D-QSAR techniques, including CoMFA, have been widely used for more than two decades. Over that time, the perspective on 3D-QSAR has changed. Two developments have changed the requirements: the recognition of gorge activity cliffs, and the higher risk of chance correlation when many independent variables (IVs) are used. Some researchers have suggested benchmarking datasets for 3D-QSAR. However, are they still useful for the right reasons? Here, we propose requirements for general-purpose 3D-QSAR benchmarking datasets for lead optimization, especially for testing the feasibility of IVs. Specifically, we summarize the conceptual requirements for an ideal setting for 3D-QSAR, especially CoMFA.

Ensemble of Classifiers Constructed on Class-Oriented Attribute Reduction

  • Li, Min;Deng, Shaobo;Wang, Lei
    • Journal of Information Processing Systems / Vol. 16, No. 2 / pp. 360-376 / 2020
  • Many heuristic attribute reduction algorithms have been proposed to find a single reduct that preserves the classification capability of the entire set of original attributes. However, such reducts are not always ideal for multiclass datasets. In this study, based on a probabilistic rough set model, we propose the class-oriented attribute reduction (COAR) algorithm, which finds a separate reduct for each target class, so each reduct is strongly tied to its target class. We then propose an ensemble of classifiers built on these class-oriented reducts, combined with a customized weighted majority voting strategy. We evaluated the proposed algorithm on five real multiclass datasets, and the experimental results confirm its superiority in terms of four general evaluation metrics.
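
The abstract does not spell out its "customized" weighting, so the sketch below shows only plain weighted majority voting. Each classifier's predicted label receives that classifier's weight, and the label with the largest total wins. The weights (for example, per-classifier validation accuracy), the function name `weighted_majority_vote`, and the labels `c1`/`c2` are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def weighted_majority_vote(predictions: Sequence[Hashable], weights: Sequence[float]) -> Hashable:
    """Return the label whose supporting classifiers have the largest total weight."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier prediction")
    totals: Dict[Hashable, float] = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # max() scans labels in first-seen order, so ties go to the label seen first.
    return max(totals, key=totals.__getitem__)


# Three hypothetical classifiers, e.g. one built on each class-oriented reduct.
votes = ["c1", "c2", "c2"]
weights = [0.9, 0.3, 0.4]  # assumed weights, e.g. validation accuracy per classifier
print(weighted_majority_vote(votes, weights))  # c1
```

Here an unweighted majority would pick `c2`, but the single high-weight classifier outvotes the two weaker ones (0.9 against 0.7).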

Gene Expression Signatures for Compound Response in Cancers

  • He, Ningning;Yoon, Suk-Joon
    • Genomics & Informatics / Vol. 9, No. 4 / pp. 173-180 / 2011
  • Recent trends in generating multiple large-scale datasets pose new challenges for analyzing relationships among different types of data, such as gene expression and drug response data. Integrative analysis of compound response and gene expression datasets offers an opportunity to capture the possible mechanisms of compounds through signature genes across diverse types of cancer cell lines. Here, we integrated compound response and gene expression profile datasets on the NCI60 cell lines and constructed a network revealing the relationships among 801 compounds and 341 gene probes. For example, obtusol, which shows exclusive sensitivity in a small number of colon cell lines, is related to a set of gene probes that are uniquely overexpressed in colon cell lines. We also found that the SLC7A11 gene, a direct target of miR-26b, might be a key element in understanding the action of many diverse classes of anticancer compounds. We demonstrated that this network might be useful for studying the mechanisms of varied compound responses across diverse cancer cell lines.

Mining Clusters of Sequence Data using Sequence Element-based Similarity Measure

  • Oh, Seung-Joon;Kim, Jae-Yearn (오승준;김재련)
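    • Korea Intelligent Information Systems Society: Conference Proceedings (한국지능정보시스템학회:학술대회논문집) / 2004 Fall Conference of the Korea Intelligent Information Systems Society / pp. 221-229 / 2004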
  • Recently, there has been enormous growth in the amount of commercial and scientific data, such as protein sequences, retail transactions, and web logs. Such datasets consist of sequence data with an inherently sequential nature. However, only a few existing clustering algorithms take sequentiality into account. This study presents a method for clustering such sequence datasets. Because the similarity between sequences must be determined before the sequences can be clustered, this study proposes a new measure that computes the similarity between two sequences using their sequence elements. Two clustering algorithms based on the proposed similarity measure are presented: a hierarchical clustering algorithm and a scalable clustering algorithm that uses sampling and a k-nearest neighbor method. Using a splice dataset and synthetic datasets, we show that the clusters generated by the proposed algorithms are of higher quality than those produced by traditional clustering algorithms.


Neighborhood Correlation Image Analysis for Change Detection Using Different Spatial Resolution Imagery

  • Im, Jung-Ho
    • Korean Journal of Remote Sensing (대한원격탐사학회지) / Vol. 22, No. 5 / pp. 337-350 / 2006
  • The characteristics of neighborhood correlation images for change detection were explored at different spatial resolution scales. Bi-temporal QuickBird datasets of Las Vegas, NV were used for the high spatial resolution image analysis, while bi-temporal Landsat TM/ETM+ datasets of Suwon, South Korea were used for the mid spatial resolution analysis. The neighborhood correlation images, consisting of three variables (correlation, slope, and intercept), were evaluated and compared between the two scales for change detection. The neighborhood correlation images created from the Landsat datasets showed somewhat different patterns from those created from the QuickBird high spatial resolution imagery, for several reasons such as the impact of mixed pixels. Automated binary change detection was then performed using single and multiple neighborhood correlation image variables at both spatial resolution scales.
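
The three variables named above (correlation, slope, and intercept) can be sketched for a single band. For every pixel, regress the second-date values in a small square window on the first-date values. The study's actual window size, band combination, and thresholds are not stated in the abstract, so the single-band input, the `radius` parameter, and the name `neighborhood_correlation_image` are assumptions of this NumPy sketch.

```python
import numpy as np


def neighborhood_correlation_image(before, after, radius=1):
    """Per-pixel correlation, slope and intercept between two co-registered images.

    For each pixel, the (2*radius+1) x (2*radius+1) window of `before` is regressed
    against the same window of `after`. Border pixels and flat windows stay NaN.
    """
    x_img = np.asarray(before, dtype=float)
    y_img = np.asarray(after, dtype=float)
    if x_img.shape != y_img.shape or x_img.ndim != 2:
        raise ValueError("expected two single-band images of the same shape")
    rows, cols = x_img.shape
    corr = np.full((rows, cols), np.nan)
    slope = np.full((rows, cols), np.nan)
    intercept = np.full((rows, cols), np.nan)
    for i in range(radius, rows - radius):
        for j in range(radius, cols - radius):
            win = (slice(i - radius, i + radius + 1), slice(j - radius, j + radius + 1))
            x = x_img[win].ravel()
            y = y_img[win].ravel()
            var_x, var_y = x.var(), y.var()
            if var_x == 0:
                continue  # regression is undefined when the "before" window is flat
            cov = ((x - x.mean()) * (y - y.mean())).mean()
            slope[i, j] = cov / var_x
            intercept[i, j] = y.mean() - slope[i, j] * x.mean()
            if var_y > 0:
                corr[i, j] = cov / np.sqrt(var_x * var_y)
    return corr, slope, intercept


# Synthetic example: the second date is a linear rescaling of the first,
# except for one "changed" pixel at (3, 3).
before = np.arange(25, dtype=float).reshape(5, 5)
after = 2 * before + 3
after[3, 3] = 0
corr, slope, intercept = neighborhood_correlation_image(before, after)
```

Windows untouched by the changed pixel keep correlation 1, slope 2, and intercept 3. Windows that include it show a sharply lower correlation. That drop is the signal a change-detection threshold would act on.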

Comprehensive review on Clustering Techniques and its application on High Dimensional Data

  • Alam, Afroj;Muqeem, Mohd;Ahmad, Sultan
    • International Journal of Computer Science & Network Security / Vol. 21, No. 6 / pp. 237-244 / 2021
  • Clustering is one of the most powerful unsupervised machine learning techniques for dividing instances into homogeneous groups, called clusters. It is mainly used to generate good-quality clusters through which hidden patterns and knowledge can be discovered in large datasets. It has many applications in fields such as medicine, healthcare, gene expression, image processing, agriculture, fraud detection, and profitability analysis. The goal of this paper is to explore both hierarchical and partitioning clustering and to examine their problems along with various approaches to solving them. Among clustering methods, K-means is preferred over the others because of its linear time complexity. This paper also focuses on data mining of high-dimensional datasets, including their problems and the existing approaches to them.
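
The "linear time complexity" attributed to K-means refers to the cost per iteration. Each pass computes n × k distances in d dimensions, which is O(n·k·d), linear in the number of instances n for fixed k, d, and iteration count. Below is a minimal Lloyd's-algorithm sketch in NumPy. The random initialization, the `seed` and `n_iter` parameters, and the keep-old-center rule for empty clusters are choices of this sketch, not the paper's.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: squared distances of shape (n, k), O(n*k*d) work.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its points
        # (an empty cluster keeps its previous center).
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centers = kmeans(points, k=2)
```

The broadcasting in the assignment step also uses O(n·k·d) memory. For very large or high-dimensional datasets, a chunked loop or a library implementation would be the practical choice.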

A Manually Captured and Modified Phone Screen Image Dataset for Widget Classification on CNNs

  • Byun, SungChul;Han, Seong-Soo;Jeong, Chang-Sung
    • Journal of Information Processing Systems / Vol. 18, No. 2 / pp. 197-207 / 2022
  • The applications and user interfaces (UIs) of smart mobile devices are constantly diversifying. Deep learning can be an innovative way to classify widgets in screen images and thereby increase convenience. To this end, the present research uses captured images and the ReDraw dataset to build deep learning datasets for image classification. First, validation experiments with ResNet50 and EfficientNet show that the dataset composed in this study is helpful for classifying widgets by their functionality. Widget detection and classification are then implemented with RetinaNet and EfficientNet. Finally, the research proposes the Widg-C and Widg-D datasets, deep learning datasets for identifying the widgets of smart devices, and applies them to representative convolutional neural network models.

Applying Token Tagging to Augment Dataset for Automatic Program Repair

  • Hu, Huimin;Lee, Byungjeong
    • Journal of Information Processing Systems / Vol. 18, No. 5 / pp. 628-636 / 2022
  • Automatic program repair (APR) techniques, which have been investigated for decades, aim to repair bugs in programs automatically and provide correct patches to developers. However, most studies are limited in their ability to repair complex bugs. To overcome these limitations, we developed an approach that augments datasets through token tagging and applies machine learning techniques to APR. First, to alleviate the data insufficiency problem, we augmented the datasets by extracting all methods (buggy and non-buggy) from the program source code and applying token tagging to the non-buggy methods. Second, we fed the preprocessed code into the model as training input. Finally, we evaluated the proposed approach by comparing it with the baselines. The results show that augmenting datasets with token tagging is efficient and promising for APR.

Modeling Vulnerability Discovery Process in Major Cryptocurrencies

  • Joh, HyunChul;Lee, JooYoung
    • Journal of Multimedia Information System / Vol. 9, No. 3 / pp. 191-200 / 2022
  • These days, businesses, both online and offline, have started accepting cryptocurrencies as a means of payment. In some countries, such as El Salvador, cryptocurrencies are even recognized as fiat currency. Meanwhile, software vulnerabilities that are publicly known but not yet patched threaten not only software users but also society in general. As the status of cryptocurrencies has grown, so has the impact of cryptocurrency-related security vulnerabilities on society. In this paper, we first analyze vulnerabilities from two major cryptocurrency vendors, Bitcoin and Ethereum, quantitatively with respect to CVSS, to see how the vulnerabilities in those systems are roughly structured. We then show that the original AML vulnerability discovery model does not accurately represent the vulnerability discovery trends in these datasets, and we introduce a modified AML model for them. The analysis shows that the modified model performs better than the original AML model on the vulnerability datasets of the major cryptocurrencies.
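
For reference, the original Alhazmi-Malaiya Logistic (AML) model describes the cumulative number of discovered vulnerabilities Ω(t) as a logistic curve, Ω(t) = B / (B·C·e^(−A·B·t) + 1). This is the solution of dΩ/dt = A·Ω·(B − Ω), where B is the total number of vulnerabilities eventually found. The abstract does not give the modified model, so the sketch below covers only the original one. The function name and the parameter values in the example are hypothetical; real values would come from fitting the Bitcoin or Ethereum vulnerability data.

```python
import math


def aml_cumulative_vulnerabilities(t: float, A: float, B: float, C: float) -> float:
    """Original AML model: cumulative vulnerabilities discovered by time t.

    Omega(t) = B / (B * C * exp(-A * B * t) + 1)
    B is the saturation level (total vulnerabilities eventually found);
    A and C control how quickly and from where the curve rises.
    """
    return B / (B * C * math.exp(-A * B * t) + 1)


# Hypothetical parameters, for illustration only (not fitted to any dataset).
A, B, C = 0.002, 60.0, 0.5
curve = [aml_cumulative_vulnerabilities(t, A, B, C) for t in range(0, 121, 12)]
```

The curve starts at B / (B·C + 1), rises in an S-shape, and saturates at B. A dataset whose discovery counts keep growing instead of levelling off is one kind of trend this original form cannot capture well.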