• Title/Abstract/Keyword: small datasets

165 results found (processing time: 0.024 seconds)

Effects of Single Nucleotide Polymorphism Marker Density on Haplotype Block Partition

  • Kim, Sun Ah;Yoo, Yun Joo
    • Genomics & Informatics
    • /
    • Vol.14 No.4
    • /
    • pp.196-204
    • /
    • 2016
  • Many researchers have found that one of the most important characteristics of the structure of linkage disequilibrium is that the human genome can be divided into non-overlapping block partitions in which only a small number of haplotypes are observed. The location and distribution of haplotype blocks can be seen as a population property influenced by population genetic events such as selection, mutation, recombination, and population structure. In this study, we investigate how the density of markers, relative to the full set of all polymorphisms in the region, affects the results of haplotype partitioning for five popular haplotype block partition methods: three methods in Haploview (confidence interval, four gamete test, and solid spine), MIG++ implemented in PLINK 1.9, and S-MIG++. We used several experimental datasets obtained by sampling subsets of single nucleotide polymorphism (SNP) markers of the chromosome 22 region in the 1000 Genomes Project data, as well as the HapMap phase 3 data, to compare the haplotype block partitions produced by the five methods. As the sampling ratio decreases to 20% of the original SNP markers, the total number of haplotype blocks decreases and the length of haplotype blocks increases for all algorithms. When we examined the marker independence of the haplotype block locations constructed from datasets of different densities, the results using fewer than 50% of the entire SNP marker set were very different from the results using all SNP markers. We conclude that haplotype block construction results should be used and interpreted carefully, depending on the selection of markers and the purpose of the study.

A Fully Convolutional Network Model for Classifying Liver Fibrosis Stages from Ultrasound B-mode Images

  • 강성호;유선경;이정은;안치영
    • Korean Society of Medical and Biological Engineering: Journal of Biomedical Engineering Research
    • /
    • Vol.41 No.1
    • /
    • pp.48-54
    • /
    • 2020
  • In this paper, we address a liver fibrosis classification problem using ultrasound B-mode images. Representative methods for classifying the stages of liver fibrosis include liver biopsy and diagnosis based on ultrasound images. The overall liver shape and the smoothness or roughness of the speckle pattern in ultrasound images are used to determine the fibrosis stage. Although ultrasound-image-based classification is frequently used as an alternative or complement to invasive biopsy, it has the limitation that the fibrosis stage decision depends on image quality and the doctor's experience. With the rapid development of deep learning algorithms, several studies using deep learning methods have been carried out for automated liver fibrosis classification and have shown high accuracy. The performance of these deep learning methods depends closely on the amount of data available. We propose an enhanced U-net architecture to maximize classification accuracy with a limited amount of image data. U-net is well known as a neural network for fast and precise segmentation of medical images; we redesign it here for the purpose of classifying liver fibrosis stages. To assess the performance of the proposed architecture, numerical experiments were conducted on a total of 118 ultrasound B-mode images acquired from 78 patients with liver fibrosis symptoms of stages F0-F4. The experimental results show that the proposed architecture performs much better than transfer learning using a pre-trained VGGNet model.

Network Anomaly Traffic Detection Using WGAN-CNN-BiLSTM in Big Data Cloud-Edge Collaborative Computing Environment

  • Yue Wang
    • Journal of Information Processing Systems
    • /
    • Vol.20 No.3
    • /
    • pp.375-390
    • /
    • 2024
  • Edge computing architecture has effectively alleviated the computing pressure on cloud platforms, reduced network bandwidth consumption, and improved the quality of the user experience; however, it has also introduced new security issues. Existing anomaly detection methods in big data scenarios with cloud-edge computing collaboration face several challenges, such as sample imbalance, difficulty in dealing with complex network traffic attacks, and difficulty in effectively training large-scale data or overly complex deep-learning network models. A lightweight deep-learning model was proposed to address these challenges. First, normalization on the user side was used to preprocess the traffic data. On the edge side, a trained Wasserstein generative adversarial network (WGAN) was used to supplement the data samples, which effectively alleviates the imbalance of minority-class samples while occupying only a small amount of edge-computing resources. Finally, a trained lightweight deep learning network model is deployed on the edge side, and the preprocessed and expanded local data are used to fine-tune the trained model. This ensures that the model at each edge node is more consistent with the local data characteristics, effectively improving the system's detection ability. In the designed lightweight deep learning network model, two sets of convolutional and pooling layers of a convolutional neural network (CNN) extract spatial features, a bidirectional long short-term memory network (BiLSTM) captures temporal features, and an attention mechanism adjusts the weights of traffic features, improving the model's ability to identify abnormal traffic. The proposed model was experimentally evaluated using the NSL-KDD, UNSW-NB15, and CIC-IDS2018 datasets. The accuracies of the proposed model on the three datasets were 0.974, 0.925, and 0.953, respectively, exceeding those of the comparative models. The proposed lightweight deep learning network model has good application prospects for anomaly traffic detection in cloud-edge collaborative computing architectures.

Document Classification of Small Size Documents Using Extended Relief-F Algorithm

  • 박흠
    • The KIPS Transactions: Part B
    • /
    • Vol.16B No.3
    • /
    • pp.233-238
    • /
    • 2009
  • It is difficult to achieve good performance in the automatic classification of small documents with few features. Although the total number of features across the document collection is large, the number of features within each individual document is relatively small, so inter-document similarity is very low and even excellent classification algorithms perform poorly. Performance suffers especially in small-document classification tasks such as the automatic classification of web directory documents, or in disk recovery work, where unconnected sectors must be linked through similarity evaluation and automatic classification. To address these problems, this paper proposes ERelief-F, an adaptation of the instance-based feature filtering algorithm Relief-F suited to feature filtering within small documents, as a preprocessing step for classification. For comparison, experiments were also conducted with the existing feature filtering methods Odds Ratio and Information Gain, as well as the original Relief-F. The results show that ERelief-F performed far better than Information Gain, Odds Ratio, and Relief-F, and also greatly reduced the number of irrelevant features.
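
As background for the entry above, the following is a minimal sketch of the underlying Relief-F idea on which the paper builds (the paper's ERelief-F extension is not detailed in the abstract, so only the base algorithm is sketched); the toy data and the choice of a single nearest hit/miss per instance are illustrative assumptions:

```python
import math

def relief_f(X, y):
    """Simplified Relief-F feature weighting (binary classes, one nearest
    hit/miss per instance). Features that differ between an instance and
    its nearest opposite-class neighbor (miss) gain weight; features that
    differ from its nearest same-class neighbor (hit) lose weight."""
    n, d = len(X), len(X[0])
    # Per-feature value ranges, used to normalize differences to [0, 1].
    ranges = [max(r[j] for r in X) - min(r[j] for r in X) or 1.0
              for j in range(d)]

    def diff(j, a, b):
        return abs(a[j] - b[j]) / ranges[j]

    def dist(a, b):
        return math.sqrt(sum(diff(j, a, b) ** 2 for j in range(d)))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        h = min(hits, key=lambda k: dist(X[i], X[k]))    # nearest hit
        m = min(misses, key=lambda k: dist(X[i], X[k]))  # nearest miss
        for j in range(d):
            w[j] += (diff(j, X[i], X[m]) - diff(j, X[i], X[h])) / n
    return w
```

Feature filtering then keeps the features with the largest weights and discards the rest.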

Layered-earth Resistivity Inversion of Small-loop Electromagnetic Survey Data using Particle Swarm Optimization

  • 장한길로
    • Geophysics and Geophysical Exploration
    • /
    • Vol.22 No.4
    • /
    • pp.186-194
    • /
    • 2019
  • Deterministic methods commonly used to find inversion solutions for geophysical survey data have the drawback that they are prone to becoming trapped in local minima and failing to converge to an appropriate solution. One alternative is to use global optimization methods based on stochastic approaches, and among these, many applications of particle swarm optimization (PSO) have been reported. In this paper, we develop a layered-earth resistivity inversion algorithm for small-loop electromagnetic survey data using PSO and perform inversion experiments with synthetic data. The experiments confirm that applying PSO can increase the success rate when inverting small-loop electromagnetic data for which the conventional Gauss-Newton algorithm has difficulty finding an optimal solution.
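
The generic PSO loop the entry above relies on can be sketched as follows; note that the actual small-loop EM forward modeling is replaced here by a hypothetical log-scale misfit for a two-layer model (true resistivities 100 and 20 ohm-m are made-up example values), and the swarm parameters are common textbook defaults, not the paper's settings:

```python
import math
import random

def pso_minimize(objective, bounds, n_particles=30, n_iters=200, seed=0,
                 inertia=0.7, c1=1.5, c2=1.5):
    """Minimize `objective` over the box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    d = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * d for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # personal best positions
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for j in range(d):
                r1, r2 = rng.random(), rng.random()
                vel[i][j] = (inertia * vel[i][j]
                             + c1 * r1 * (pbest[i][j] - pos[i][j])
                             + c2 * r2 * (gbest[j] - pos[i][j]))
                lo, hi = bounds[j]
                pos[i][j] = min(max(pos[i][j] + vel[i][j], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Toy stand-in for the small-loop EM forward problem: recover two layer
# resistivities (hypothetical true model: 100 and 20 ohm-m) by minimizing
# a log-scale data misfit.
def misfit(m):
    return math.log(m[0] / 100.0) ** 2 + math.log(m[1] / 20.0) ** 2
```

Because the swarm samples the whole search box, it does not need the good starting model that a Gauss-Newton scheme requires, which is the trade-off the paper exploits.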

Performance Analysis of Group Recommendation Systems in TV Domains

  • Kim, Noo-Ri;Lee, Jee-Hyong
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol.15 No.1
    • /
    • pp.45-52
    • /
    • 2015
  • Although researchers have proposed various recommendation systems, most recommendation approaches target single users, and only a small number of approaches target groups. However, TV programs and movies are most often viewed by groups rather than by single users. Most group recommendation approaches assume that individual users' profiles are known and that group profiles consist of those individual profiles. However, because group profiles are difficult to obtain, researchers have only used synthetic or limited datasets. In this paper, we apply various group recommendation approaches to a real large-scale dataset in the TV domain and evaluate them. In addition, we provide some guidelines for group recommendation systems, focusing on home group users in the TV domain.
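
The abstract does not specify which aggregation strategies were compared, but the classic ones used in group recommendation (average, least misery, most pleasure) can be sketched as follows; the item names and ratings are invented for illustration:

```python
def aggregate_group_scores(member_scores, strategy="average"):
    """Combine each member's predicted rating per item into one group score.
    member_scores: dict mapping item -> list of members' predicted ratings."""
    if strategy == "average":
        agg = lambda rs: sum(rs) / len(rs)
    elif strategy == "least_misery":
        agg = min   # the group is only as happy as its least happy member
    elif strategy == "most_pleasure":
        agg = max
    else:
        raise ValueError("unknown strategy: " + strategy)
    return {item: agg(rs) for item, rs in member_scores.items()}

def recommend_for_group(member_scores, strategy="average", top_n=1):
    """Rank items by aggregated group score and return the top_n items."""
    scores = aggregate_group_scores(member_scores, strategy)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Note how the strategy changes the outcome: an item loved by two members but hated by one wins under averaging yet loses under least misery, which is why evaluating strategies on real group data matters.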

Morphology and Molecular Characterization of a Fungus from the Alternaria alternata Species Complex Causing Black Spots on Pyrus sinkiangensis (Koerle pear)

  • Aung, Sein Lai Lai;Liu, Hai Feng;Pei, Dong Fang;Lu, Bing Bin;Oo, May Moe;Deng, Jian Xin
    • Mycobiology
    • /
    • Vol.48 No.3
    • /
    • pp.233-239
    • /
    • 2020
  • A small-spored Alternaria was found on black spots of stored Koerle pear (Pyrus sinkiangensis), one of the economically important fruits in Xinjiang province, China. Its morphology is similar to that of A. limoniasperae but clearly differs in its secondary conidiophores and conidial septa. A phylogenetic analysis using sequence datasets of the ITS, GAPDH, TEF1, RPB2, Alt a1, OPA10-2, and EndoPG genes revealed that it belongs to the Alternaria alternata species complex. Pathogenicity tests demonstrated that the fungus is the causal pathogen of black spot on Koerle pear fruit.

Efficient Greedy Algorithms for Influence Maximization in Social Networks

  • Lv, Jiaguo;Guo, Jingfeng;Ren, Huixiao
    • Journal of Information Processing Systems
    • /
    • Vol.10 No.3
    • /
    • pp.471-482
    • /
    • 2014
  • Influence maximization is the problem of finding a small subset of nodes in a social network such that, by targeting this set, one maximizes the expected spread of influence in the network. To improve the efficiency of the KK_Greedy algorithm proposed by Kempe et al., we propose two improved algorithms, Lv_NewGreedy and Lv_CELF. By combining the advantages of these two algorithms, we propose a mixed algorithm, Lv_MixedGreedy. We conducted experiments on two synthetic datasets and show that our improved algorithms achieve influence spread matching that of their benchmark algorithms while running faster.
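
The baseline that all of these algorithms improve on is the Kempe et al. greedy scheme: repeatedly add the node with the largest marginal gain in expected spread, estimated by Monte Carlo simulation of the independent cascade model. A minimal sketch of that baseline (not the paper's Lv_ variants, whose specific optimizations the abstract does not describe) might look like this, with the graph and activation probability chosen purely for illustration:

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One Monte Carlo run of the independent cascade model: each newly
    activated node gets one chance to activate each inactive neighbor
    with probability p. Returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return len(active)

def expected_spread(graph, seeds, p, n_sims, rng):
    """Estimate expected spread of a seed set by averaging simulations."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(n_sims)) / n_sims

def greedy_influence_max(graph, k, p=0.1, n_sims=200, seed=0):
    """Greedily pick k seed nodes, each time adding the node whose
    inclusion gives the largest estimated expected spread."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(k):
        best, best_gain = None, -1.0
        for v in graph:
            if v in chosen:
                continue
            gain = expected_spread(graph, chosen + [v], p, n_sims, rng)
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen
```

The cost driver is obvious from the sketch: every candidate node triggers a fresh batch of simulations in every round, which is exactly the redundancy that CELF-style lazy evaluation (exploiting submodularity of the spread function) removes.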

Principal Component Regression by Principal Component Selection

  • Lee, Hosung;Park, Yun Mi;Lee, Seokho
    • Communications for Statistical Applications and Methods
    • /
    • Vol.22 No.2
    • /
    • pp.173-180
    • /
    • 2015
  • We propose a selection procedure for principal components in principal component regression. Our method selects principal components using variable selection procedures, rather than taking a small subset of the leading principal components as in conventional principal component regression. The procedure consists of two steps to improve estimation and prediction. First, we reduce the number of principal components using conventional principal component regression to obtain a set of candidate principal components, and then we select principal components from this candidate set using sparse regression techniques. The performance of our proposal is demonstrated numerically and compared with typical dimension reduction approaches (including principal component regression and partial least squares regression) using synthetic and real datasets.
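
The key point of the entry above is that a component's variance need not predict its relevance to the response. A toy illustration of that selection idea is sketched below for two-feature data, where the 2x2 eigenproblem has a closed form; a simple correlation threshold stands in for the paper's sparse-regression step, and the threshold value is an arbitrary illustrative choice:

```python
import math

def pc_select(X, y, threshold=0.5):
    """Select principal components of two-feature data by their correlation
    with the response y (a simple stand-in for a sparse-regression step).
    Returns indices of selected components (0 = largest-variance PC)."""
    n = len(X)
    means = [sum(r[j] for r in X) / n for j in range(2)]
    Z = [[r[0] - means[0], r[1] - means[1]] for r in X]
    # 2x2 covariance matrix [[a, b], [b, c]] and its closed-form eigenpairs.
    a = sum(r[0] * r[0] for r in Z) / n
    c = sum(r[1] * r[1] for r in Z) / n
    b = sum(r[0] * r[1] for r in Z) / n
    disc = math.sqrt((a - c) ** 2 + 4.0 * b * b)
    lams = [(a + c + disc) / 2.0, (a + c - disc) / 2.0]
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    sy = math.sqrt(sum(v * v for v in yc))
    selected = []
    for idx, lam in enumerate(lams):
        if abs(b) > 1e-12:
            v0, v1 = b, lam - a
        else:  # diagonal covariance: the eigenvectors are the axes
            v0, v1 = (1.0, 0.0) if abs(lam - a) < 1e-12 else (0.0, 1.0)
        norm = math.hypot(v0, v1)
        scores = [(r[0] * v0 + r[1] * v1) / norm for r in Z]  # PC scores
        ss = math.sqrt(sum(s * s for s in scores))
        corr = 0.0 if ss == 0 or sy == 0 else \
            sum(s * v for s, v in zip(scores, yc)) / (ss * sy)
        if abs(corr) >= threshold:
            selected.append(idx)
    return selected
```

When the response varies only along the minor axis of the data, this selection keeps the low-variance component and drops the dominant one, which variance-ranked PCR would never do.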

Deep learning for stage prediction in neuroblastoma using gene expression data

  • Park, Aron;Nam, Seungyoon
    • Genomics & Informatics
    • /
    • Vol.17 No.3
    • /
    • pp.30.1-30.4
    • /
    • 2019
  • Neuroblastoma is a major cause of cancer death in early childhood, and its timely and correct diagnosis is critical. Gene expression datasets have recently been considered as a powerful tool for cancer diagnosis and subtype classification. However, no attempts have yet been made to apply deep learning using gene expression to neuroblastoma classification, although deep learning has been applied to cancer diagnosis using image data. Taking the International Neuroblastoma Staging System stages as multiple classes, we designed a deep neural network using the gene expression patterns and stages of neuroblastoma patients. Despite a small patient population (n = 280), stage 1 and 4 patients were well distinguished. If it is possible to replicate this approach in a larger population, deep learning could play an important role in neuroblastoma staging.