• Title/Summary/Keyword: small data set

Search Result 662, Processing Time 0.023 seconds

Deep Learning for Pet Image Classification (애완동물 분류를 위한 딥러닝)

  • Shin, Kwang-Seong;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2019.05a / pp.151-152 / 2019
  • In this paper, we propose an improved learning method based on a small data set for animal image classification. First, a CNN training model is built for the small data set, and the data set is used to expand the training set. Second, bottleneck features of the small data set are extracted with a network pre-trained on a large data set, such as VGG16, and stored in two NumPy files as a new training set and a new test set. Finally, a fully connected network is trained on the new data set.
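
The second and third steps of this abstract (cache features from a frozen network, then train a small fully connected head on the cached features) can be sketched as follows. This is a minimal standard-library illustration under stated assumptions, not the authors' implementation: `frozen_features` is a hypothetical stand-in for a pre-trained base such as VGG16, the toy dark/bright images are invented, and JSON files stand in for the two NumPy files.

```python
import json
import math
import os
import random
import tempfile


def frozen_features(image):
    # Hypothetical stand-in for a frozen pre-trained base (the abstract uses VGG16):
    # maps a flat list of pixels to a short "bottleneck" feature vector.
    n = len(image)
    return [sum(image) / n, sum(p * p for p in image) / n]


def make_split(rng, n_per_class):
    # Invented toy data: class 0 = dark images, class 1 = bright images.
    data = []
    for label, low in ((0, 0.0), (1, 0.6)):
        for _ in range(n_per_class):
            data.append(([low + 0.4 * rng.random() for _ in range(16)], label))
    return data


def cache_features(split, path):
    # Run the frozen extractor once and store features and labels on disk.
    rows = [{"x": frozen_features(img), "y": y} for img, y in split]
    with open(path, "w") as f:
        json.dump(rows, f)


def load_features(path):
    with open(path) as f:
        rows = json.load(f)
    return [r["x"] for r in rows], [r["y"] for r in rows]


def predict(w, b, x):
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_head(xs, ys, epochs=500, lr=1.0):
    # One dense unit with a sigmoid: the smallest possible fully connected head,
    # trained by full-batch gradient descent on the cross-entropy loss.
    w, b = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = predict(w, b, x) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(xs) for wi, g in zip(w, gw)]
        b -= lr * gb / len(xs)
    return w, b


def accuracy(w, b, xs, ys):
    hits = sum((predict(w, b, x) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


def run_pipeline(seed=0):
    rng = random.Random(seed)
    workdir = tempfile.mkdtemp()
    train_path = os.path.join(workdir, "bottleneck_train.json")
    test_path = os.path.join(workdir, "bottleneck_test.json")
    cache_features(make_split(rng, 20), train_path)
    cache_features(make_split(rng, 10), test_path)
    w, b = train_head(*load_features(train_path))
    return accuracy(w, b, *load_features(test_path))
```

Caching the features means the expensive extractor runs once per image, and only the small head is trained repeatedly.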

Study on the Improvement of Machine Learning Ability through Data Augmentation (데이터 증강을 통한 기계학습 능력 개선 방법 연구)

  • Kim, Tae-woo;Shin, Kwang-seong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.346-347 / 2021
  • In pattern recognition with machine learning, performance generally improves as the amount of training data grows. However, it is not always possible to secure a large amount of training data covering the types of patterns that must be detected in everyday settings. It is therefore necessary to substantially enlarge a small data set for general machine learning. In this study, we examine data augmentation techniques that make machine learning feasible. A representative method for machine learning with a small data set is transfer learning, in which basic learning is performed on a general-purpose data set and the target data set is then substituted at the final stage. In this study, a model trained on a general-purpose data set such as ImageNet is used as a feature extractor, together with augmented data, to detect a desired pattern.
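
As a rough illustration of what enlarging a small image data set can look like, the sketch below derives three extra variants from one image stored as a list of pixel rows. The abstract does not list the transforms the authors used, so the flips and the shift here are assumptions. Whether such transforms preserve the label depends on the task.

```python
def hflip(img):
    # Mirror each row left to right.
    return [row[::-1] for row in img]


def vflip(img):
    # Reverse the order of the rows (top to bottom).
    return [row[:] for row in img[::-1]]


def shift_right(img, k=1, fill=0):
    # Shift every row k pixels to the right, padding with `fill`.
    return [[fill] * k + row[:len(row) - k] for row in img]


def augment(img):
    # The original image plus three transformed copies.
    return [img, hflip(img), vflip(img), shift_right(img)]
```

For example, `augment([[1, 2], [3, 4]])` turns one training image into four.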

Training for Huge Data set with On Line Pruning Regression by LS-SVM

  • Kim, Dae-Hak;Shim, Joo-Yong;Oh, Kwang-Sik
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.137-141 / 2003
  • LS-SVM (least squares support vector machine) is a widely applicable and useful machine learning technique for classification and regression analysis. LS-SVM can be a good substitute for statistical methods, but computational difficulties remain in inverting the matrix for a huge data set. In the modern information society, huge data sets are easily obtained in online or batch mode. For such huge data sets, we suggest an online pruning regression method based on LS-SVM. With a relatively small number of pruned support vectors, we obtain almost the same performance as regression with the full data set.
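
To make the matrix problem concrete, here is a minimal sketch of standard LS-SVM regression, which solves one linear system for the bias `b` and the support values `alpha`. The pruning step shown is the common heuristic of keeping the points with the largest |alpha| and refitting. It is an assumption for illustration and not necessarily the online pruning method of the paper. The RBF kernel, `sigma`, and `gamma` are also illustrative choices.

```python
import math


def rbf(a, b, sigma=1.0):
    # Gaussian (RBF) kernel on scalar inputs.
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def solve(A, rhs):
    # Gaussian elimination with partial pivoting on a dense system.
    n = len(rhs)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit(xs, ys, gamma=100.0):
    # LS-SVM system: [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    n = len(xs)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [rbf(xs[i], xs[j]) + (1.0 / gamma if i == j else 0.0) for j in range(n)]
        A.append([1.0] + row)
    sol = solve(A, [0.0] + list(ys))
    return sol[0], sol[1:]


def predict(x, xs, b, alpha):
    return b + sum(a * rbf(x, xi) for a, xi in zip(alpha, xs))


def prune(xs, ys, keep, gamma=100.0):
    # Keep the `keep` points with the largest |alpha|, then refit on them only.
    _, alpha = fit(xs, ys, gamma)
    idx = sorted(sorted(range(len(xs)), key=lambda i: abs(alpha[i]), reverse=True)[:keep])
    xs2 = [xs[i] for i in idx]
    ys2 = [ys[i] for i in idx]
    b2, alpha2 = fit(xs2, ys2, gamma)
    return xs2, b2, alpha2
```

The system has one row per training point plus one, which is why the cost of the solve grows quickly with the size of the data set and why reducing the number of support vectors is attractive.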

A Study for Assessment Scope Set-up of Road Noise in EIA (환경영향평가시 도로소음 평가범위 설정에 대한 연구)

  • Choi, Joongyu;Sun, Hyosung;Choung, Taeryang
    • Journal of Environmental Impact Assessment / v.21 no.4 / pp.567-572 / 2012
  • This paper proposes a plan for setting the assessment scope of road noise that considers road characteristics, using a road noise prediction model. The RLS90 prediction model, with some assumptions, is used to establish the assessment scope. The main assumptions are smooth traffic flow, flat terrain, all noise sources located in one lane, driving at the design speed, and an assessment scope set according to traffic volume and vehicle speed. The traffic volume used to predict road noise is obtained from the distribution of small cars and full-sized cars on the road. In this study, the total traffic volume is computed by adding the number of small cars to the small-car equivalent of the full-sized cars, where the conversion number is the number of small cars that make the same noise as one full-sized car. Road noise predictions are presented as a function of traffic volume, vehicle speed, and distance between road and receiver. The resulting assessment scope is obtained by combining the road noise prediction data with the standard for setting the road noise assessment scope.
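
The traffic-volume conversion described in this abstract amounts to a one-line formula. The sketch below assumes the full-sized cars are simply multiplied by the conversion number. The function name and the numbers in the usage example are hypothetical, since the abstract gives no values.

```python
def equivalent_small_car_volume(small_cars, full_sized_cars, conversion_number):
    # conversion_number: how many small cars make the same noise as one
    # full-sized car (the value must come from the noise model; none is
    # stated in the abstract).
    return small_cars + conversion_number * full_sized_cars
```

With a purely illustrative conversion number of 10, a flow of 1000 small cars and 100 full-sized cars is treated as 2000 small cars.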

Deep Learning Approach Based on Transcriptome Profile for Data Driven Drug Discovery

  • Eun-Ji Kwon;Hyuk-Jin Cha
    • Molecules and Cells / v.46 no.1 / pp.65-67 / 2023
  • SMILES (simplified molecular-input line-entry system) information for small molecules, encoded as a one-hot array, is passed to a convolutional neural network, referred to as a black box. Output data representing a gene signature are then matched to the gene signature of a disease to predict an appropriate small molecule. The efficacy of the predicted small molecules is examined in in vivo animal models. GSEA, gene set enrichment analysis.
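
A one-hot encoding of a SMILES string can be sketched as below. This is a simplified character-level version under assumed parameters (`alphabet`, `max_len`). It splits two-letter atoms such as Cl or Br into separate characters, which a production tokenizer would handle differently.

```python
def one_hot_smiles(smiles, alphabet, max_len):
    # One row per character position, one column per alphabet symbol.
    # Strings longer than max_len are truncated; shorter ones are zero-padded.
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in smiles[:max_len]:
        row = [0] * len(alphabet)
        row[index[ch]] = 1  # raises KeyError for symbols outside the alphabet
        rows.append(row)
    while len(rows) < max_len:
        rows.append([0] * len(alphabet))
    return rows
```

For ethanol, `one_hot_smiles("CCO", "CO=()", 4)` yields a 4 x 5 array whose last row is all zeros.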

Cascade Network Based Bolt Inspection In High-Speed Train

  • Gu, Xiaodong;Ding, Ji
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.10 / pp.3608-3626 / 2021
  • The detection of bolts is an important task in high-speed train inspection systems, and it is performed frequently to ensure the safety of trains. The difficulty of vision-based bolt inspection lies in detecting defects from small samples, which makes an end-to-end network ineffective. In this paper, the problem is resolved in two stages: a detection network followed by cascaded classification networks. For small bolt detection, all bolts, both defective and normal, are annotated together for training, and a new loss function and a new bounding-box selection based on the smallest axis-aligned convex set are proposed. These allow the YOLOv3 network to obtain accurate positions and bounding boxes for the various bolts. The average precision is greatly improved on PASCAL VOC, MS COCO, and an actual data set. After that, a Siamese network is employed to estimate the status of the bolts. Using the convolutional Siamese network, we obtain strong results on few-shot classification. Extensive experiments and comparisons on the actual data set show that the system outperforms state-of-the-art algorithms in bolt inspection.
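
The abstract does not define its loss or selection rule. One well-known measure built on the smallest axis-aligned box enclosing two boxes is generalized IoU (GIoU), sketched below as an assumed illustration of the idea rather than the authors' criterion. Boxes are `(x1, y1, x2, y2)` tuples.

```python
def area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def enclosing_box(a, b):
    # Smallest axis-aligned box that contains both a and b.
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def intersection(a, b):
    # May be an "inverted" box when a and b do not overlap; area() returns 0 then.
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def giou(a, b):
    inter = area(intersection(a, b))
    union = area(a) + area(b) - inter
    hull = area(enclosing_box(a, b))
    # IoU minus the fraction of the enclosing box not covered by the union.
    return inter / union - (hull - union) / hull
```

Unlike plain IoU, this value still varies when two boxes do not overlap, which is useful as a training signal for small objects.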

Evaluating the effect of the size of brand consideration set upon the Gutenberg′s monopolistic price interval (고려상표군 크기에 따른 구텐베르그의 가격독점영역에 관한 연구)

  • 백지원;황선진;이수진
    • Journal of the Korean Society of Clothing and Textiles / v.27 no.8 / pp.1004-1013 / 2003
  • This study addressed a poorly understood issue: the price response model and the monopolistic price interval of fashion goods. The concept of the monopolistic price interval introduced by Gutenberg has rarely been applied to fashion goods, which are known to be price sensitive. This study therefore examined the price-insensitive zone of blue jeans. Data from 268 respondents were analyzed using Choice-based Conjoint (CBC) analysis and t-tests. Treating the brand consideration set as a price determinant, we found a monopolistic price interval for jeans. The CBC results showed that the larger the brand consideration set, the shorter the monopolistic interval. This implies that a consumer with a small brand consideration set is more likely to have a longer monopolistic price interval than one with a large set, since a consumer with a small consideration set tends to value the brand itself more than price. Although significant monopolistic price intervals were found for only three of the seven jeans brands, reducing the size of the brand consideration set and increasing brand loyalty were found to be important for maximizing firms' financial profits.

A Robotic Medical Palpation using Contact Pressure Distribution (접촉 압력 분포를 이용한 로봇 의료 촉진)

  • Kim, Hyoungkyun;Choi, Seungmoon;Chung, Wan Kyun
    • The Journal of Korea Robotics Society / v.12 no.3 / pp.322-331 / 2017
  • In this paper, we present a novel robotic palpation method for lump shape estimation using the contact pressure distribution. Many previous studies of robotic palpation have used a stiffness map, which is not suitable for obtaining geometric information about a lump. As a result, they require a large data set and a long palpation time to estimate the lump shape. Instead of a stiffness map, the proposed method uses the difference between the normal force direction and the surface normal to detect the lump boundary and estimate its normal. The palpation trajectory is generated from the normal of the lump boundary so that the boundary is tracked in real time. The proposed approach requires only a small data set and a short palpation time, since the shape can be estimated directly from the optimally generated palpation trajectory. Experimental results show that our method finds the lump shape accurately in real time with little data and in a short time.
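
The boundary cue described in this abstract, a mismatch between the measured normal force direction and the surface normal, can be illustrated with a simple angle test. The threshold and the function names are hypothetical, and the paper's actual detector based on the contact pressure distribution is not specified in the abstract.

```python
import math


def angle_between(u, v):
    # Angle in radians between two non-zero 3-D vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_theta)


def at_lump_boundary(force_direction, surface_normal, threshold_rad=0.1):
    # Flag a contact point when the two directions disagree by more than
    # an (assumed) threshold angle.
    return angle_between(force_direction, surface_normal) > threshold_rad
```

On flat tissue the two directions coincide and the angle is zero. A tilted force direction suggests the probe is pressing against the side of a stiffer inclusion.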

Predictive Analysis of Financial Fraud Detection using Azure and Spark ML

  • Priyanka Purushu;Niklas Melcher;Bhagyashree Bhagwat;Jongwook Woo
    • Asia Pacific Journal of Information Systems / v.28 no.4 / pp.308-319 / 2018
  • This paper aims to provide valuable insights into financial fraud detection for mobile money transaction activity. We classified transactions as normal or fraudulent using a small sample and a massive data set, with Azure and Spark ML, which represent traditional systems and Big Data, respectively. Experimenting with the sample data set in Azure, we found that the Decision Forest model is the most accurate in terms of recall. For the massive data set in Spark ML, the Random Forest classifier proved to be the best algorithm. We show that the Spark cluster builds and evaluates models much faster as servers are added to the cluster, with the same accuracy, which indicates that predictions can be made on large-scale data sets using a Big Data platform. Finally, we reached a recall of 0.73, which implies satisfactory quality in predicting fraudulent transactions.
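
Recall, the metric reported in this abstract, is the share of actual fraud cases that the model flags. A minimal sketch, with invented labels in the usage example (1 = fraud, 0 = normal):

```python
def recall(y_true, y_pred, positive=1):
    # recall = true positives / (true positives + false negatives)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0
```

For `y_true = [1, 1, 1, 0, 0, 1]` and `y_pred = [1, 0, 1, 0, 1, 1]`, three of the four fraud cases are caught, so the recall is 0.75. False alarms on normal transactions do not affect it.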

Measurement of Dose Distribution in Small Beams of Philips 6 and 8 MVX Linear Accelerator (Philips LINAC 6 MV와 8 MV X선 소조사연에 대한 선량분포 측정)

  • Suh Tae-suk;Yoon Sei Chul;Shinn Kyung Sub;Park Yong Whee
    • Radiation Oncology Journal / v.9 no.1 / pp.143-152 / 1991
  • This paper presents a method for collecting beam data for small circular fields. Beam data were obtained from Philips 6 and 8 MV LINACs at Dept. Radiation Therapy at Gainesville Incorporated and Shands Teaching Hospital. The quantities measured include the tissue maximum ratio (TMR), off-axis ratio (OAR), and relative output factor (ROF). For small-field irradiation, special collimators were used to produce circular fields of 1 cm to 3 cm diameter in 2 mm steps, measured at an SAD (source-axis distance) of 100 cm. A diode detector was chosen for primary beam measurement and compared with measurements made with photographic film and TLD dosimeters. The measured TMRs and OARs were fitted from limited measurements to generate basic beam data for the reference set-up. The empirical formulas were later extended and generalized to any possible set-up using the trends of the fitting parameters. The measured TMRs and OARs were well represented by the fitting formulas developed.
