• Title/Summary/Keyword: training data

Modeling of Hydrologic Time Series using Stochastic Neural Networks Approach (추계학적 신경망 접근법을 이용한 수문학적 시계열의 모형화)

  • Kim, Seong-Won;Kim, Jeong-Heon;Park, Gi-Beom
    • Proceedings of the Korea Water Resources Association Conference / 2010.05a / pp.1346-1349 / 2010
  • The goal of this research is to apply neural network models to the disaggregation of pan evaporation (PE) data in the Republic of Korea. The neural network models are the generalized regression neural network model (GRNNM) and the multilayer perceptron neural network model (MLP-NNM). Disaggregation here means dividing the yearly PE data into monthly PE data. The performance of each neural network model is evaluated in training and test phases, and the data for both phases consist of historic, generated, and mixed data. From this research, we evaluate the performance of GRNNM and MLP-NNM for disaggregating nonlinear time series data. Furthermore, credible monthly PE data can be constructed by disaggregating the yearly PE data, and the approach can suggest a methodology for irrigation and drainage network systems.
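
A generalized regression neural network is, in its standard form, a Gaussian-kernel weighted average of the training targets. The sketch below shows only that prediction step, plus a hypothetical `disaggregate` helper that rescales predicted monthly shares so the months sum to the yearly total. The smoothing parameter `sigma`, the share-based formulation, and both function names are assumptions for illustration, not the authors' model.

```python
import math
from typing import List, Sequence


def grnn_predict(x_train: Sequence[Sequence[float]],
                 y_train: Sequence[float],
                 x: Sequence[float],
                 sigma: float = 1.0) -> float:
    """GRNN output for query x: kernel-weighted mean of the training targets."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    weighted_sum = 0.0
    weight_total = 0.0
    for xi, yi in zip(x_train, y_train):
        # Pattern layer: Gaussian kernel of the squared Euclidean distance.
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        w = math.exp(-d2 / (2.0 * sigma ** 2))
        # Summation layer: accumulate weighted targets and weights.
        weighted_sum += w * yi
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("all kernel weights underflowed; increase sigma")
    # Output layer: normalized weighted average.
    return weighted_sum / weight_total


def disaggregate(yearly_total: float, monthly_shares: Sequence[float]) -> List[float]:
    """Hypothetical step: rescale predicted shares so the months sum to the year."""
    total_share = sum(monthly_shares)
    if total_share <= 0:
        raise ValueError("shares must sum to a positive value")
    return [yearly_total * s / total_share for s in monthly_shares]
```

In this hypothetical setup, a GRNN would predict one share per month (for example from the yearly total and the month index), and `disaggregate` would enforce that the twelve monthly values add back up to the observed yearly PE.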

A Study on Satisfaction Survey Based on Regression Analysis to Improve Curriculum for Big Data Education (빅데이터 양성 교육 교과과정 개선을 위한 회귀분석 기반의 만족도 조사에 관한 연구)

  • Choi, Hyun
    • Journal of the Korean Society of Industry Convergence / v.22 no.6 / pp.749-756 / 2019
  • Big data refers to structured and unstructured data whose sheer volume makes it difficult to collect, store, and otherwise handle. Many institutions, including universities, are building student convergence systems to foster talent for data science and AI convergence, but there is a severe lack of research on what kind of education is needed and what students require. Therefore, in this paper, to improve the curriculum by identifying the satisfaction and demands of the participants in the "2019 Big Data Youth Talent Training Course" held at K University, we conducted a correlation analysis based on a questionnaire covering basic survey items and courses, followed by a regression analysis. The results show that higher overall satisfaction, higher satisfaction with classes and job connection, and greater self-development were associated with more positive evaluations of program efficiency.

A Federated Multi-Task Learning Model Based on Adaptive Distributed Data Latent Correlation Analysis

  • Wu, Shengbin;Wang, Yibai
    • Journal of Information Processing Systems / v.17 no.3 / pp.441-452 / 2021
  • Federated learning provides an efficient integrated model for distributed data, allowing different data to be trained locally. Meanwhile, the goal of multi-task learning is to build models for multiple related tasks simultaneously and to obtain their underlying main structure. However, traditional federated multi-task learning models not only impose strict requirements on the data distribution but also demand large amounts of computation and converge slowly, which has hindered their adoption in many fields. In our work, we apply a rank constraint to the weight vectors of the multi-task learning model to adaptively adjust how task similarity is learned, according to the distribution of the data on the federated nodes. The proposed model has a general framework for finding optimal solutions and can handle various data types. Experiments show that our model achieves the best results on different datasets. Notably, it still obtains stable results on datasets with large distribution differences. In addition, compared with traditional federated multi-task learning models, our algorithm converges to a locally optimal solution within a limited number of training iterations.

Performance Change According to Data Set Size Change in Semi-Supervised Learning Based Object Detection (준지도 학습 기반 객체 탐지 모델에서 데이터셋 변화에 따른 성능 변화)

  • Seungsoo Yu;Wonjun Hwang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2022.11a / pp.88-90 / 2022
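  • Semi-supervised learning is a training method in which some of the data are labeled and the rest are left unlabeled. Object detection is a computer vision task that locates multiple objects in an image by assigning bounding boxes to them. Naturally, the larger the data set used in the model training stage and the more objects it contains, the better the model generally performs. However, depending on the experimental environment, problems can arise, such as being unable to obtain a sufficient data set or the experimental hardware being unable to handle the data set. Therefore, in this paper, we examine a semi-supervised learning based object detection model, train the model while adjusting the size of the data set, and investigate how performance changes with data set size.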
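
The experimental protocol described above, training with data sets of different sizes, can be sketched with a small helper that draws a random subset of a given size and splits it into labeled and unlabeled parts. The function name, the `labeled_fraction` parameter, and the fixed seed are hypothetical choices for illustration; the abstract does not state the labeling ratio or the sampling scheme used.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def ssl_subset(data: Sequence[T],
               size_fraction: float,
               labeled_fraction: float,
               seed: int = 0) -> Tuple[List[T], List[T]]:
    """Take size_fraction of the data, then split it into labeled/unlabeled lists."""
    if not 0.0 < size_fraction <= 1.0:
        raise ValueError("size_fraction must be in (0, 1]")
    if not 0.0 <= labeled_fraction <= 1.0:
        raise ValueError("labeled_fraction must be in [0, 1]")
    shuffled = list(data)
    # Same seed -> same shuffle, so smaller subsets are prefixes of larger ones.
    random.Random(seed).shuffle(shuffled)
    n = int(len(shuffled) * size_fraction)
    subset = shuffled[:n]
    n_labeled = int(n * labeled_fraction)
    # Labels are kept for the first part only; the rest is treated as unlabeled.
    return subset[:n_labeled], subset[n_labeled:]


# Hypothetical sweep over data set sizes (e.g. 25%, 50%, 100% of the images).
image_ids = list(range(1000))
for frac in (0.25, 0.5, 1.0):
    labeled, unlabeled = ssl_subset(image_ids, frac, labeled_fraction=0.1)
    # train_ssl_detector(labeled, unlabeled)  # hypothetical training call
```

Shuffling once and taking prefixes keeps the subsets nested, so a change in performance can be attributed to size rather than to a different random draw.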

Preliminary Analysis of Data Quality and Cloud Statistics from Ka-Band Cloud Radar (Ka-밴드 구름레이더 자료품질 및 구름통계 기초연구)

  • Ye, Bo-Young;Lee, GyuWon;Kwon, Soohyun;Lee, Ho-Woo;Ha, Jong-Chul;Kim, Yeon-Hee
    • Atmosphere / v.25 no.1 / pp.19-30 / 2015
  • The Ka-band cloud radar (KCR) has been operated by the National Institute of Meteorological Research (NIMR) of the Korea Meteorological Administration (KMA) at the Boseong National Center for Intensive Observation of severe weather since 2013. Evaluating data quality is an essential step before cloud information can be analyzed further. In this study, we estimate the measurement error and the sampling uncertainty to evaluate data quality. Using vertically pointing data, the statistical uncertainty is obtained by calculating the standard deviation of each radar parameter. The statistical uncertainties decrease as the number of samples increases. The statistical uncertainties of horizontal and vertical reflectivity are identical (0.28 dB), whereas the statistical uncertainty of Doppler velocity (spectrum width) is 2.2 times (1.6 times) larger in the vertical channel. The reflectivity calibration of KCR is also performed using an X-band vertically pointing radar (VertiX) and a 2-dimensional video disdrometer (2DVD). Because monitoring calibration values is useful for evaluating the radar's condition, the variation of the calibration is monitored over five rain events. The mean calibration bias is 10.77 dBZ, with a standard deviation of 3.69 dB. Finally, the statistical characteristics of cloud properties were investigated over two months in autumn using the calibrated reflectivity. The cloud occurrence percentage is about 26% in September and 16% in October. However, further analyses are required to derive the general characteristics of autumn clouds in Korea.
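
The observation that statistical uncertainty shrinks as more samples are averaged can be illustrated by averaging a series in blocks of N samples and taking the standard deviation of the block means; for independent samples this falls roughly as σ/√N. This is a simplified sketch that assumes independent samples (real radar samples are correlated in time); the function names and the synthetic Gaussian series are hypothetical, and `calibration_summary` simply reports the mean and standard deviation of per-event calibration biases, the two statistics quoted above.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def block_mean_std(samples: List[float], n: int) -> float:
    """Std. dev. of the means of consecutive, non-overlapping blocks of n samples."""
    if n < 1:
        raise ValueError("n must be >= 1")
    means = [statistics.fmean(samples[i:i + n])
             for i in range(0, len(samples) - n + 1, n)]
    if len(means) < 2:
        raise ValueError("need at least two full blocks")
    return statistics.stdev(means)


def calibration_summary(biases_db: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation of per-event calibration biases."""
    return statistics.fmean(biases_db), statistics.stdev(biases_db)


# Synthetic example (not KCR data): unit-variance noise around 0 dB.
rng = random.Random(42)
series = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
for n in (1, 4, 16, 64):
    print(n, round(block_mean_std(series, n), 3))
```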

Multiview Data Clustering by using Adaptive Spectral Co-clustering (적응형 분광 군집 방법을 이용한 다중 특징 데이터 군집화)

  • Son, Jeong-Woo;Jeon, Junekey;Lee, Sang-Yun;Kim, Sun-Joong
    • Journal of KIISE / v.43 no.6 / pp.686-691 / 2016
  • In this paper, we introduce adaptive spectral co-clustering, a spectral clustering method for multiview data, especially data with more than three views. In adaptive spectral co-clustering, performance is improved by sharing information across diverse views. To share information efficiently, a co-training approach is adopted. In the co-training step, a set of parameters is estimated to make all views in the data maximally independent, and information is then shared with respect to the estimated parameters. This co-training step shares information more efficiently than ordinary feature concatenation and than co-training methods that assume independence among views. Adaptive spectral co-clustering was evaluated on a synthetic dataset and a multilingual document dataset. The experimental results, namely the performance at every iteration and the similarity matrices generated through information sharing, demonstrate the efficiency of adaptive spectral co-clustering.

A Fusion Method of Co-training and Label Propagation for Prediction of Bank Telemarketing (은행 텔레마케팅 예측을 위한 레이블 전파와 협동 학습의 결합 방법)

  • Kim, Aleum;Cho, Sung-Bae
    • Journal of KIISE / v.44 no.7 / pp.686-691 / 2017
  • Telemarketing has become central to industry marketing in the information society. Recently, machine learning has emerged in many areas, especially financial prediction. Financial data are mostly unlabeled, and labeling them by hand is difficult. In this paper, we propose a fusion method of semi-supervised learning that automatically labels unlabeled data for telemarketing prediction. Specifically, we integrate the labeling results of label propagation and co-training with a decision tree. Data with low reliability are removed, and only data that receive consistent labels from the two labeling methods are extracted. After these are added to the training set, a decision tree is learned on all of them. To confirm the usefulness of the proposed method, we conduct experiments on a real telemarketing dataset from a Portuguese bank. The accuracy of the proposed method is 83.39%, which is 1.82% higher than that of the conventional method, and its precision is 19.37%, which is 2.67% higher than that of the conventional method. A t-test confirms that the proposed method performs better.
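
The filtering step described above, discarding low-reliability labels and keeping only samples on which the two labelers agree, can be sketched as follows. Representing each method's output as a (label, confidence) pair and using a single threshold `min_conf` are assumptions; the abstract does not say how reliability is measured, and the label propagation and co-training models themselves are not shown.

```python
from typing import Dict, Tuple

# Each labeler maps sample index -> (predicted label, confidence in [0, 1]).
Prediction = Dict[int, Tuple[int, float]]


def fuse_pseudo_labels(label_prop: Prediction,
                       co_train: Prediction,
                       min_conf: float = 0.8) -> Dict[int, int]:
    """Keep unlabeled samples that both methods label confidently and identically."""
    fused: Dict[int, int] = {}
    for idx, (label_a, conf_a) in label_prop.items():
        if idx not in co_train:
            continue
        label_b, conf_b = co_train[idx]
        # Remove low-reliability predictions from either method.
        if conf_a < min_conf or conf_b < min_conf:
            continue
        # Keep only samples on which the two labelers agree.
        if label_a == label_b:
            fused[idx] = label_a
    return fused


# Toy predictions (not from the bank dataset).
lp = {0: (1, 0.90), 1: (0, 0.95), 2: (1, 0.40), 3: (1, 0.85)}
ct = {0: (1, 0.85), 1: (1, 0.90), 2: (1, 0.90), 3: (1, 0.70)}
pseudo = fuse_pseudo_labels(lp, ct, min_conf=0.75)
```

The resulting pseudo-labeled samples would then be appended to the labeled training set before the decision tree is fitted.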

Prototype-Based Classification Using Class Hyperspheres (클래스 초월구를 이용한 프로토타입 기반 분류)

  • Lee, Hyun-Jong;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering / v.5 no.10 / pp.483-488 / 2016
  • In this paper, we propose a prototype-based classification learning method that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to partition the class regions of all the training data into hyperspheres, and each hypersphere must cover only data from the same class. The radius of a hypersphere is set to the midpoint between two distances: the distance to the farthest same-class point and the distance to the nearest other-class point. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that covers all the training data. The proposed prototype selection method is designed as a greedy algorithm and can process large-scale training sets in parallel. Prediction uses the nearest-neighbor rule, with the selected prototypes serving as the new training data. In experiments, the generalization performance of the proposed method is superior to that of existing methods.
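
A sequential sketch of the prototype selection above: each training point gets a same-class hypersphere, and a greedy set cover picks centers until every training point is covered. Reading "the farthest same class point" as the farthest same-class point that is still closer than the nearest other-class point is our assumption (otherwise a sphere could enclose other classes); the parallel implementation mentioned in the abstract is not shown, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def select_prototypes(X: Sequence[Point], y: Sequence[int]) -> List[int]:
    """Greedy set-cover selection of hypersphere centers (indices into X)."""
    n = len(X)
    covers: List[Set[int]] = []
    for i in range(n):
        dists = [math.dist(X[i], X[j]) for j in range(n)]
        # Nearest point of another class bounds the hypersphere from outside.
        enemy = min((dists[j] for j in range(n) if y[j] != y[i]), default=math.inf)
        # Farthest same-class point that is still closer than that enemy.
        friend = max((dists[j] for j in range(n) if y[j] == y[i] and dists[j] < enemy),
                     default=0.0)
        radius = (friend + enemy) / 2.0  # midpoint of the two distances
        covers.append({j for j in range(n) if y[j] == y[i] and dists[j] <= radius})
    uncovered = set(range(n))
    prototypes: List[int] = []
    while uncovered:
        # Greedy step: pick the sphere covering the most still-uncovered points.
        best = max(range(n), key=lambda k: len(covers[k] & uncovered))
        prototypes.append(best)
        uncovered -= covers[best]
    return prototypes


def predict(X: Sequence[Point], y: Sequence[int],
            prototypes: Sequence[int], q: Point) -> int:
    """Nearest-neighbor rule using only the selected prototypes as training data."""
    nearest = min(prototypes, key=lambda i: math.dist(X[i], q))
    return y[nearest]
```

Every sphere contains its own center, so the greedy loop always makes progress and terminates. The pairwise distance computation is O(n²), which is the part a parallel version would distribute.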

Influence of abiotic factors on seasonal incidence of pests of tasar Silkworm Antheraea mylitta D.

  • Siddaiah, Aruna A.;Prasad, Rajendra;Rai, Suresh;Dubey, Omprakash;Satpaty, Subrat;Sinha, Ravibhushan;Prsad, Suraj;Sahay, Alok
    • International Journal of Industrial Entomology and Biomaterials / v.29 no.1 / pp.135-144 / 2014
  • Rearing of the tropical tasar silkworm, Antheraea mylitta Drury, is mainly conducted outdoors on Terminalia tomentosa W. & A., a naturally grown primary host plant available in forests, and also on the raised primary host plant Terminalia arjuna Bedd. Temperature, relative humidity, and rainfall are the main environmental factors for the occurrence of pests (parasites and predators) of the tasar silkworm during the I, II, and III crop rearings in the tropical tasar producing zones. The present study aimed to examine the influence of abiotic factors on the prevalence of tasar silkworm pests. The study was conducted in different agro-climatic regions, viz., Central Tasar Research & Training Institute, Ranchi, Jharkhand; Regional Extension Centre, Katghora, Chattisgarh; and Regional Extension Centre, Hatgamaria, during 2010-13, covering 3 seed crops and 6 commercial crops. Data on the incidence of tropical tasar silkworm endoparasitoids, such as the Uzi fly (Blepharipa zebina Walker) and the Ichneumon fly (Yellow fly, Xanthopimpla pedator Fabricius), and of predators such as the stink bug (Eocanthecona furcellata Wolf), the reduviid bug (Sycanus collaris Fabricius), and the wasp (Vespa orientalis Linnaeus) were recorded weekly. The meteorological data were collected daily. Data were collected from 4 different agro-climatic zones of tasar growing areas. Analysis of the data revealed a significant negative correlation between abiotic factors and the incidence of the Ichneumon fly and the Uzi fly. Based on 3 years of data on pest prevalence, region-wise pest calendars and prediction models were developed.
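
The correlation analysis can be sketched with the Pearson coefficient and its usual t statistic for testing whether r differs from zero. The abstract does not name the correlation measure or the test used, so Pearson's r is an assumption here; the two series would be, for example, weekly mean temperature and weekly Uzi fly incidence (a hypothetical pairing), and the t value is compared against a t distribution with n - 2 degrees of freedom.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between an abiotic factor and pest incidence."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r: float, n: int) -> float:
    """t value for testing r != 0 with n - 2 degrees of freedom."""
    if n < 3 or abs(r) >= 1.0:
        raise ValueError("requires n >= 3 and |r| < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

A negative r with a large enough |t| would correspond to the "significant negative correlation" reported for the Ichneumon fly and the Uzi fly.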

Small Sample Face Recognition Algorithm Based on Novel Siamese Network

  • Zhang, Jianming;Jin, Xiaokang;Liu, Yukai;Sangaiah, Arun Kumar;Wang, Jin
    • Journal of Information Processing Systems / v.14 no.6 / pp.1464-1479 / 2018
  • In face recognition, the number of available training samples for a single category is sometimes insufficient, and as a result, models trained with convolutional neural networks do not perform well. This paper proposes a small sample face recognition algorithm based on a novel Siamese network, which does not require abundant training samples. The algorithm designs and implements a new Siamese network model, SiameseFace1, which takes pairs of face images as inputs and maps them to a target space so that the $L_2$ norm distance in the target space represents the semantic distance in the input space. The mapping is represented by a neural network trained with supervised learning. Moreover, a more lightweight Siamese network model, SiameseFace2, is designed to reduce the number of network parameters without losing accuracy. We also present a new method to generate training data and expand the number of training samples for a single category in the AR and Labeled Faces in the Wild (LFW) datasets, which improves the recognition accuracy of the models. Experiments on the AR and LFW datasets are carried out with four loss functions. The results show that the contrastive loss function combined with the new Siamese network models proposed in this paper can effectively improve the accuracy of face recognition.
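
The contrastive loss named above operates on the $L_2$ distance between the two embeddings of an image pair. Below is a sketch of the common formulation (pull same-identity pairs together, push different-identity pairs beyond a margin). The embedding network itself (SiameseFace1/SiameseFace2) is not reproduced, and the margin value and the 0.5 scaling are conventional choices assumed here, not taken from the paper.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (L2) distance between two embeddings in the target space."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_loss(emb_a: Sequence[float], emb_b: Sequence[float],
                     same_identity: bool, margin: float = 1.0) -> float:
    """Contrastive loss for one image pair (halved-square convention)."""
    d = l2_distance(emb_a, emb_b)
    if same_identity:
        # Genuine pairs are pulled together: any distance is penalized.
        return 0.5 * d ** 2
    # Impostor pairs are pushed apart until they are at least `margin` away.
    return 0.5 * max(0.0, margin - d) ** 2
```

Because impostor pairs farther apart than the margin contribute zero loss, training focuses on the hard negative pairs that still sit close together in the target space.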