• Title/Summary/Keyword: Number of training data


The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.2E
    • /
    • pp.66-71
    • /
    • 2009
  • In practical applications of speech recognition, one of the fundamental questions is how much training data should be provided for a specific task. Although plenty of training data would undoubtedly enhance system performance, it also carries a heavy cost. It is therefore crucial to determine the minimum number of training data that will afford a certain level of accuracy. For this purpose, we investigate the effect of the number of training data on speaker-independent speech recognition of isolated words using FVQ/HMM. The results showed that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.
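
The reported relation (error roughly inversely proportional to training set size and linear in vocabulary size) can be turned into a rough planning rule. The sketch below is an illustration only; the coefficient `c` is a hypothetical task-dependent constant, not a value from the paper.

```python
def required_training_data(vocab_size, target_error, c=0.5):
    """Estimate the training set size needed for a target error rate,
    assuming the empirical relation error ~ c * vocab_size / n_samples.
    The coefficient c is a hypothetical task-dependent constant."""
    return c * vocab_size / target_error

# Under this relation, doubling the vocabulary doubles the data needed
# to hold the same error rate.
n_small = required_training_data(vocab_size=50, target_error=0.05)
n_large = required_training_data(vocab_size=100, target_error=0.05)
print(n_small, n_large)
```
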

A Study on Reliability Analysis According to the Number of Training Data and the Number of Training (훈련 데이터 개수와 훈련 횟수에 따른 과도학습과 신뢰도 분석에 대한 연구)

  • Kim, Sung Hyeock;Oh, Sang Jin;Yoon, Geun Young;Kim, Wan
    • Korean Journal of Artificial Intelligence
    • /
    • v.5 no.1
    • /
    • pp.29-37
    • /
    • 2017
  • The range of problems that can be handled has been rapidly expanded by the availability of big data and the development of hardware, and machine learning methods such as deep learning have become very versatile. In this paper, the MNIST data set is used as experimental data, and the cross-entropy function is used as the loss model for evaluating the efficiency of machine learning. We applied the gradient descent optimization algorithm to minimize the value of the loss function and updated the weights and biases via backpropagation. In this way we analyze the optimal reliability value corresponding to the number of training iterations and the optimal reliability value without overfitting. Comparing the overfitting time according to changes in the amount of data, based on the number of training iterations, we obtained a result of 92% at 1110 training iterations, which is the optimal reliability value without overfitting.
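
The core loop the abstract describes (gradient descent on a cross-entropy loss, with weight and bias updates) can be sketched in its simplest form. This is a minimal softmax-regression example on synthetic two-class data, not the paper's MNIST setup; in this single-layer case the gradient step plays the role of backpropagation.

```python
import numpy as np

# Toy linearly separable data standing in for the paper's MNIST set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
Y = np.eye(2)[y]                                   # one-hot labels

W = np.zeros((2, 2))
b = np.zeros(2)
lr = 0.5
for step in range(300):
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)              # softmax probabilities
    loss = -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))  # cross-entropy
    grad = (p - Y) / len(X)                        # dLoss/dLogits
    W -= lr * X.T @ grad                           # gradient descent on weights
    b -= lr * grad.sum(axis=0)                     # ... and on biases

acc = np.mean(p.argmax(axis=1) == y)
print(f"loss={loss:.3f} acc={acc:.2f}")
```
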

A Co-training Method based on Classification Using Unlabeled Data (비분류표시 데이타를 이용하는 분류 기반 Co-training 방법)

  • 윤혜성;이상호;박승수;용환승;김주한
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.8
    • /
    • pp.991-998
    • /
    • 2004
  • In many practical learning problems, including the bioinformatics area, there is a small amount of labeled data along with a large pool of unlabeled data. Labeled examples are fairly expensive to obtain because they require human effort. In contrast, unlabeled examples can be gathered inexpensively without an expert. A common method of using unlabeled data for classification and analysis is co-training. This method uses a small set of labeled examples to learn a classifier in each of two views. Each classifier is then applied to all unlabeled examples, and co-training detects the examples on which each classifier makes its most confident predictions. After some iterations, new classifiers are learned from the enlarged training data, and the number of labeled examples is increased. In this paper, we propose a new co-training strategy using unlabeled data, and we evaluate our method with two classifiers and two experimental datasets: WebKB and BIND XML data. Our experiments show that the proposed co-training technique effectively improves classification accuracy when the number of labeled examples is very small.
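
The generic co-training loop the abstract describes can be sketched as follows. This is an illustration of the standard scheme, not the paper's method: two synthetic views, a nearest-centroid classifier per view, and the most confident unlabeled predictions promoted to labeled data each round.

```python
import numpy as np

def centroid_fit(X, y):
    # One centroid per class from the currently labeled examples.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def centroid_predict(X, centroids):
    d = np.linalg.norm(X[:, None] - centroids[None], axis=2)
    # Confidence = margin between the distances to the two centroids.
    return d.argmin(axis=1), np.abs(d[:, 0] - d[:, 1])

rng = np.random.default_rng(1)
n = 100
y_true = np.tile([0, 1], n // 2)
view1 = rng.normal(size=(n, 2)) + 3 * y_true[:, None]   # view A
view2 = rng.normal(size=(n, 2)) - 3 * y_true[:, None]   # view B

labeled = np.arange(4)                 # tiny labeled seed set
unlabeled = np.arange(4, n)
y = np.full(n, -1)
y[labeled] = y_true[labeled]

for _ in range(5):                     # a few co-training rounds
    for X in (view1, view2):
        cents = centroid_fit(X[labeled], y[labeled])
        pred, conf = centroid_predict(X[unlabeled], cents)
        top = np.argsort(conf)[-5:]    # most confident unlabeled examples
        y[unlabeled[top]] = pred[top]  # pseudo-label and promote them
        labeled = np.concatenate([labeled, unlabeled[top]])
        unlabeled = np.delete(unlabeled, top)

print("labeled pool size:", len(labeled))
```

Starting from 4 labeled examples, each round adds 5 pseudo-labeled examples per view, so the labeled pool grows to 54 after 5 rounds.
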

Factors Affecting Productivity of Medical Personnel in Training Hospital (병원의 특성에 따른 의료 인력의 진료 생산성 결정요인)

  • Lee, Myung-Keyn
    • Journal of Preventive Medicine and Public Health
    • /
    • v.20 no.1 s.21
    • /
    • pp.56-66
    • /
    • 1987
  • Information on the productivity of hospital personnel is required for optimum staffing and hospital management. This study deals with the quantitative aspects of the workload of medical personnel in training hospitals according to their specific characteristics. Specifically, this study attempted to find relevant determinants of the productivity of medical personnel using multiple stepwise regression analysis based on data obtained from 135 training hospitals. The findings were as follows: 1) The daily average numbers of outpatients and inpatients treated by a physician were 20.4 and 10.2, respectively. 2) The daily average number of patients cared for by a nurse was 8.2. The daily average numbers of tests performed by a pathologic technician and a radiologic technician were 83.2 and 21.5, respectively. 3) The productivity of medical personnel differed significantly across three groups of factors: hospital size (number of beds, number of medical personnel per 100 beds); institutional characteristics (medical school affiliation, training type, profit status); and environmental factors (location, number of physicians and beds per 1,000 population in the region). 4) The factors affecting productivity varied with the type of medical profession: the number of beds, the number of physicians per 100 beds, training type, and profit status for physicians; the number of nurses per 100 beds, the number of beds, and medical school affiliation for nurses; the number of physicians per 100 beds, the number of technicians per 100 beds, and ownership for pathologic technicians; and the number of technicians, training type, and the number of physicians per 100 beds for radiologic technicians.
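
The selection idea behind "multiple stepwise regression analysis" can be sketched as forward stepwise selection: at each step, add the predictor that most reduces the residual sum of squares. The data below are synthetic and the procedure is a generic illustration, not the study's analysis.

```python
import numpy as np

def fit_rss(X, y):
    # Residual sum of squares of an ordinary least-squares fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return resid @ resid

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
# Only predictors 1 and 3 actually drive the outcome.
y = 2.0 * X[:, 1] - 3.0 * X[:, 3] + 0.1 * rng.normal(size=n)

selected, remaining = [], list(range(p))
for _ in range(2):                         # pick the two best predictors
    best = min(remaining,
               key=lambda j: fit_rss(X[:, selected + [j]], y))
    selected.append(best)
    remaining.remove(best)

print("selected predictors:", selected)
```
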


Study on the Effect of Discrepancy of Training Sample Population in Neural Network Classification

  • Lee, Sang-Hoon;Kim, Kwang-Eun
    • Korean Journal of Remote Sensing
    • /
    • v.18 no.3
    • /
    • pp.155-162
    • /
    • 2002
  • Neural networks have attracted attention as robust classifiers for remotely sensed imagery due to their statistical independence and learning ability, and artificial neural networks have been reported to be more tolerant of noise and missing data. However, unlike conventional statistical classifiers, which use statistical parameters for classification, a neural network classifier uses individual training samples in the learning stage. The training performance of a neural network is known to be very sensitive to discrepancies in the number of training samples per class. In this paper, the effect of the population discrepancy among the training samples of each class was analyzed with a three-layered feed-forward network, and a method for reducing the effect was proposed and tested with a Landsat TM image. The results showed that the effect of the training sample size discrepancy should be carefully considered for faster and more accurate training of the network. It was also found that the proposed method, which makes the learning rate a function of the number of training samples in each class, resulted in faster and more accurate training of the network.
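
One way to realize "learning rate as a function of the number of training samples in each class" is to scale per-class step sizes inversely with class frequency, so over-represented classes do not dominate the weight updates. The inverse-frequency rule below is an assumption for illustration; the paper's exact function may differ.

```python
import numpy as np

def class_learning_rates(class_counts, base_lr=0.1):
    """Per-class learning rates inversely proportional to class size;
    the smallest class keeps the full base learning rate."""
    counts = np.asarray(class_counts, dtype=float)
    return base_lr * counts.min() / counts

# A 100:10:1 class imbalance yields proportionally smaller steps for
# the larger classes.
lrs = class_learning_rates([1000, 100, 10])
print(lrs)
```
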

Study of Fall Detection System According to Number of Nodes of Hidden-Layer in Long Short-Term Memory Using 3-axis Acceleration Data (3축 가속도 데이터를 이용한 장단기 메모리의 노드수에 따른 낙상감지 시스템 연구)

  • Jeong, Seung Su;Kim, Nam Ho;Yu, Yun Seop
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.05a
    • /
    • pp.516-518
    • /
    • 2022
  • In this paper, we investigate how a fall detection system based on Long Short-Term Memory (LSTM) depends on the number of nodes in its hidden layer. Training is carried out using the parameter theta (θ), the angle formed by the x-, y-, and z-axis data of a 3-axis acceleration sensor with the direction of gravity. The data are divided into training and test sets in an 8:2 ratio with validation, and training is performed while varying the number of hidden-layer nodes to improve efficiency. With 128 nodes, the best results are obtained: accuracy = 99.82%, specificity = 99.58%, and sensitivity = 100%.
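
One plausible reading of the feature θ is the tilt angle between a 3-axis acceleration sample and the gravity axis (taken here as +z). The paper's exact definition may differ; this sketch just shows how such an angle is computed from raw sensor values.

```python
import math

def tilt_angle_deg(ax, ay, az):
    """Angle in degrees between the acceleration vector (ax, ay, az)
    and the +z axis, assumed here to be the direction of gravity."""
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("zero acceleration vector")
    return math.degrees(math.acos(az / norm))

# Aligned with gravity vs. perpendicular to it.
theta_upright = tilt_angle_deg(0.0, 0.0, 1.0)
theta_flat = tilt_angle_deg(1.0, 0.0, 0.0)
print(theta_upright, theta_flat)
```

A sequence of such θ values per time window would then form the input to the LSTM.
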


Time-domain Sound Event Detection Algorithm Using Deep Neural Network (심층신경망을 이용한 시간 영역 음향 이벤트 검출 알고리즘)

  • Kim, Bum-Jun;Moon, Hyeongi;Park, Sung-Wook;Jeong, Youngho;Park, Young-Cheol
    • Journal of Broadcast Engineering
    • /
    • v.24 no.3
    • /
    • pp.472-484
    • /
    • 2019
  • This paper proposes a time-domain sound event detection algorithm using a DNN (deep neural network). In this system, time-domain waveform data, without conversion to the frequency domain, are used as input to the DNN. The overall architecture is a CRNN, to which GLU, ResNet, and squeeze-and-excitation blocks are applied, and the proposed structure considers features extracted from several layers together. In addition, under the assumption that training data with strong labels are practically difficult to obtain, this study conducted training using a small number of weakly labeled training data and a large number of unlabeled training data. To use the small amount of training data efficiently, data augmentation methods such as time stretching, pitch shifting, DRC (dynamic range compression), and block mixing were applied. Unlabeled data supplemented the insufficient training data through pseudo-labeling. Using the proposed neural network and data augmentation methods, the sound event detection performance is improved by about 6% (based on the F-score) compared with a CRNN trained in the conventional manner.
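
Of the augmentations listed, block mixing is the simplest to sketch: overlay a block of one training waveform onto another (the merged clip inherits both labels). The block position, length, and gain below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def block_mix(wave_a, wave_b, start, length, gain=0.5):
    """Return wave_a with a gain-scaled block of wave_b added over
    samples [start, start + length); samples outside are unchanged."""
    out = wave_a.copy()
    out[start:start + length] += gain * wave_b[start:start + length]
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=16000)   # one second at a hypothetical 16 kHz rate
b = rng.normal(size=16000)
mixed = block_mix(a, b, start=4000, length=2000)

# Only the chosen block differs from the original waveform.
print(np.array_equal(mixed[:4000], a[:4000]))
```
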

Influence on overfitting and reliability due to change in training data

  • Kim, Sung-Hyeock;Oh, Sang-Jin;Yoon, Geun-Young;Jung, Yong-Gyu;Kang, Min-Soo
    • International Journal of Advanced Culture Technology
    • /
    • v.5 no.2
    • /
    • pp.82-89
    • /
    • 2017
  • The range of problems that can be handled has been rapidly expanded by the availability of big data and the development of hardware, and machine learning methods such as deep learning have become very versatile. In this paper, the MNIST data set is used as experimental data, and the cross-entropy function is used as the loss model for evaluating the efficiency of machine learning. We applied the gradient descent optimization algorithm to minimize the value of the loss function and updated the weights and biases via backpropagation. In this way we analyze the optimal reliability value corresponding to the number of training iterations and the optimal reliability value without overfitting. Comparing the overfitting time according to changes in the amount of data, based on the number of training iterations, we obtained a result of 92% at 1110 training iterations, which is the optimal reliability value without overfitting.

A Study on Characteristics of Neural Network Model for Reservoir Inflow Forecasting (저수지 유입량 예측을 위한 신경망 모형의 특성 연구)

  • Kim, Jae-Hvung;Yoon, Yong-Nam
    • Journal of the Korean Society of Hazard Mitigation
    • /
    • v.2 no.4 s.7
    • /
    • pp.123-129
    • /
    • 2002
  • In this study, the results of Chungju reservoir inflow forecasting using a three-layered neural network model were analyzed in order to investigate the characteristics of neural network models for reservoir inflow forecasting. Suitable numbers of input- and hidden-layer neurons were proposed after examining how the forecasts vary with neuron number and training epochs, and the probability of underestimation was assessed by considering how the forecasts vary with the difference between the training and forecasting peak inflow magnitudes. In addition, a minimum training data size necessary for precise forecasting was proposed. As a result, we confirmed that an excessive number of neurons or training epochs can cause over-fitting, and judged that 8-10 neurons and 1500-3000 training epochs may be suitable for Chungju reservoir inflow forecasting. When the peak inflow of the training data set was larger than the forecasted one, the forecasts could be underestimated. When a comparatively short-period training data set was applied, relatively inaccurate forecasts resulted, and applying more than 600 training data points is recommended for more precise forecasting of the Chungju reservoir.

Training Algorithms of Neuro-fuzzy Systems Using Evolution Strategy (진화전략을 이용한 뉴로퍼지 시스템의 학습방법)

  • 정성훈
    • Proceedings of the IEEK Conference
    • /
    • 2001.06c
    • /
    • pp.173-176
    • /
    • 2001
  • This paper proposes training algorithms for neuro-fuzzy systems. First, we introduce a structure training algorithm, which determines the necessary number of hidden nodes from the training data; initial fuzzy rules are also obtained from this algorithm. Second, a parameter training algorithm using evolution strategy is introduced. To show their usefulness, we apply our neuro-fuzzy system to a nonlinear system identification problem. Experiments showed that the proposed training algorithms work well.
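
Evolutionary parameter training of the kind the abstract names can be sketched as a minimal (1+1)-evolution strategy: mutate the parameter vector with Gaussian noise and keep the child only if it improves the objective. The quadratic objective below is a toy stand-in for the neuro-fuzzy system's error, and the decaying mutation step is an illustrative choice.

```python
import random

def loss(params):
    # Toy objective standing in for the neuro-fuzzy training error;
    # the optimum is the all-ones parameter vector.
    return sum((p - 1.0) ** 2 for p in params)

random.seed(0)
parent = [0.0] * 4        # initial parameter vector
sigma = 0.5               # mutation step size
for _ in range(2000):
    child = [p + random.gauss(0.0, sigma) for p in parent]
    if loss(child) <= loss(parent):   # keep the better individual
        parent = child
    sigma *= 0.999                    # slowly shrink the search radius

print(round(loss(parent), 3))
```

No gradients are needed, which is the appeal of evolution strategies when the system's error surface is awkward to differentiate.
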
