• Title/Summary/Keyword: Number of training data

An Improved Co-training Method without Feature Split (속성분할이 없는 향상된 협력학습 방법)

  • 이창환;이소민
    • Journal of KIISE:Software and Applications / v.31 no.10 / pp.1259-1265 / 2004
  • In many applications, producing labeled data is costly and time-consuming, while an enormous amount of unlabeled data is available at little cost. It is therefore natural to ask whether we can take advantage of these unlabeled data in classification learning. In the machine learning literature, the co-training method has been widely used for this purpose. However, the current co-training method requires the entire feature set to be split into two independent sets. In this paper, we improve the current co-training method in a number of ways and propose a new co-training method that does not need the feature split. Experimental results show that our proposed method can significantly improve the performance of the current co-training algorithm.

Performance Improvement of Nearest-neighbor Classification Learning through Prototype Selections (프로토타입 선택을 이용한 최근접 분류 학습의 성능 개선)

  • Hwang, Doo-Sung
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.2 / pp.53-60 / 2012
  • Nearest-neighbor classification predicts the class of an input as the most frequent class among the training data near that input. Even though nearest-neighbor classification has no training stage, all of the training data are needed in the prediction stage, and the generalization performance depends on the quality of the training data. Therefore, as the training data size increases, nearest-neighbor classification requires a large amount of memory and a long computation time for prediction. In this paper, we propose a prototype selection algorithm that predicts the class of test data with a new set of prototypes consisting of near-boundary training data. Based on Tomek links and a distance metric, the proposed algorithm selects boundary data and decides whether each selected datum is added to the set of prototypes by considering class and distance relationships. In the experiments, the number of prototypes is much smaller than the size of the original training data, which yields storage reduction and fast prediction in nearest-neighbor classification.
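
The abstract names Tomek links as the basis for picking boundary data. As a rough illustration only, the sketch below finds Tomek links (pairs of points from different classes that are each other's nearest neighbor) using Euclidean distance. The function name `tomek_links` and the toy data are assumptions made for this example; the paper's full rule for admitting a point into the prototype set is not given in the abstract and is not reproduced here.

```python
from math import dist


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two points are mutual nearest
    neighbors and carry different class labels.
    """
    def nearest(i):
        # Index of the closest other point under Euclidean distance.
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: dist(points[i], points[j]))

    nn = [nearest(i) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]


# Toy 2-D data (hypothetical): only points 2 and 3 straddle the class boundary.
points = [(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.0), (2.0, 0.0)]
labels = ["a", "a", "a", "b", "b"]
boundary_pairs = tomek_links(points, labels)
```

Points that appear in such pairs are natural candidates for near-boundary prototypes, whereas points deep inside a class region never form a link.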

Study on the Effect of Training Data Sampling Strategy on the Accuracy of the Landslide Susceptibility Analysis Using Random Forest Method (Random Forest 기법을 이용한 산사태 취약성 평가 시 훈련 데이터 선택이 결과 정확도에 미치는 영향)

  • Kang, Kyoung-Hee;Park, Hyuck-Jin
    • Economic and Environmental Geology / v.52 no.2 / pp.199-212 / 2019
  • In machine learning, the sampling strategy for the training data affects the performance of the prediction model, including its generalization ability as well as its prediction accuracy. In landslide susceptibility analysis in particular, data sampling is an essential step in preparing the training data because the number of non-landslide points is much larger than the number of landslide points. However, previous studies did not consider different sampling methods for the training data; they selected the training data randomly. Therefore, in this study the authors proposed several different sampling methods and assessed the effect of the training data sampling strategy on landslide susceptibility analysis. To that end, a total of six scenarios were set up based on the sampling strategies for landslide points and non-landslide points. A Random Forest model was then trained under each of the six scenarios, and the attribute importance of each input variable was evaluated. Subsequently, landslide susceptibility maps were produced using the input variables and their attribute importances. The AUC values of the landslide susceptibility maps obtained from the six sampling strategies showed high prediction rates, ranging from 70% to 80%. This means that the Random Forest technique shows appropriate predictive performance and that the attribute importances it yields can be used as weights for landslide conditioning factors in susceptibility analysis. In addition, the results obtained using specific sampling strategies for the training data show higher prediction accuracy than those obtained using the previous random sampling method.
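
To make the imbalance issue concrete, here is a minimal sketch of the random-sampling baseline the abstract attributes to earlier studies: every landslide point is kept and non-landslide points are randomly undersampled. The function name `sample_training_set`, the 1:1 default `ratio`, and the toy points are assumptions for illustration; the six scenarios studied in the paper are not described in the abstract and are not reproduced here.

```python
import random


def sample_training_set(landslide, non_landslide, ratio=1.0, seed=0):
    """Keep all landslide points and randomly undersample non-landslide points.

    `ratio` is the assumed number of non-landslide points per landslide point.
    Returns a list of (point, label) pairs with label 1 for landslide.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    k = min(len(non_landslide), round(len(landslide) * ratio))
    negatives = rng.sample(non_landslide, k)
    return [(p, 1) for p in landslide] + [(p, 0) for p in negatives]


# Hypothetical inventory: 5 landslide cells versus 100 non-landslide cells.
landslide = [("L", i) for i in range(5)]
non_landslide = [("N", i) for i in range(100)]
training = sample_training_set(landslide, non_landslide)
```

Alternative strategies would replace the `rng.sample` call with some other rule for choosing the non-landslide points.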

An Efficient Detection Method for Rail Surface Defect using Limited Label Data (한정된 레이블 데이터를 이용한 효율적인 철도 표면 결함 감지 방법)

  • Seokmin Han
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.1 / pp.83-88 / 2024
  • In this research, we propose a semi-supervised learning-based method for detecting railroad surface defects. A ResNet50 model pretrained on ImageNet was employed for training. Unlabeled data are randomly selected and then labeled to train the ResNet50 model. The trained model is used to predict the classes of the remaining unlabeled training data. Predictions whose values exceed a certain threshold are selected, sorted in descending order, and added to the training data. During this process, pseudo-labeling is performed based on the class with the highest probability. An experiment was conducted to assess the overall classification performance as a function of the initial number of labeled data. The results showed an accuracy of up to 98% with less than 10% of the overall training data labeled.
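
The selection step described in the abstract (threshold, sort in descending order, pseudo-label with the most probable class) can be sketched as follows. The function name `select_pseudo_labels`, the default threshold of 0.9, and the example probabilities are assumptions for illustration; the abstract does not state the threshold value, and the classifier itself is left out.

```python
def select_pseudo_labels(probabilities, threshold=0.9):
    """Pick confident predictions on unlabeled samples.

    `probabilities` holds one list of class probabilities per unlabeled sample.
    Returns (sample_index, pseudo_label, confidence) tuples, most confident first.
    """
    picked = []
    for idx, probs in enumerate(probabilities):
        confidence = max(probs)
        if confidence > threshold:
            # The pseudo-label is the class with the highest probability.
            picked.append((idx, probs.index(confidence), confidence))
    picked.sort(key=lambda item: item[2], reverse=True)
    return picked


# Hypothetical model outputs for three unlabeled images, two classes.
outputs = [[0.95, 0.05], [0.60, 0.40], [0.01, 0.99]]
selected = select_pseudo_labels(outputs)
```

The selected samples would then be appended to the labeled set with their pseudo-labels before the next round of training.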

Vehicle License Plate Recognition Using the Training Data's Annexation (훈련예제 병합을 이용한 자동차 차량번호판 문자인식 성능 향상 방안)

  • Baik, Nam Cheol;Lee, Sang Hyup;Ryu, Kwang Ryul
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.3D / pp.349-352 / 2006
  • To cope with traffic congestion, traffic accidents, and a lack of parking facilities caused by the dramatic increase in the total number of vehicles, research on managing vehicles efficiently is being actively conducted both domestically and internationally. Vehicle license plate recognition makes effective traffic management possible and is widely applied in many fields, ranging from speed enforcement, toll collection, and stolen vehicle detection to parking management. Collecting training data for a vehicle license plate recognition system is costly. Many studies use the virtual sample method, which can make effective use of a limited number of training data by generating virtual samples. This paper investigates techniques to improve the performance of vehicle license plate recognition by using the training data's annexation. In addition, popular methods of virtual sample creation used for text recognition algorithms are analyzed and their effectiveness is verified.

The Perceived Utility of Education and Training in SMEs on Employee Satisfaction: The Moderating Role of HRM Department Activities (중소기업 재직자들의 교육훈련에 대한 인지된 유용성이 교육 훈련 만족도에 미치는 영향: 인사부서 활동의 조절효과)

  • Park, Ji-Sung;Chae, Hee-Sun
    • Asia-Pacific Journal of Business / v.12 no.4 / pp.241-251 / 2021
  • Purpose - Drawing on the content-process approach, this study examines the effect of employees' perceived utility of education and training in small and medium enterprises (SMEs) on their satisfaction. In addition, this study investigates how the human resource management department's activities moderate the relationship between employees' perceived utility of education and training and their satisfaction. Design/methodology/approach - This study predicts a positive relationship between employees' perceived utility of education and training and their satisfaction, and predicts that HR activities strengthen this positive relationship. To test these hypotheses, this study utilized the Human Capital Corporate Panel (HCCP) datasets, specifically the 2017 data at the individual level. The final sample size for the test is 425. Moreover, this study used a hierarchical regression model with SPSS. Findings - As predicted, the analytical results of the hierarchical regression model showed that employees' perceived utility of education and training and their satisfaction were positively related. In addition, HR activities strengthened this relationship. Research implications or Originality - This study provides academic and practical implications for future research on human resource development, especially in SMEs, by deepening the understanding of the factors that are important for increasing employees' satisfaction with education and training.

A Study on Measures to Improve Satisfaction with Vocational Competency Development Training (직업능력개발훈련 만족도 향상을 위한 방안 연구)

  • Tae-Bok Kim;Kwang-Soo Kim
    • Journal of the Korea Safety Management & Science / v.25 no.2 / pp.167-174 / 2023
  • Although the budget for vocational competency development training has been expanded, the number of participants has decreased. As the budget for the Vocational Competency Development Project increases, the participation of a large number of people becomes necessary. This study selects factors related to respondent characteristics, training institutions, training types, and job performance, aims to identify which of them affect satisfaction with vocational competency development training, and studies ways to improve satisfaction. Data were collected through focus group interviews (FGI), and logistic regression analysis was conducted through feasibility review and reliability analysis. For the model, the Hosmer and Lemeshow test indicated low agreement between the actually measured cases and the cases predicted by the model, but the classification accuracy table showed an overall classification accuracy of 96.0%. As for the influence of the factors, the application of knowledge and technology, training institution facilities and equipment, business collaboration, long-term work plan, and satisfaction with work performed were found to be influential, in that order.

Num Worker Tuner: An Automated Spawn Parameter Tuner for Multi-Processing DataLoaders

  • Synn, DoangJoo;Kim, JongKook
    • Proceedings of the Korea Information Processing Society Conference / 2021.11a / pp.446-448 / 2021
  • In training a deep learning model, it is crucial to tune various hyperparameters to gain speed and accuracy. While hyperparameters that mathematically induce convergence affect training speed, system parameters that affect host-to-device transfer are also crucial. It is therefore important to properly tune and select the parameters that influence the data loader, as system parameters, for overall time acceleration. We propose an automated framework called Num Worker Tuner (NWT) to address this problem. The method finds an appropriate number of multi-processing subprocesses by exploring a search space and accelerates training through that number of subprocesses. Furthermore, the method improves memory efficiency and speed by tuning a system-dependent parameter, the number of multi-process spawns.
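
The idea of searching for a good worker count can be sketched as an exhaustive timing search. This is not NWT itself, whose search procedure the abstract does not specify; the names `time_call`, `pick_num_workers`, and `fake_measure`, as well as the toy cost model, are hypothetical and exist only for this illustration.

```python
import time


def time_call(fn, *args):
    """Return the wall-clock seconds taken by fn(*args)."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def pick_num_workers(measure, candidates):
    """Evaluate each candidate worker count and return the fastest one.

    `measure(n)` must return the cost (for example, seconds per pass over
    the data) observed with n worker subprocesses.
    """
    timings = {n: measure(n) for n in candidates}
    best = min(timings, key=timings.get)
    return best, timings


def fake_measure(n):
    # Toy cost model (an assumption): too few workers starve the trainer,
    # while too many add spawn and memory overhead.
    return 8.0 / (n + 1) + 0.5 * n


best, timings = pick_num_workers(fake_measure, [0, 1, 2, 4, 8])
```

In practice `measure` would wrap a real pass over the data, for example `lambda n: time_call(run_one_pass, n)` with a user-supplied `run_one_pass` that builds a PyTorch `DataLoader(dataset, num_workers=n)` and iterates over it once.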

Measurement of Construction Material Quantity through Analyzing Images Acquired by Drone And Data Augmentation (드론 영상 분석과 자료 증가 방법을 통한 건설 자재 수량 측정)

  • Moon, Ji-Hwan;Song, Nu-Lee;Choi, Jae-Gab;Park, Jin-Ho;Kim, Gye-Young
    • KIPS Transactions on Software and Data Engineering / v.9 no.1 / pp.33-38 / 2020
  • This paper proposes a technique for counting construction materials by analyzing images acquired by a drone. The proposed technique uses a drone log, which includes drone and camera information; an RCNN for predicting the construction material type and dummy area; and photogrammetry for counting the construction materials. Existing research has large error ranges in predicting construction material detection and material dummy area because of a lack of training data. To reduce the error ranges and improve prediction stability, this paper enlarges the training data through data augmentation, using only rotated training data in order to prevent overfitting of the trained model. For the quantity calculation, we use a drone log containing drone and camera information such as yaw and FOV, and an RCNN model to find the piles of building materials in the image and predict their type. We then combine all of this information and apply it to the formula proposed in the paper to calculate the actual quantity of the material pile. The superiority of the proposed method is demonstrated through experiments.

Improving Accuracy of Noise Review Filtering for Places with Insufficient Training Data

  • Hyeon Gyu Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.7 / pp.19-27 / 2023
  • In the process of collecting social reviews, a number of noise reviews irrelevant to a given search keyword can be included in the search results. Machine learning can be used to filter out such reviews. However, if the number of reviews for a target place is insufficient, filtering accuracy can be degraded by the lack of training data. To resolve this issue, we propose a supervised learning method that improves the accuracy of noise review filtering for places with insufficient reviews. In the proposed method, training is performed not per individual place but per group of several places with similar characteristics. The classifier obtained through this training can be used for noise review filtering of any place belonging to the group, so the problem of insufficient training data can be resolved. To verify the proposed method, a noise review filtering model was implemented using LSTM and BERT, and filtering accuracy was checked through experiments using real data collected online. The experimental results show that the accuracy of the proposed method was 92.4% on average, and that it provided 87.5% accuracy when targeting places with fewer than 100 reviews.