• Title/Summary/Keyword: supervised machine learning

Search Result 268, Processing Time 0.024 seconds

Korean Text to Gloss: Self-Supervised Learning approach

  • Thanh-Vu Dang;Gwang-hyun Yu;Ji-yong Kim;Young-hwan Park;Chil-woo Lee;Jin-Young Kim
    • Smart Media Journal
    • /
    • v.12 no.1
    • /
    • pp.32-46
    • /
    • 2023
  • Natural Language Processing (NLP) has grown tremendously in recent years. Typically, bilingual, and multilingual translation models have been deployed widely in machine translation and gained vast attention from the research community. On the contrary, few studies have focused on translating between spoken and sign languages, especially non-English languages. Prior works on Sign Language Translation (SLT) have shown that a mid-level sign gloss representation enhances translation performance. Therefore, this study presents a new large-scale Korean sign language dataset, the Museum-Commentary Korean Sign Gloss (MCKSG) dataset, including 3828 pairs of Korean sentences and their corresponding sign glosses used in Museum-Commentary contexts. In addition, we propose a translation framework based on self-supervised learning, where the pretext task is a text-to-text from a Korean sentence to its back-translation versions, then the pre-trained network will be fine-tuned on the MCKSG dataset. Using self-supervised learning help to overcome the drawback of a shortage of sign language data. Through experimental results, our proposed model outperforms a baseline BERT model by 6.22%.

A Study on the Work Type of Machine Learning Administrative Service in Metropolitan Government (광역자치단체의 기계학습 행정서비스 업무유형에 관한 연구 -서울시를 중심으로-)

  • Ha, Chung-Yeol;Jung, Jin-Teak
    • Journal of Digital Convergence
    • /
    • v.18 no.12
    • /
    • pp.29-36
    • /
    • 2020
  • The background of this study is that machine learning administrative services are recently attracting attention as a major policy tool for non-face-to-face administrative services in the post-corona era. This study investigated the types of work expected to be effective when introducing machine learning administrative services for Seoul Metropolitan Government officials who are piloting machine learning administrative services. The research method is a machine that can be introduced by organizational unit by distributing and collecting questionnaires for Seoul administrative organizations that have performed machine learning-based administrative services for one month in July 2020 targeting Seoul public officials using machine learning-based administrative services. By analyzing the learning administration service and application service, the business characteristics of each machine learning administration service type such as supervised learning work type, unsupervised learning work type, and reinforced learning work type were analyzed. As a result of the research analysis, it was found that there were significant differences in the characteristics of administrative tasks by supervised and unsupervised learning areas. In particular, it was found that the reinforcement learning domain contains the most appropriate business characteristics for machine learning administrative services. Implications were drawn. The results of this study can be provided as a reference material to practitioners who want to introduce machine learning administration services, and can be used as basic data for research to researchers who want to study machine learning administration services in the future.

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.4028-4042
    • /
    • 2021
  • Aiming at the problem of software defect prediction difficulty caused by insufficient software defect marker samples and unbalanced classification, a semi-supervised software defect prediction model based on a tri-training algorithm was proposed by combining feature normalization, over-sampling technology, and a Tri-training algorithm. First, the feature normalization method is used to smooth the feature data to eliminate the influence of too large or too small feature values on the model's classification performance. Secondly, the oversampling method is used to expand and sample the data, which solves the unbalanced classification of labelled samples. Finally, the Tri-training algorithm performs machine learning on the training samples and establishes a defect prediction model. The novelty of this model is that it can effectively combine feature normalization, oversampling techniques, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning in terms of Precision, Recall, and F-Measure values.

A Fall Detection Technique using Features from Multiple Sliding Windows

  • Pant, Sudarshan;Kim, Jinsoo;Lee, Sangdon
    • Smart Media Journal
    • /
    • v.7 no.4
    • /
    • pp.79-89
    • /
    • 2018
  • In recent years, falls among elderly people have gained serious attention as a major cause of injuries. Falls often lead to fatal consequences due to lack of prompt response and rescue. Therefore, a more accurate fall detection system and an effective feature extraction technique are required to prevent and reduce the risk of such incidents. In this paper, we proposed an efficient feature extraction technique based on multiple sliding windows and validated it through a series of experiments using supervised learning algorithms. The experiments were conducted using the public datasets obtained from tri-axial accelerometers. The results depicted that extraction of the feature from adjacent sliding windows led to high accuracy in supervised machine learning-based fall detection. Also, the experiments conducted in this study suggested that the best accuracy can be achieved by keeping the window size as small as 2 seconds. With the kNN classifier and dataset from wearable sensors, the experiments achieved accuracy rates of 94%.

A study on semi-supervised kernel ridge regression estimation (준지도 커널능형회귀모형에 관한 연구)

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society
    • /
    • v.24 no.2
    • /
    • pp.341-353
    • /
    • 2013
  • In many practical machine learning and data mining applications, unlabeled data are inexpensive and easy to obtain. Semi-supervised learning try to use such data to improve prediction performance. In this paper, a semi-supervised regression method, semi-supervised kernel ridge regression estimation, is proposed on the basis of kernel ridge regression model. The proposed method does not require a pilot estimation of the label of the unlabeled data. This means that the proposed method has good advantages including less number of parameters, easy computing and good generalization ability. Experiments show that the proposed method can effectively utilize unlabeled data to improve regression estimation.

Improving Chest X-ray Image Classification via Integration of Self-Supervised Learning and Machine Learning Algorithms

  • Tri-Thuc Vo;Thanh-Nghi Do
    • Journal of information and communication convergence engineering
    • /
    • v.22 no.2
    • /
    • pp.165-171
    • /
    • 2024
  • In this study, we present a novel approach for enhancing chest X-ray image classification (normal, Covid-19, edema, mass nodules, and pneumothorax) by combining contrastive learning and machine learning algorithms. A vast amount of unlabeled data was leveraged to learn representations so that data efficiency is improved as a means of addressing the limited availability of labeled data in X-ray images. Our approach involves training classification algorithms using the extracted features from a linear fine-tuned Momentum Contrast (MoCo) model. The MoCo architecture with a Resnet34, Resnet50, or Resnet101 backbone is trained to learn features from unlabeled data. Instead of only fine-tuning the linear classifier layer on the MoCopretrained model, we propose training nonlinear classifiers as substitutes for softmax in deep networks. The empirical results show that while the linear fine-tuned ImageNet-pretrained models achieved the highest accuracy of only 82.9% and the linear fine-tuned MoCo-pretrained models an increased highest accuracy of 84.8%, our proposed method offered a significant improvement and achieved the highest accuracy of 87.9%.

Improving Efficiency of Food Hygiene Surveillance System by Using Machine Learning-Based Approaches (기계학습을 이용한 식품위생점검 체계의 효율성 개선 연구)

  • Cho, Sanggoo;Cho, Seung Yong
    • The Journal of Bigdata
    • /
    • v.5 no.2
    • /
    • pp.53-67
    • /
    • 2020
  • This study employees a supervised learning prediction model to detect nonconformity in advance of processed food manufacturing and processing businesses. The study was conducted according to the standard procedure of machine learning, such as definition of objective function, data preprocessing and feature engineering and model selection and evaluation. The dependent variable was set as the number of supervised inspection detections over the past five years from 2014 to 2018, and the objective function was to maximize the probability of detecting the nonconforming companies. The data was preprocessed by reflecting not only basic attributes such as revenues, operating duration, number of employees, but also the inspections track records and extraneous climate data. After applying the feature variable extraction method, the machine learning algorithm was applied to the data by deriving the company's risk, item risk, environmental risk, and past violation history as feature variables that affect the determination of nonconformity. The f1-score of the decision tree, one of ensemble models, was much higher than those of other models. Based on the results of this study, it is expected that the official food control for food safety management will be enhanced and geared into the data-evidence based management as well as scientific administrative system.

Word Sense Similarity Clustering Based on Vector Space Model and HAL (벡터 공간 모델과 HAL에 기초한 단어 의미 유사성 군집)

  • Kim, Dong-Sung
    • Korean Journal of Cognitive Science
    • /
    • v.23 no.3
    • /
    • pp.295-322
    • /
    • 2012
  • In this paper, we cluster similar word senses applying vector space model and HAL (Hyperspace Analog to Language). HAL measures corelation among words through a certain size of context (Lund and Burgess 1996). The similarity measurement between a word pair is cosine similarity based on the vector space model, which reduces distortion of space between high frequency words and low frequency words (Salton et al. 1975, Widdows 2004). We use PCA (Principal Component Analysis) and SVD (Singular Value Decomposition) to reduce a large amount of dimensions caused by similarity matrix. For sense similarity clustering, we adopt supervised and non-supervised learning methods. For non-supervised method, we use clustering. For supervised method, we use SVM (Support Vector Machine), Naive Bayes Classifier, and Maximum Entropy Method.

  • PDF

A Study of Machine Learning-Based Scheduling Strategy for Fuzzing (기계학습 기반 스케줄링 전략을 적용한 최신 퍼징 연구)

  • Jeewoo Jung;Taeho Kim;Taekyoung Kwon
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.5
    • /
    • pp.973-980
    • /
    • 2024
  • Fuzzing is an automated testing technique that generates a lot of testcases and monitors for exceptions to test a program. Recently, fuzzing research using machine learning has been actively proposed to solve various problems in the fuzzing process, but a comprehensive evaluation of fuzzing research using machine learning is lacking. In this paper, we analyze recent research that applies machine learning to scheduling techniques for fuzzing, categorizing them into reinforcement learning-based and supervised learning-based fuzzers. We evaluated the coverage performance of the analyzed machine learning-based fuzzers against real-world programs with four different file formats and bug detection performance against the LAVA-M dataset. The results showed that AFL-HIER, which applied seed clustering and seed scheduling with reinforcement learning outperformed in coverage and bug detection. In the case of supervised learning, it showed high coverage on tcpdumps with high code complexity, and its superior bug detection performance when applied to hybrid fuzzing. This research shows that performance of machine learning-based fuzzer is better when both machine learning and additional fuzzing techniques are used to optimize the fuzzing process. Future research is needed on practical and robust machine learning-based fuzzing techniques that can be effectively applied to programs that handle various input formats.

A Study on Development Environments for Machine Learning (머신러닝 자동화를 위한 개발 환경에 관한 연구)

  • Kim, Dong Gil;Park, Yong-Soon;Park, Lae-Jeong;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.15 no.6
    • /
    • pp.307-316
    • /
    • 2020
  • Machine learning model data is highly affected by performance. preprocessing is needed to enable analysis of various types of data, such as letters, numbers, and special characters. This paper proposes a development environment that aims to process categorical and continuous data according to the type of missing values in stage 1, implementing the function of selecting the best performing algorithm in stage 2 and automating the process of checking model performance in stage 3. Using this model, machine learning models can be created without prior knowledge of data preprocessing.