• Title/Summary/Keyword: Labeled Data

Multiple Classifier System for Activity Recognition

  • Han, Yong-Koo;Lee, Sung-Young;Lee, Young-Koo;Lee, Jae-Won
    • Proceedings of the Korea Intelligent Information System Society Conference / 2007.11a / pp.439-443 / 2007
  • Activity recognition has become a hot topic in context-aware computing. In activity recognition, machine learning techniques have been widely applied to learn activity models from labeled activity samples. Most existing work uses a single learning method and focuses on making effective use of the labeled samples by refining that method. However, little attention has been paid to using multiple classifiers to boost learning performance. In this paper, we use two methods to generate multiple classifiers. In the first method, every classifier uses the same base learning algorithm but different training data (ASTD). In the second method, every classifier uses a different base learning algorithm but the same training data (ADTS). Experimental results indicate that ADTS can effectively improve activity recognition performance, whereas ASTD yields no improvement. We attribute this to the classifiers in ADTS being more diverse than those in ASTD.
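
The two ensemble-construction schemes can be contrasted in a short Python sketch. It assumes majority voting as the combination rule, bootstrap resampling as the way ASTD varies its training data, and toy nearest-centroid and k-nearest-neighbour learners; the abstract specifies none of these, so every function name here is hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], str]           # (feature vector, activity label)
Classifier = Callable[[List[float]], str]
Trainer = Callable[[Sequence[Sample]], Classifier]


def sq_dist(a: List[float], b: List[float]) -> float:
    return sum((u - v) ** 2 for u, v in zip(a, b))


def nearest_centroid(data: Sequence[Sample]) -> Classifier:
    """Toy learner: predict the label whose class mean is closest."""
    groups = {}
    for x, y in data:
        groups.setdefault(y, []).append(x)
    centroids = {y: [sum(col) / len(xs) for col in zip(*xs)] for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def knn(k: int) -> Trainer:
    """Toy learner factory: majority label among the k nearest training samples."""
    def train(data: Sequence[Sample]) -> Classifier:
        def predict(x: List[float]) -> str:
            nearest = sorted(data, key=lambda s: sq_dist(x, s[0]))[:k]
            return Counter(y for _, y in nearest).most_common(1)[0][0]
        return predict
    return train


def majority_vote(members: Sequence[Classifier], x: List[float]) -> str:
    """Combine member predictions by simple majority."""
    return Counter(clf(x) for clf in members).most_common(1)[0][0]


def build_astd(trainer: Trainer, data: Sequence[Sample], n: int, seed: int = 0) -> List[Classifier]:
    """ASTD: same algorithm, different training data (here: bootstrap resamples)."""
    rng = random.Random(seed)
    return [trainer([rng.choice(data) for _ in data]) for _ in range(n)]


def build_adts(trainers: Sequence[Trainer], data: Sequence[Sample]) -> List[Classifier]:
    """ADTS: different algorithms, same training data."""
    return [train(data) for train in trainers]


if __name__ == "__main__":
    data = [([0.1 * i, 0.1 * i], "sit") for i in range(10)]
    data += [([5 + 0.1 * i, 5 + 0.1 * i], "walk") for i in range(10)]
    adts = build_adts([nearest_centroid, knn(1), knn(3)], data)
    astd = build_astd(knn(1), data, n=5)
    print(majority_vote(adts, [0.2, 0.3]), majority_vote(astd, [5.5, 5.4]))
```

On well-separated toy data both ensembles agree; the finding that ADTS helps while ASTD does not concerns the paper's real activity data and is not reproduced by this sketch.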

The Analysis of Semi-supervised Learning Technique of Deep Learning-based Classification Model (딥러닝 기반 분류 모델의 준 지도 학습 기법 분석)

  • Park, Jae Hyeon;Cho, Sung In
    • Journal of Broadcast Engineering / v.26 no.1 / pp.79-87 / 2021
  • In this paper, we analyze semi-supervised learning (SSL), which is adopted to train a deep learning-based classification model with a small amount of labeled data. Conventional SSL techniques can be categorized into consistency regularization, entropy-based, and pseudo-labeling methods. First, we describe the algorithm of each SSL technique. In the experiments, we evaluate the classification accuracy of each SSL technique while varying the amount of labeled data. Finally, based on the experimental results, we describe the limitations of SSL techniques and suggest research directions for improving the classification performance of SSL.
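
Of the three SSL families listed, pseudo-labeling is the simplest to illustrate. The Python sketch below is a generic confidence-thresholded self-training loop, not the paper's implementation: a toy one-dimensional nearest-centroid classifier with softmax confidences stands in for the deep classification model, and the function names and the 0.9 threshold are hypothetical.

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple

Labeled = List[Tuple[float, Hashable]]


def fit_centroids(labeled: Labeled) -> Dict[Hashable, float]:
    """Toy 'model': the mean input value of each class."""
    groups: Dict[Hashable, List[float]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}


def predict_with_confidence(cents: Dict[Hashable, float], x: float) -> Tuple[Hashable, float]:
    """Softmax over negative squared distances; returns (label, probability)."""
    scores = {y: -(x - c) ** 2 for y, c in cents.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}  # shift for numerical stability
    label = max(exps, key=exps.get)
    return label, exps[label] / sum(exps.values())


def pseudo_label(labeled: Labeled, unlabeled: Sequence[float],
                 threshold: float = 0.9, rounds: int = 5) -> Tuple[Labeled, List[float]]:
    """Repeatedly add confident predictions on unlabeled data as training labels."""
    labeled, pool = list(labeled), list(unlabeled)
    for _ in range(rounds):
        cents = fit_centroids(labeled)          # retrain on labeled + pseudo-labeled data
        accepted, remaining = [], []
        for x in pool:
            y, p = predict_with_confidence(cents, x)
            if p >= threshold:
                accepted.append((x, y))
            else:
                remaining.append(x)
        if not accepted:                        # nothing confident left: stop early
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


if __name__ == "__main__":
    print(pseudo_label([(0.0, "a"), (4.0, "b")], [0.5, 1.0, 3.5, 3.0, 2.0]))
```

In the usage example, the point at 2.0 lies midway between the two classes, so it never passes the threshold and stays unlabeled; raising the threshold trades coverage for fewer noisy pseudo-labels.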

Burmese Sentiment Analysis Based on Transfer Learning

  • Mao, Cunli;Man, Zhibo;Yu, Zhengtao;Wu, Xia;Liang, Haoyuan
    • Journal of Information Processing Systems / v.18 no.4 / pp.535-548 / 2022
  • Using a resource-rich language to classify sentiment in a low-resource language is a popular research topic in natural language processing. Burmese is a low-resource language. Given the scarcity of labeled training data for Burmese sentiment classification, in this study we propose a transfer learning method for sentiment analysis that transfers sentiment features learned from English. The method generates cross-language word-embedding representations of the Burmese vocabulary that map Burmese text into the semantic space of English text. An English sentiment classification model is then pre-trained using a convolutional neural network with an attention mechanism, and this network is shared with the Burmese sentiment analysis model. The parameters of the network layers are used to learn cross-language sentiment features, which are then transferred to the Burmese sentiment classification model. Finally, the model is fine-tuned using the labeled Burmese data. Experimental results show that the proposed method significantly improves Burmese sentiment classification compared with a model trained only on a Burmese corpus.
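
The abstract does not say how the cross-language embeddings are built. One common way to map one embedding space onto another, swapped in here as an assumption, is an orthogonal linear map learned from a seed dictionary of aligned word pairs (orthogonal Procrustes). The NumPy sketch below solves it in closed form with an SVD; the function names and the synthetic data are hypothetical.

```python
import numpy as np


def procrustes_map(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Return the orthogonal W minimising ||src @ W - tgt||_F.

    Row i of src (a Burmese word vector) is assumed to translate to row i of tgt
    (an English word vector), e.g. pairs taken from a bilingual seed dictionary.
    """
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def to_english_space(burmese_vecs: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Project Burmese embeddings into the English semantic space."""
    return burmese_vecs @ w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    true_w, _ = np.linalg.qr(rng.normal(size=(d, d)))  # hidden rotation between the spaces
    burmese = rng.normal(size=(200, d))
    english = burmese @ true_w
    w = procrustes_map(burmese, english)
    print(np.abs(w - true_w).max())
```

Constraining W to be orthogonal preserves distances within the Burmese space, so the mapping only rotates the embeddings rather than distorting them.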

Using physical activity levels to estimate energy requirements of female athletes

  • Park, Jonghoon
    • Korean Journal of Exercise Nutrition / v.23 no.4 / pp.1-5 / 2019
  • [Purpose] The goal of this study was to review data on physical activity level (PAL) in female athletes and to understand the methods of assessing athletes' estimated energy requirements (EERs) in the field. PAL, a crucial index for determining EER, is calculated as total energy expenditure (TEE, assessed with doubly labeled water [DLW]) divided by resting metabolic rate (RMR): PAL = TEE/RMR. [Methods] To review PAL data among female athletes, we searched PubMed for the available literature related to the DLW method. DLW studies measuring both TEE and RMR were included in the present review. [Results] Briefly, the mean PAL was relatively low (1.71) for collegiate swimmers with moderate training, but reached 3.0 for elite swimmers during a summer training camp. This shows that PAL can vary widely even within the same sport depending on the amount of training, and that PAL differs markedly between sports. Aside from the DLW method, there is currently no research tool for assessing athletes' EERs that can be used in the field.
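
The PAL formula from the [Purpose] statement, together with the usual factorial step of multiplying a measured RMR by a PAL to estimate EER, can be written as a small Python sketch. The RMR in the usage example is a hypothetical value; only the two mean PAL values come from the review.

```python
def physical_activity_level(tee_kcal: float, rmr_kcal: float) -> float:
    """PAL = TEE / RMR, with TEE from doubly labeled water and RMR measured at rest (kcal/day)."""
    if rmr_kcal <= 0:
        raise ValueError("RMR must be positive")
    return tee_kcal / rmr_kcal


def estimated_energy_requirement(rmr_kcal: float, pal: float) -> float:
    """Factorial estimate: EER = RMR x PAL (kcal/day)."""
    return rmr_kcal * pal


if __name__ == "__main__":
    rmr = 1400.0  # hypothetical RMR of a female swimmer, kcal/day
    for setting, pal in [("collegiate, moderate training", 1.71), ("elite, summer camp", 3.0)]:
        print(f"{setting}: EER = {estimated_energy_requirement(rmr, pal):.0f} kcal/day")
```

The example makes the practical point of the review concrete: with the same RMR, the two reported PAL values imply daily requirements that differ by well over 1,000 kcal.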

Medical Image Analysis Using Artificial Intelligence

  • Yoon, Hyun Jin;Jeong, Young Jin;Kang, Hyun;Jeong, Ji Eun;Kang, Do-Young
    • Progress in Medical Physics / v.30 no.2 / pp.49-58 / 2019
  • Purpose: Automated analysis systems have begun to emerge as database systems that allow medical images to be scanned and processed on computers and that support the construction of big data. Deep-learning artificial intelligence (AI) architectures have been developed and applied to medical images, making high-precision diagnosis possible. Materials and Methods: For diagnosis, the medical images need to be labeled and standardized. After the data are pre-processed and fed into the deep-learning architecture, the final diagnosis results can be obtained quickly and accurately. To address overfitting caused by an insufficient amount of labeled data, data augmentation such as rotation and left-right flipping is performed to artificially increase the amount of data. Because various deep-learning architectures have been developed and published over the past few years, diagnosis results can be obtained simply by entering a medical image. Results: Classification and regression are performed by supervised machine-learning methods, and clustering and generation are performed by unsupervised machine-learning methods. When a convolutional neural network (CNN) is used in the deep-learning layers, its feature extraction can classify diseases very efficiently and thus help diagnose various diseases. Conclusions: AI based on deep-learning architectures has demonstrated expertise in medical image analysis of the nerves, retina, lungs, digital pathology, breast, heart, abdomen, and musculoskeletal system.
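
The rotation and left-right flip augmentation described under Materials and Methods can be sketched with NumPy as follows. The particular set of transforms (three 90-degree rotations plus one horizontal flip) is an assumption for illustration; the reviewed work does not state which angles or combinations were used, and arbitrary-angle rotation would require an image-processing library.

```python
from typing import List, Sequence, Tuple

import numpy as np


def augment(image: np.ndarray) -> List[np.ndarray]:
    """Original image plus 90/180/270-degree rotations and a left-right flip."""
    variants = [image]
    variants += [np.rot90(image, k) for k in (1, 2, 3)]  # counter-clockwise rotations
    variants.append(np.fliplr(image))                    # mirror across the vertical axis
    return variants


def augment_dataset(images: Sequence[np.ndarray],
                    labels: Sequence[int]) -> Tuple[List[np.ndarray], List[int]]:
    """Expand a labeled set; every augmented copy keeps its source image's label."""
    out_images: List[np.ndarray] = []
    out_labels: List[int] = []
    for image, label in zip(images, labels):
        copies = augment(image)
        out_images += copies
        out_labels += [label] * len(copies)
    return out_images, out_labels


if __name__ == "__main__":
    scans = [np.arange(16).reshape(4, 4), np.ones((4, 4))]
    xs, ys = augment_dataset(scans, [0, 1])
    print(len(xs), ys)
```

Each labeled image yields five training examples here, which is the sense in which augmentation "artificially increases" the labeled data.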

An Active Co-Training Algorithm for Biomedical Named-Entity Recognition

  • Munkhdalai, Tsendsuren;Li, Meijing;Yun, Unil;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.8 no.4 / pp.575-588 / 2012
  • Exploiting unlabeled text data together with a relatively small labeled corpus has become an active and challenging research topic in text mining because of the recent growth in the amount of biomedical literature. Biomedical named-entity recognition is an essential prerequisite for effective text mining of biomedical literature. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers based on two different feature sets iteratively learn from informative examples queried from the unlabeled data. We design a new classification problem to measure the informativeness of an unlabeled example: based on a joint view of the feature sets, each example is classified as informative or non-informative to both classifiers. To form the training data for this classification problem, we adopt a query-by-committee method: in ACT, the two classifiers are treated as one committee, which is applied to the labeled data to assign an informativeness label to each example. ACT outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
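
Query-by-committee can be illustrated with a two-member committee, one classifier per feature view. In this Python sketch, an example is assumed to be informative when the members disagree (non-zero vote entropy); the abstract does not define the exact criterion, and the function names and entity tags are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def vote_entropy(votes: Sequence[Hashable]) -> float:
    """Disagreement of a committee on one example (0 when all members agree)."""
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in Counter(votes).values())


def informativeness_labels(committee_preds: Sequence[Sequence[Hashable]]) -> List[int]:
    """1 = informative (members disagree), 0 = non-informative; one entry per example."""
    return [1 if vote_entropy(v) > 0 else 0 for v in committee_preds]


def select_queries(committee_preds: Sequence[Sequence[Hashable]], budget: int) -> List[int]:
    """Indices of up to `budget` examples with the highest committee disagreement."""
    ranked = sorted(range(len(committee_preds)),
                    key=lambda i: vote_entropy(committee_preds[i]), reverse=True)
    return [i for i in ranked[:budget] if vote_entropy(committee_preds[i]) > 0]


if __name__ == "__main__":
    # Each tuple: (prediction of the view-1 classifier, prediction of the view-2 classifier)
    preds = [("GENE", "GENE"), ("GENE", "O"), ("O", "O"), ("PROTEIN", "GENE")]
    print(informativeness_labels(preds), select_queries(preds, budget=2))
```

The 0/1 labels produced this way could serve as targets for a separate informativeness classifier, which is the role the abstract assigns to its new classification problem.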

CALS: Channel State Information Auto-Labeling System for Large-scale Deep Learning-based Wi-Fi Sensing (딥러닝 기반 Wi-Fi 센싱 시스템의 효율적인 구축을 위한 지능형 데이터 수집 기법)

  • Jang, Jung-Ik;Choi, Jaehyuk
    • Journal of IKEEE / v.26 no.3 / pp.341-348 / 2022
  • Wi-Fi sensing, which uses Wi-Fi technology to sense the surrounding environment, has strong potential in a variety of sensing applications. Several recent deep learning-based solutions using CSI (Channel State Information) data have achieved high performance, but they remain difficult to use in practice without explicit data collection, which requires expensive adaptation efforts for model retraining. In this study, we propose a Channel State Information Automatic Labeling System (CALS) that automatically collects and labels training CSI data for deep learning-based Wi-Fi sensing systems. The proposed system uses computer vision technologies, such as object detection algorithms, to label CSI data as it is collected, so that labeled CSI for supervised learning can be gathered efficiently. We built a prototype of CALS to demonstrate its efficiency and collected data to train deep learning models that detect the presence of a person in an indoor environment, achieving an accuracy of over 90% with the auto-labeled datasets generated by CALS.
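
A core step in this kind of auto-labeling is attaching a vision-derived label to each CSI sample. The Python sketch below assumes that an object detector has already produced a per-frame person-present flag and that the camera and CSI timestamps share a clock; it copies the label of the nearest frame within a tolerance. The function name, the 50 ms tolerance, and the nearest-frame rule are assumptions, not details from the paper.

```python
import bisect
from typing import List, Optional, Sequence


def auto_label(csi_times: Sequence[float], frame_times: Sequence[float],
               person_present: Sequence[bool], max_gap: float = 0.05) -> List[Optional[bool]]:
    """Label each CSI sample with the detector output of the nearest camera frame.

    frame_times must be sorted ascending; samples with no frame within max_gap
    seconds get None and can be discarded before training.
    """
    labels: List[Optional[bool]] = []
    for t in csi_times:
        i = bisect.bisect_left(frame_times, t)
        # The nearest frame is either just before or just after t.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(frame_times)]
        if not candidates:
            labels.append(None)
            continue
        j = min(candidates, key=lambda k: abs(frame_times[k] - t))
        labels.append(person_present[j] if abs(frame_times[j] - t) <= max_gap else None)
    return labels


if __name__ == "__main__":
    frames = [0.0, 0.1, 0.2]           # camera frame timestamps (s)
    detections = [False, True, True]   # hypothetical person-detector output per frame
    print(auto_label([0.01, 0.12, 0.5], frames, detections))
```

Dropping samples with no nearby frame, rather than guessing, keeps label noise from timing gaps out of the training set.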

Design of Knowledge-based Spatial Querying System Using Labeled Property Graph and GraphQL (속성 그래프 및 GraphQL을 활용한 지식기반 공간 쿼리 시스템 설계)

  • Jang, Hanme;Kim, Dong Hyeon;Yu, Kiyun
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.40 no.5 / pp.429-437 / 2022
  • Recently, the demand for QA (Question Answering) systems for human-machine communication has increased. Among QA systems, a closed-domain QA system that can handle spatial questions is called GeoQA. In this study, a new type of graph database, the LPG (Labeled Property Graph), was used to overcome the limitations of the RDF (Resource Description Framework)-based databases mainly used in the GeoQA field. In addition, GraphQL (Graph Query Language), an API-style query language, was introduced because the LPG query language is not standardized and a GeoQA system could otherwise become dependent on specific products. A database was built so that answers could be retrieved when spatial questions were entered. The data were obtained from the national spatial information portal and the local data open service. The spatial relationships between spatial objects were calculated in advance and stored as edges. Users' questions were first converted into FOL (First Order Logic) form, then into GraphQL, and delivered to the database through the GraphQL server. The LPG used in the experiment was Neo4j, currently the graph database with the highest market share; some of its built-in functions and QGIS were used for spatial calculations. Building the system confirmed that a user's question could be transformed, processed through the Apollo GraphQL server, and answered appropriately from the database.
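
Precomputing spatial relationships and storing them as edges can be sketched in Python as below. The sketch assumes point objects in a projected coordinate system measured in metres and a single hypothetical NEAR relationship with a distance threshold; the paper used Neo4j built-in functions and QGIS for these calculations, and the resulting tuples would still need to be loaded into the graph database.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str, float]  # (source, relationship, target, distance in metres)


def precompute_near_edges(objects: Dict[str, Tuple[float, float]], max_dist: float) -> List[Edge]:
    """Return a NEAR edge for every pair of objects at most max_dist apart."""
    edges: List[Edge] = []
    for (a, pa), (b, pb) in combinations(sorted(objects.items()), 2):
        d = math.dist(pa, pb)  # Euclidean distance; assumes projected coordinates
        if d <= max_dist:
            edges.append((a, "NEAR", b, round(d, 1)))
    return edges


if __name__ == "__main__":
    # Hypothetical objects with projected (x, y) coordinates in metres
    objs = {"CityHall": (0.0, 0.0), "Library": (300.0, 400.0), "Park": (2000.0, 0.0)}
    for edge in precompute_near_edges(objs, max_dist=500.0):
        print(edge)
```

Computing relations once at load time means a question such as "which facilities are near the city hall?" becomes a simple edge traversal instead of a geometric computation at query time.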

Semisupervised support vector quantile regression

  • Seok, Kyungha
    • Journal of the Korean Data and Information Science Society / v.26 no.2 / pp.517-524 / 2015
  • Unlabeled examples are easier and less expensive to obtain than labeled examples. In this paper, a semisupervised approach is used to exploit such examples in an effort to enhance the predictive performance of nonlinear quantile regression. We propose a semisupervised quantile regression method, semisupervised support vector quantile regression (S2SVQR), which is based on the support vector machine. A generalized approximate cross-validation method is used to choose the hyperparameters that affect the performance of the estimator. The experimental results confirm the good performance of the proposed S2SVQR.
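
Quantile regression, which S2SVQR builds on, estimates the τ-th conditional quantile by minimizing the check (pinball) loss. The Python sketch below shows only that loss; the support vector formulation, the semisupervised term, and the hyperparameter selection are not reproduced, and the function names are hypothetical.

```python
from typing import Sequence


def pinball_loss(y: float, q: float, tau: float) -> float:
    """Check loss rho_tau(y - q): under-predictions cost tau, over-predictions 1 - tau."""
    r = y - q
    return max(tau * r, (tau - 1.0) * r)


def mean_pinball_loss(ys: Sequence[float], qs: Sequence[float], tau: float) -> float:
    """Average check loss of predicted quantiles qs against observations ys."""
    return sum(pinball_loss(y, q, tau) for y, q in zip(ys, qs)) / len(ys)


if __name__ == "__main__":
    # With tau = 0.9, predicting 2 too low costs far more than predicting 2 too high.
    print(pinball_loss(3.0, 1.0, 0.9), pinball_loss(1.0, 3.0, 0.9))
```

The asymmetry is what pushes the fitted function toward the upper 90% quantile rather than the mean.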

Research on Deep Learning Performance Improvement for Similar Image Classification (유사 이미지 분류를 위한 딥 러닝 성능 향상 기법 연구)

  • Lim, Dong-Jin;Kim, Taehong
    • The Journal of the Korea Contents Association / v.21 no.8 / pp.1-9 / 2021
  • Deep learning for computer vision has improved rapidly over a short period, but large-scale training data and computing power are still essential, and deriving an optimal network model involves time-consuming trial and error. In this study, we propose a method for improving similar-image classification performance based on the CR (Confusion Rate), which considers only the characteristics of the data itself, independent of network optimization or data augmentation. The proposed method improves the performance of a deep learning model by calculating the CRs for images in a dataset with similar characteristics and reflecting them in the weights of the loss function. The CR-based recognition method is also advantageous for identifying highly similar images because it recognizes images while taking the similarity between classes into account. Applying the proposed method to the ResNet18 model improved performance by 0.22% on HanDB and 3.38% on Animal-10N. The proposed method is expected to serve as a basis for artificial intelligence research using the noisy labeled data that accompany large-scale training data.
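
The abstract states that per-class confusion rates are reflected in the loss weights but does not give the formula. The NumPy sketch below assumes CR_c is the fraction of class-c validation samples predicted as some other class, and uses 1 + alpha * CR_c as the weight of each sample's cross-entropy term; both the definition and the `alpha` parameter are hypothetical.

```python
import numpy as np


def confusion_rates(conf_mat) -> np.ndarray:
    """CR_c = share of class-c samples predicted as another class (rows = true class)."""
    cm = np.asarray(conf_mat, dtype=float)
    totals = cm.sum(axis=1)
    correct = np.diag(cm)
    # Classes with no samples get CR = 0 instead of a division by zero.
    return np.where(totals > 0, 1.0 - correct / np.maximum(totals, 1e-12), 0.0)


def cr_weighted_cross_entropy(probs, targets, cr, alpha: float = 1.0) -> float:
    """Mean cross-entropy in which class-c samples are up-weighted by 1 + alpha * CR_c."""
    probs = np.asarray(probs, dtype=float)
    targets = np.asarray(targets, dtype=int)
    weights = 1.0 + alpha * np.asarray(cr)[targets]
    picked = probs[np.arange(len(targets)), targets]  # probability assigned to the true class
    nll = -np.log(np.clip(picked, 1e-12, 1.0))
    return float(np.mean(weights * nll))


if __name__ == "__main__":
    cr = confusion_rates([[8, 2], [1, 9]])  # class 0 is confused more often than class 1
    print(cr, cr_weighted_cross_entropy([[0.7, 0.3], [0.2, 0.8]], [0, 1], cr))
```

Weighting by class rather than by sample keeps the scheme independent of the network architecture, consistent with the abstract's emphasis on data characteristics over network optimization.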