• Title/Abstract/Keywords: Unlabeled

Search results: 154 items (processing time: 0.025 s)

Domain Adaptation for Opinion Classification: A Self-Training Approach

  • Yu, Ning
    • Journal of Information Science Theory and Practice
    • /
    • Vol. 1, No. 1
    • /
    • pp.10-26
    • /
    • 2013
  • Domain transfer is a widely recognized problem for machine learning algorithms because models built upon one data domain generally do not perform well in another data domain. This is especially a challenge for tasks such as opinion classification, which often has to deal with insufficient quantities of labeled data. This study investigates the feasibility of self-training in dealing with the domain transfer problem in opinion classification by leveraging labeled data in non-target data domain(s) and unlabeled data in the target domain. Specifically, self-training is evaluated for effectiveness in sparse data situations and feasibility for domain adaptation in opinion classification. Three types of Web content are tested: edited news articles, semi-structured movie reviews, and the informal and unstructured content of the blogosphere. Findings of this study suggest that, when there are limited labeled data, self-training is a promising approach for opinion classification, although the contributions vary across data domains. Significant improvement was demonstrated for the most challenging data domain, the blogosphere, when a domain transfer-based self-training strategy was implemented.
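
The self-training idea described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the procedure used in the paper: the nearest-centroid classifier, the margin-based confidence, the `threshold` and `max_rounds` parameters, and the toy two-dimensional features are all assumptions made for the example. A model is fit on labeled source-domain data, confidently classified target-domain examples are added as pseudo-labeled training data, and the model is refit.

```python
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit(labeled):
    """Fit a nearest-centroid model: one mean vector per class label."""
    by_label = {}
    for features, label in labeled:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(dim) for dim in zip(*rows)]
            for label, rows in by_label.items()}


def predict(centroids, features):
    """Return (label, margin); margin is a crude confidence score.

    Assumes at least two classes are present in `centroids`.
    """
    ranked = sorted(centroids, key=lambda lbl: sq_dist(centroids[lbl], features))
    best, runner_up = ranked[0], ranked[1]
    margin = sq_dist(centroids[runner_up], features) - sq_dist(centroids[best], features)
    return best, margin


def self_train(source_labeled, target_unlabeled, threshold=1.0, max_rounds=10):
    """Grow the training set with confidently pseudo-labeled target examples."""
    labeled = list(source_labeled)
    pool = list(target_unlabeled)
    for _ in range(max_rounds):
        centroids = fit(labeled)
        confident, remaining = [], []
        for features in pool:
            label, margin = predict(centroids, features)
            (confident if margin >= threshold else remaining).append((features, label))
        if not confident:  # nothing new to learn from; stop early
            break
        labeled.extend(confident)
        pool = [features for features, _ in remaining]
    return fit(labeled)


# Hypothetical toy data: labeled source domain, shifted unlabeled target domain.
source_labeled = [([2.0, 2.0], "pos"), ([3.0, 2.0], "pos"),
                  ([-2.0, -2.0], "neg"), ([-3.0, -2.0], "neg")]
target_unlabeled = [[4.0, 1.0], [5.0, 0.0], [-1.0, -4.0], [0.0, -5.0]]

model = self_train(source_labeled, target_unlabeled)
print(model)
```

In this toy run the class centroids move toward the target-domain examples after pseudo-labeling, which is the adaptation effect the abstract refers to. A real system would use a stronger classifier and a calibrated confidence measure.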

Biotransformation of Tranylcypromine in Rat Liver Microsomes

  • Kang, Gun-Il;Hong, Suk-Kil
    • Archives of Pharmacal Research
    • /
    • Vol. 11, No. 4
    • /
    • pp.292-300
    • /
    • 1988
  • Metabolism of tranylcypromine (TCP) in rat liver microsomes was studied in vitro using fortified microsomal preparations. As well as unlabeled TCP, two deuterium-labeled analogs, TCP-phenyl-$d_{5}$ and TCP-cyclopropyl-$d_{2}$, were used and GC/MS employed which was then metabolized to cinnamaldehyde and hydrocinnamyl alcohol. Schiff bases of TCP with hydrocinnamaldehyde and acetaldehyde were detected, and the possibility of the metabolic formation of N-ethylidene TCP was proposed. In addition, acetophenone (benzoylacetic acid), benzaldehyde, benzoic acid, and benzyl alcohol were detected as metabolites. Chemical decomposition studies suggested that some of the oxidized products might be derived from air oxidation processes. A potential metabolite assumed to be N-ethylidene-1,2-dihydroxy-3-phenylpropanamine oxide was also detected.

Semisupervised Learning Using the AdaBoost Algorithm with SVM-KNN

  • 이상민;연준상;김지수;김성수
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • Vol. 61, No. 9
    • /
    • pp.1336-1339
    • /
    • 2012
  • In this paper, we focus on solving the classification problem by using a semisupervised learning strategy. Traditional classifiers are constructed from labeled data in supervised learning. Labeled data, however, are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators. Unlabeled data are significantly easier to obtain without human effort. Thus, we apply the AdaBoost algorithm with an SVM-KNN classifier to the semisupervised learning problem to improve classifier performance. Experimental results on both artificial and UCI data sets show that the proposed methodology can reduce the error rate.

A Probabilistic Method for Recognizing Unlabeled Text on Web Pages

  • 정창후;이민호;주원균;맹성현
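    • Korean Institute of Information Scientists and Engineers: Conference Proceedings
    • /
    • Proceedings of the Korean Institute of Information Scientists and Engineers 2003 Fall Conference, Vol. 30, No. 2 (1)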
    • /
    • pp.163-165
    • /
    • 2003
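  • Domain knowledge uses the format and semantic information of text to help interpret the varied meanings of text found on the Web. However, domain knowledge becomes useless when the text carries no label expressing the meaning of its data, because text recognition cannot then be performed properly. To address this problem, this paper proposes an entity recognition model that can effectively infer the meaning of unlabeled text. The entity recognition model combines a Bayesian model with context information and determines which entity each text token of a structurally analyzed HTML document belongs to. Experiments confirmed that the model effectively recognizes text that previously went unrecognized because it had no labels.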

Background Subtraction for Moving Cameras based on trajectory-controlled segmentation and Label Inference

  • Yin, Xiaoqing;Wang, Bin;Li, Weili;Liu, Yu;Zhang, Maojun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 9, No. 10
    • /
    • pp.4092-4107
    • /
    • 2015
  • We propose a background subtraction method for moving cameras based on trajectory classification, image segmentation, and label inference. In the trajectory classification process, a PCA-based outlier detection strategy is used to remove the outliers in the foreground trajectories. Combining optical flow trajectories with the watershed algorithm, we propose a trajectory-controlled watershed segmentation algorithm that effectively improves edge-preserving performance and prevents the over-smoothing problem. Finally, label inference based on a Markov random field is conducted to label the unlabeled pixels. Experimental results on the motionseg database demonstrate the promising performance of the proposed approach compared with other competing methods.

A Novel Text to Image Conversion Method Using Word2Vec and Generative Adversarial Networks

  • LIU, XINRUI;Joe, Inwhee
    • Korea Information Processing Society: Conference Proceedings
    • /
    • Korea Information Processing Society 2019 Spring Conference
    • /
    • pp.401-403
    • /
    • 2019
  • In this paper, we propose a generative adversarial network (GAN) based text-to-image generation method. In many natural language processing tasks, word representations are determined by their term frequency-inverse document frequency (TF-IDF) scores. Word2Vec is a type of neural network model that, given an unlabeled corpus, produces a vector expressing the semantics of each word in the corpus; an image is then generated by GAN training according to the obtained vector. Thanks to this understanding of the words, we can generate higher-quality and more realistic images. Our GAN structure is based on deep convolutional neural networks and pixel recurrent neural networks. Comparing the generated images with real images, we obtain about 88% similarity on the Oxford-102 flowers dataset.

Machine learning-based categorization of source terms for risk assessment of nuclear power plants

  • Jin, Kyungho;Cho, Jaehyun;Kim, Sung-yeop
    • Nuclear Engineering and Technology
    • /
    • Vol. 54, No. 9
    • /
    • pp.3336-3346
    • /
    • 2022
  • Severe accident scenarios derived from Level 2 probabilistic safety assessment (PSA) are typically grouped into several categories to efficiently evaluate their potential impacts on the public, under the assumption that scenarios within the same group have similar source term characteristics. To date, however, grouping by similar source terms has relied entirely on qualitative methods such as logical trees or expert judgement. Recently, an exhaustive simulation approach has been developed to provide quantitative information on the source terms of a large number of severe accident scenarios. With this motivation, this paper proposes a machine learning-based categorization method based on exhaustive simulation for grouping scenarios with similar accident consequences. The proposed method employs clustering with an autoencoder to group unlabeled scenarios after dimensionality reduction and feature extraction from the source term data. To validate the suggested method, source term data for 658 severe accident scenarios were used. Results confirmed that the proposed method characterized severe accident scenarios with similar behavior more precisely than the conventional grouping method.

Wafer Map Defect Pattern Classification with Progressive Pseudo-Labeling Balancing

  • 도정혁;김문철
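    • The Korean Institute of Broadcast and Media Engineers: Conference Proceedings
    • /
    • The Korean Institute of Broadcast and Media Engineers 2020 Fall Conference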
    • /
    • pp.248-251
    • /
    • 2020
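  • Inspection equipment for product verification is essential to realizing a smart factory that automates the entire semiconductor manufacturing and inspection process. However, when preparing data to train a deep learning model, it is practically impossible for engineers to assign defect labels to every wafer image, so a small amount of labeled data must be used appropriately together with the remaining unlabeled data. In addition, because the frequency of defects in wafer images differs greatly by defect type, infrequent (minor) defects are treated like noise and are not classified correctly. This paper proposes a progressive pseudo-labeling balancer that uses a small amount of labeled data and a large amount of unlabeled data together while resolving the imbalance in occurrence frequency among defect types. When the classification network is trained with the progressive pseudo-labeling balancer, test accuracy rises by 6.07 percentage points, from the baseline 71.19% to 77.26%, a performance equivalent to adding about 40% more labeled data.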
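
The accuracy figures in this abstract (71.19% and 77.26%) are reported as a gain in percentage points, which is easy to confuse with a relative gain. The short sketch below works through both readings; the function name `accuracy_gain` is hypothetical and introduced only for illustration.

```python
def accuracy_gain(baseline_pct, improved_pct):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute_pp = improved_pct - baseline_pct
    relative_pct = absolute_pp / baseline_pct * 100.0
    return absolute_pp, relative_pct


# Figures taken from the abstract above.
absolute_pp, relative_pct = accuracy_gain(71.19, 77.26)
print(f"{absolute_pp:.2f} percentage points, {relative_pct:.2f}% relative")
```

The absolute difference matches the 6.07 percentage points stated in the abstract, while the relative improvement over the baseline is a different, larger number.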

Rules-based Korean Dependency Parsing using Sentence Pattern Informations

  • 김성태;김민호;김현아;권혁철
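    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Group: Conference Proceedings (Hangul and Korean Information Processing)
    • /
    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Group, 2019, 31st Conference on Hangul and Korean Information Processing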
    • /
    • pp.139-143
    • /
    • 2019
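  • The parser proposed in this paper is a broad-coverage dependency parser that does not use a part-of-speech tagger and instead assigns dependency relations to every morphological analysis candidate in a sentence. For sentences where ambiguity can arise, it outputs all candidate parse trees and ranks them using rules. In addition, to make appropriate use of a sentence pattern information corpus, we implemented rules and algorithms that overcome the limitations of previous work and strengthened the ranking of candidate parse trees using sentence pattern information. We also used sentence pattern information to strengthen ranking for the [noun-adnominal phrase] feature, which is difficult to rank. As a result, the UAS (Unlabeled Attachment Score) of the top-ranked parse tree improved by 0.52%, and the average rank of the correct tree among the candidates improved by 12.2%.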
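
For readers unfamiliar with the metric named in this abstract, UAS is the fraction of tokens whose predicted head matches the gold-standard head, ignoring the dependency relation label. The sketch below is a minimal illustration under assumptions: the head-index list encoding (0 for the root) and the toy sentence are hypothetical, and evaluation details such as punctuation handling, which vary between evaluation setups, are ignored.

```python
def unlabeled_attachment_score(gold_heads, predicted_heads):
    """Fraction of tokens whose predicted head index equals the gold head index."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Hypothetical four-token sentence; each entry is the head index of that token
# (1-based token positions, 0 marks the root).
gold = [2, 0, 2, 3]
pred = [2, 0, 2, 2]  # the last token is attached to the wrong head

uas = unlabeled_attachment_score(gold, pred)
print(uas)
```

Three of the four heads agree here, so the score is 0.75.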

AI-based language tutoring systems with end-to-end automatic speech recognition and proficiency evaluation

  • Byung Ok Kang;Hyung-Bae Jeon;Yun Kyung Lee
    • ETRI Journal
    • /
    • Vol. 46, No. 1
    • /
    • pp.48-58
    • /
    • 2024
  • This paper presents the development of language tutoring systems for non-native speakers by leveraging advanced end-to-end automatic speech recognition (ASR) and proficiency evaluation. Given the frequent errors in non-native speech, high-performance spontaneous speech recognition must be applied. Our systems accurately evaluate pronunciation and speaking fluency and provide feedback on errors by relying on precise transcriptions. End-to-end ASR is implemented and enhanced by using diverse non-native speaker speech data for model training. For performance enhancement, we combine semisupervised and transfer learning techniques using labeled and unlabeled speech data. Automatic proficiency evaluation is performed by a model trained to maximize the statistical correlation between the fluency score manually determined by a human expert and a calculated fluency score. We developed an English tutoring system for Korean elementary students called EBS AI Peng-Talk and a Korean tutoring system for foreigners called KSI Korean AI Tutor. Both systems were deployed by South Korean government agencies.
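
The abstract describes evaluating a calculated fluency score by its statistical correlation with a human expert's score. The abstract does not say which correlation measure is used; the sketch below assumes the Pearson coefficient purely for illustration, and the score lists are made-up toy values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical fluency scores for four utterances.
human_scores = [2.0, 3.0, 4.0, 5.0]
calculated_scores = [2.5, 2.9, 4.2, 4.8]

r = pearson(human_scores, calculated_scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 means the calculated scores rank and space the utterances much as the human rater does, which is the property such a scoring model would be trained to improve.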