• Title/Abstract/Keywords: metric learning

128 search results (processing time: 0.035 s)

A BERT-based Transfer Learning Model for Bidirectional HR Matching

  • 오소진;장문경;송희석
    • Journal of Information Technology Applications and Management / Vol. 28, No. 4 / pp. 33-43 / 2021
  • While youth unemployment has reached its lowest level since the global COVID-19 pandemic, SMEs (small and medium-sized enterprises) are still struggling to fill vacancies. Because of information mismatch, it is difficult for SMEs to find good candidates and for job seekers to find appropriate job offers. To overcome this mismatch, this study proposes a fine-tuned model for bidirectional HR matching based on a pre-trained language model called BERT (Bidirectional Encoder Representations from Transformers). Through sufficient pre-training on terms that include technical jargon, the proposed model can recommend job openings suitable for an applicant, or applicants appropriate for a job. The experimental results demonstrate the superior performance of our model in terms of precision, recall, and F1-score compared to the existing content-based metric learning model. This study provides insights for developing practical job recommendation models and offers suggestions for future research.

Research on data augmentation algorithm for time series based on deep learning

  • Shiyu Liu;Hongyan Qiao;Lianhong Yuan;Yuan Yuan;Jun Liu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 6 / pp. 1530-1544 / 2023
  • Data monitoring is an important foundation of modern science. In most cases, monitoring data are time series, which have high application value. Deep learning algorithms have a strong nonlinear fitting capability, which enables them to recognize time series by capturing anomalous information. Research on deep-learning-based time series recognition is therefore especially important for data monitoring. Deep learning algorithms require a large amount of training data. However, abnormal samples are rare in time series, so the resulting class imbalance can seriously degrade the accuracy of recognition algorithms. To increase the number of abnormal samples, a data augmentation method called GANBATS (GAN-based Bi-LSTM and Attention for Time Series) is proposed. In GANBATS, a Bi-LSTM extracts temporal features and passes them to the generator network. GANBATS also modifies the discriminator network by adding an attention mechanism to achieve global attention over the time series. At the end of the discriminator, GANBATS adds an average-pooling layer, which merges temporal features to improve operational efficiency. Four time series datasets and five data augmentation algorithms are used in comparison experiments. The generated data are evaluated with PRD (Percent Root Mean Square Difference) and DTW (Dynamic Time Warping). The experimental results show that GANBATS reduces the PRD metric by up to 26.22 and the DTW metric by up to 9.45. In addition, the datasets are reconstructed with the different algorithms and compared by classification accuracy, which improves by 6.44%-12.96% on the four time series datasets.
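
The two evaluation metrics named in the abstract above, PRD and DTW, can be illustrated with a minimal sketch. It assumes the commonly used definitions (PRD as 100 times the square root of the squared error divided by the energy of the original series, and DTW with an absolute-difference local cost and no window constraint). The abstract does not state which variants the authors used, and the function names `prd` and `dtw` are hypothetical.

```python
import math


def prd(original, generated):
    """Percent root-mean-square difference between two equal-length series.

    Assumed definition: 100 * sqrt(sum((o - g)^2) / sum(o^2)).
    """
    if len(original) != len(generated):
        raise ValueError("series must have the same length")
    num = sum((o - g) ** 2 for o, g in zip(original, generated))
    den = sum(o ** 2 for o in original)
    return 100.0 * math.sqrt(num / den)


def dtw(a, b):
    """Dynamic time warping distance with |x - y| as the local cost.

    Plain O(len(a) * len(b)) dynamic programme, no warping window.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a[i-1] repeats a match
                                 cost[i][j - 1],      # b[j-1] repeats a match
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

Under these definitions, lower values of both metrics mean the generated series is closer to the original. DTW also accepts series of different lengths, whereas PRD requires a point-by-point pairing.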

Development of e-Learning Software Quality Evaluation Model

  • 이경철;이하용;양해술
    • 한국산학기술학회논문지 / Vol. 8, No. 2 / pp. 309-323 / 2007
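  • e-Learning, which emerged from the recent rapid spread of broadband infrastructure, is drawing attention as a new means of educational innovation in schools and human resource development in society, and has also become a core part of the digital content industry. In this paper, we analyze the characteristics of the base technologies of e-Learning software and the quality characteristics needed for its quality testing and evaluation, and we develop an e-Learning software quality evaluation model. To this end, we adopt the relevant international standards to establish a quality evaluation framework for e-Learning software and develop an evaluation model for it. We expect that this will enable effective quality evaluation and thereby promote the development of competitive e-Learning software products.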

Detecting outliers in segmented genomes of flu virus using an alignment-free approach

  • Daoud, Mosaab
    • Genomics & Informatics / Vol. 18, No. 1 / pp. 2.1-2.11 / 2020
  • In this paper, we propose a new approach to detecting outliers in a set of segmented genomes of the flu virus, a data set with a heterogeneous set of sequences. The approach has the following computational phases: feature extraction, which is a mapping into feature space; an alignment-free distance measure that quantifies the distance between any two segmented genomes; and a mapping into distance space to analyze a quantum of distance values. The approach is implemented using supervised and unsupervised learning modes. The experiments show robustness in detecting outliers of the segmented genome of the flu virus.

Applications of machine learning methods in KMTNet data quality assurance and detecting microlensing events

  • Shin, Min-Su;Lee, Chung-Uk;Kim, Hyoun-Woo
    • 천문학회보 / Vol. 43, No. 1 / pp. 40.3-40.3 / 2018
  • We present results from our two experiments of using machine learning algorithms in processing and analyzing the KMTNet imaging data. First, density estimation and clustering methods find meaningful structures in the metric space of imaging quality measurements described by photometric quantities. Second, we also develop a method to separate out light curves of reliable microlensing event candidates from spurious events, estimating reliability scores of the candidates.

A Study of Machine Learning based Face Recognition for User Authentication

  • Hong, Chung-Pyo
    • 반도체디스플레이기술학회지 / Vol. 19, No. 2 / pp. 96-99 / 2020
  • With the rapid development of smart devices, many related services are being devised, and almost every one is designed to provide user-centric services based on personal information. In this situation, preventing unintentional leakage of personal information is essential. Conventionally, an ID and password system is used for user authentication. This method is convenient, but it is vulnerable to problems caused by information leakage. To overcome this problem, many methods based on face recognition are being researched. In this paper, we investigate trends in biometric user authentication and two representative face recognition models: DeepFace from Facebook and FaceNet from Google. These models are based on the concepts of Deep Learning and Distance Metric Learning, respectively, and both build on the Convolutional Neural Network (CNN) model. Further research is needed on the equipment configuration requirements for practical applications and on ways to provide actual personalized services.

Avoiding collaborative paradox in multi-agent reinforcement learning

  • Kim, Hyunseok;Kim, Hyunseok;Lee, Donghun;Jang, Ingook
    • ETRI Journal / Vol. 43, No. 6 / pp. 1004-1012 / 2021
  • Productive collaboration among multiple agents has become an emerging issue in real-world applications. In reinforcement learning, multi-agent environments present challenges beyond those that are tractable in single-agent settings. Such collaborative environments have the following highly complex attributes: sparse rewards for task completion, limited communication between agents, and only partial observations. In particular, adjustments in an agent's action policy result in a nonstationary environment from the other agents' perspective, which causes high variance in the learned policies and prevents the direct use of reinforcement learning approaches. Unexpected social loafing caused by high dispersion makes it difficult for all agents to succeed in collaborative tasks. Therefore, we address a paradox in which social loafing significantly reduces total returns after a certain timestep of multi-agent reinforcement learning. We further demonstrate that the collaborative paradox in multi-agent environments can be avoided by our proposed early stop method, which leverages a metric for social loafing.

Recent advances in few-shot learning for image domain: a survey

  • 석호식
    • 전기전자학회논문지 / Vol. 27, No. 4 / pp. 537-547 / 2023
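  • Few-shot learning has recently attracted considerable attention because it offers a way to overcome the difficulty caused by a lack of training data, using previously acquired related knowledge together with a small amount of training data. To help readers quickly grasp the concept and main approaches of few-shot learning, this paper describes recent research trends from three perspectives: data augmentation, embedding and metric learning, and meta-learning. We also briefly introduce the major benchmark datasets to assist researchers who wish to apply few-shot learning. Although few-shot learning is used in various fields, including image analysis and natural language processing, this paper focuses on few-shot learning approaches for image processing.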

A Design of Hierarchical Gaussian ARTMAP using Different Metric Generation for Each Level

  • 최태훈;임성길;이현수
    • 한국정보과학회논문지:소프트웨어및응용 / Vol. 36, No. 8 / pp. 633-641 / 2009
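  • This paper proposes a pattern recognizer that can process analog data and supports online learning and the addition of new classes during training. The proposed recognizer has a hierarchical structure and improves classification performance by applying a different metric at each level. It is based on Gaussian ARTMAP, a neural-network-based pattern recognition algorithm. To arrange Gaussian ARTMAP models hierarchically and have each level learn different features, we propose the Principal Component Emphasis (P.C.E) method and a way of generating a new metric with it. P.C.E uses the variance of the learned input data to remove low-variance dimensions, which represent attributes common within a class, and to keep only high-variance dimensions, which represent attributes that differ between patterns. When a pattern is classified differently from the teacher signal during training, P.C.E is performed to separate the misclassified class from the input pattern, and the pattern is then learned at a lower-level node. Experiments confirm that the proposed model achieves higher classification performance than previously proposed pattern recognition models.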
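
The variance-based dimension selection that the abstract above attributes to P.C.E can be sketched roughly as follows. This is only an illustration of the general idea (drop low-variance dimensions, keep high-variance ones), not the authors' algorithm. The function names `select_high_variance_dims` and `project`, the use of population variance, and the fixed `keep` count are all assumptions, since the abstract gives no selection rule.

```python
def select_high_variance_dims(samples, keep):
    """Return the indices of the `keep` dimensions with the largest variance.

    `samples` is a list of equal-length feature vectors (hypothetical input
    format). Population variance is used; the paper's rule is not given.
    """
    n = len(samples)
    dims = len(samples[0])
    variances = []
    for d in range(dims):
        column = [s[d] for s in samples]
        mean = sum(column) / n
        variances.append(sum((v - mean) ** 2 for v in column) / n)
    # Rank dimensions from highest to lowest variance, then keep the top ones
    ranked = sorted(range(dims), key=lambda d: variances[d], reverse=True)
    return sorted(ranked[:keep])


def project(sample, dims):
    """Keep only the selected dimensions of one feature vector."""
    return [sample[d] for d in dims]


# Dimension 0 is constant, dimension 1 varies strongly, dimension 2 slightly
patterns = [[1.0, 0.0, 5.0], [1.0, 10.0, 5.1], [1.0, 20.0, 4.9]]
kept = select_high_variance_dims(patterns, keep=2)
reduced = [project(p, kept) for p in patterns]
```

In a hierarchical setting like the one the abstract describes, a lower-level node would measure distances only over the retained dimensions, which is one way a different metric could arise at each level.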

The extension of the largest generalized-eigenvalue based distance metric $D_{ij}(\gamma_1)$ in arbitrary feature spaces to classify composite data points

  • Daoud, Mosaab
    • Genomics & Informatics / Vol. 17, No. 4 / pp. 39.1-39.20 / 2019
  • Analyzing patterns in data points embedded in linear and non-linear feature spaces is one of the common research problems across different research areas, for example data mining, machine learning, pattern recognition, and multivariate analysis. In this paper, data points are heterogeneous sets of biosequences (composite data points). A composite data point is a set of ordinary data points (e.g., a set of feature vectors). We theoretically extend the derivation of the largest generalized eigenvalue-based distance metric $D_{ij}(\gamma_1)$ to any linear and non-linear feature spaces. We prove that $D_{ij}(\gamma_1)$ is a metric under any linear and non-linear feature transformation function. We show the sufficiency and efficiency of using the decision rule $\bar{\delta}_{\Xi i}$ (i.e., the mean of $D_{ij}(\gamma_1)$) in the classification of heterogeneous sets of biosequences compared with the decision rules $\min_{\Xi i}$ and $\mathrm{median}_{\Xi i}$. We analyze the impact of linear and non-linear transformation functions on classifying/clustering collections of heterogeneous sets of biosequences. We empirically show how the length of a sequence in a heterogeneous sequence-set generated by simulation affects the classification and clustering results in linear and non-linear feature spaces. We propose a new concept: the limiting dispersion map of the existing clusters in heterogeneous sets of biosequences embedded in linear and non-linear feature spaces, which is based on the limiting distribution of nucleotide compositions estimated from real data sets. Finally, empirical conclusions and scientific evidence are drawn from the experiments to support the theoretical results stated in this paper.