• Title/Summary/Keyword: embedding vector

A study on classification of hooking headlines using deep learning techniques (딥러닝 기법을 이용한 낚시성 기사 제목 분류에 대한 연구)

  • Choi, Yong-Seok;Choi, Han-Na;Shin, Ji-Hye;Jeong, Chang-Min;An, Jung-Yeon;Yoo, Chae-Young;Im, Chae-Eun;Lee, Kong-Joo
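    • Annual Conference on Human and Language Technology / Korean Institute of Information Scientists and Engineers (KIISE) Language Engineering Research Group, 27th Conference on Hangul and Korean Information Processing (2015) / pp.15-17 / 2015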
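  • This paper presents a system for distinguishing hooking (clickbait) headlines from non-hooking headlines. Headlines are classified with a support vector machine (SVM), and the classification criteria are derived using word embedding, a deep learning technique, and the K-means clustering algorithm. The words of each headline are used as features, and the system achieves an accuracy of 83.78%. The results indicate that hooking headlines contain particular words that serve to lure readers.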

Automatic Attendance Check System Using Face Recognition In A Masked Environment (마스크를 착용한 환경에서 얼굴 인식을 활용한 자동 출석체크 시스템)

  • Kim, Young-Kook;Lim, Chae-Hyun;Son, Min-Ji;Kim, Myung-Ho
    • Proceedings of the Korean Society of Computer Information Conference / Korean Society of Computer Information, Proceedings of the 62nd Summer Conference (2020), Vol. 28, No. 2 / pp.23-26 / 2020
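  • This paper introduces a system that recognizes faces in video obtained from CCTV and automatically checks attendance. The system detects faces with the CNN-based RetinaFace model and embeds each detected face into the target space R^512 with the ArcFace model. We propose a matching algorithm that decides whether two faces belong to the same person by measuring the angular cosine distance between the embedding vectors of faces registered in an existing database and those of faces obtained from CCTV. Through experiments, we identify the optimal environment for using the two models together and propose a method that improves matching performance by dealing more effectively with the occlusion problem, in which a mask covers the lower part of the face.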
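
The matching step described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the function names, the toy 3-dimensional vectors (standing in for 512-dimensional ArcFace embeddings), the normalisation of the angle to [0, 1], and the threshold value are all hypothetical.

```python
import math

def angular_distance(u, v):
    """Angle between two non-zero embedding vectors, scaled to [0, 1]."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos_sim = dot / (norm_u * norm_v)
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return math.acos(cos_sim) / math.pi

def match_face(query, gallery, threshold=0.3):
    """Return the registered name closest to `query`, or None when no
    registered embedding lies within `threshold` (a hypothetical value)."""
    best_name, best_dist = None, float("inf")
    for name, emb in gallery.items():
        d = angular_distance(query, emb)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= threshold else None

# Toy gallery of registered faces (assumed data).
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
```

In practice the threshold would have to be tuned on labelled pairs, since a looser value trades missed matches for false matches.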

Standard Industrial Classification in Short Sentence Based on Machine Learning Approach (기계학습 기반 단문에서의 문장 분류 방법을 이용한 한국표준산업분류)

  • Oh, Kyo-Joong;Choi, Ho-Jin;An, Hweongak
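    • Annual Conference on Human and Language Technology / Korean Institute of Information Scientists and Engineers (KIISE) Language Engineering Research Group, 32nd Conference on Hangul and Korean Information Processing (2020) / pp.394-398 / 2020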
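  • An automatic industry/occupation classification coding system maps the diverse user inputs collected in employment surveys and similar work, such as business information, job duties, rank, and department name, to codes in the standard industrial/occupational classification. We train an unsupervised index-term extraction model on the input data, extract input vectors with an index-term embedding model that uses subword embeddings, and then encode the output classification codes to train a supervised learning model. Using classification result data from the existing system, models with high accuracy can be built at the section, division, group, and class levels, showing that machine learning techniques are applicable to this system.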

The Design of Technical Interview System for Computer Engineering based Similarity (유사도 기반 컴퓨터공학 기술 면접 시스템의 설계)

  • Dong Hyun Lee;Dong Hyun Kim
    • Proceedings of the Korean Society of Computer Information Conference / Korean Society of Computer Information, Proceedings of the 68th Summer Conference (2023), Vol. 31, No. 2 / pp.351-352 / 2023
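  • When hiring developers in computer engineering, most companies conduct, in addition to a general interview, a technical interview to assess competence in the field. To support candidates preparing for such interviews, this paper proposes a system that uses cosine similarity to evaluate the accuracy of a candidate's answers about core computer engineering concepts and reports the result. The proposed system is expected to help developers answer technical interview questions on core computer engineering concepts more accurately.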
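
One way to score an answer against a reference with cosine similarity is sketched below. The abstract does not say how the text is vectorised, so the term-frequency representation, the tokenizer, and the sample sentences are assumptions made for illustration only.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep alphanumeric runs (a deliberate simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text shares nothing with any other text
    return dot / (norm_a * norm_b)

# Hypothetical reference answer and candidate answer.
reference = "a stack is a last in first out data structure"
answer = "a stack is a data structure with last in first out order"
score = cosine_similarity(reference, answer)
```

A bag-of-words score like this ignores word order and synonyms; sentence embeddings could be substituted for the term-frequency vectors without changing the similarity function.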

Major Class Recommendation System based on Deep learning using Network Analysis (네트워크 분석을 활용한 딥러닝 기반 전공과목 추천 시스템)

  • Lee, Jae Kyu;Park, Heesung;Kim, Wooju
    • Journal of Intelligence and Information Systems / Vol. 27, No. 3 / pp.95-112 / 2021
  • In university education, the choice of major classes plays an important role in students' careers. However, as industry changes, the major subjects offered by each department are diversifying and growing in number. As a result, students have difficulty choosing and taking classes that fit their career paths. In general, students choose classes based on experience, such as the choices of peers or advice from seniors. This has the advantage of taking the general situation into account, but it does not reflect individual tendencies or consideration of existing courses, and it leads to information inequality because the information is shared only among certain students. In addition, as non-face-to-face classes have recently been introduced and exchanges between students have decreased, even experience-based decisions have become harder to make. Therefore, this study proposes a recommendation system model that recommends university major classes suited to individual characteristics based on data rather than experience. A recommendation system recommends information and content (music, movies, books, images, etc.) that a specific user may be interested in. Such systems are already widely used in services where individual tendencies matter, such as YouTube and Facebook, and are familiar from personalized content services such as over-the-top (OTT) media services. Taking classes is also a kind of content consumption, in that students select classes that suit them from a fixed list. Unlike other content consumption, however, the result of the selection has a large influence. Music and movies, for example, are usually consumed once and take little time to consume, so the importance of each item is relatively low and selection requires little deliberation. Major classes, by contrast, take a long time to consume because they run for a whole semester, and each item is highly important and calls for greater caution because the combination of selected classes affects many things, such as career and graduation requirements. Given these characteristics of major classes, a recommendation system in education, although it covers a relatively small range of items, supports decision-making that reflects meaningful individual characteristics that experience-based decision-making cannot capture. This study aims to realize personalized education and enhance students' educational satisfaction by presenting a recommendation model for university major classes. The model was studied using the class history data of undergraduate students at a university from 2015 to 2017, with students and the names of their majors as metadata. The class history data is implicit feedback: it indicates only whether content was consumed and does not reflect preferences for classes. Embedding vectors that characterize students and classes derived from such data therefore have low expressive power. With these issues in mind, this study proposes the Net-NeuMF model, which generates student and class vectors through network analysis and uses them as the model's inputs. The model is based on the structure of NeuMF, a representative model for implicit feedback data that uses one-hot vectors as input. The input vectors are generated through network analysis so that they represent the characteristics of students and classes. To generate a vector representing a student, each student is set as a node, and a weighted edge connects two students if they have taken the same class. Similarly, to generate a vector representing a class, each class is set as a node, and an edge connects two classes if any student has taken both. We then apply Node2Vec, a representation learning method that quantifies the characteristics of each node. To evaluate the model, we used four indicators commonly used for recommendation systems and ran experiments with three different embedding dimensions to analyze the effect of dimension on the model. The results show better performance on the evaluation metrics, regardless of dimension, than one-hot vectors in the existing NeuMF structure. This work thus contributes by constructing networks of students (users) and classes (items) to increase expressiveness over existing one-hot embeddings, by matching the characteristics of each structure that constitutes the model, and by showing better performance on various evaluation metrics than existing methodologies.
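
The student network described in this abstract can be sketched as below. The abstract does not define the edge weight, so counting shared classes is an assumption, as are the function names and the toy enrollment data. The walk shown is the simplest case of Node2Vec sampling (return and in-out parameters p = q = 1), not the full biased walk.

```python
from collections import defaultdict
from itertools import combinations
import random

def build_student_graph(enrollments):
    """enrollments: dict mapping student -> set of classes taken.
    Returns weights[u][v] = number of classes u and v share (assumed weight)."""
    weights = defaultdict(dict)
    for u, v in combinations(sorted(enrollments), 2):
        shared = len(enrollments[u] & enrollments[v])
        if shared > 0:
            weights[u][v] = shared
            weights[v][u] = shared
    return weights

def random_walk(weights, start, length, rng):
    """Weighted random walk; equivalent to Node2Vec sampling with p = q = 1."""
    walk = [start]
    while len(walk) < length:
        neighbours = weights.get(walk[-1])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        nodes = list(neighbours)
        step = rng.choices(nodes, weights=[neighbours[n] for n in nodes])[0]
        walk.append(step)
    return walk

# Hypothetical class history for four students.
enrollments = {
    "s1": {"DB", "OS", "AI"},
    "s2": {"DB", "OS"},
    "s3": {"AI"},
    "s4": {"Stats"},
}
graph = build_student_graph(enrollments)
walk = random_walk(graph, "s1", 5, random.Random(0))
```

Walks sampled this way would then be fed to a skip-gram model to obtain one vector per student; the class network can be built the same way by swapping the roles of students and classes.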

Performance Comparison of Automatic Classification Using Word Embeddings of Book Titles (단행본 서명의 단어 임베딩에 따른 자동분류의 성능 비교)

  • Yong-Gu Lee
    • Journal of the Korean Society for Information Management / Vol. 40, No. 4 / pp.307-327 / 2023
  • To analyze the impact of word embeddings on book titles, this study used word embedding models (Word2vec, GloVe, fastText) to generate embedding vectors from book titles. These vectors were then used as features for automatic classification. The classifier was the k-nearest neighbors (kNN) algorithm, and the categories were based on the DDC (Dewey Decimal Classification) main class 300 assigned to books by libraries. In the experiment, the Skip-gram architectures of Word2vec and fastText gave better kNN classification performance than TF-IDF features. When various hyperparameters were optimized across the three models, the Skip-gram architecture of fastText performed well overall; in particular, it performed better with hierarchical softmax and larger embedding dimensions. fastText can generate embeddings for substrings or subwords using the n-gram method, which was shown to increase recall. The Skip-gram architecture of Word2vec generally performed well at low dimensions (size 300) and with small negative sampling sizes (3 or 5).
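
The pipeline in this abstract (title words, embedding vectors, kNN) can be illustrated with a small sketch. The abstract does not state how word vectors are combined into a title vector, so averaging is an assumption; the 2-dimensional vectors, the titles, and the class labels are toy data invented for the example, not values from the study.

```python
import math
from collections import Counter

def title_vector(title, word_vectors):
    """Average the vectors of the title's known words (assumed pooling)."""
    vecs = [word_vectors[w] for w in title.lower().split() if w in word_vectors]
    if not vecs:
        return None  # no known word in the title
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def knn_classify(query, labelled, k=3):
    """labelled: list of (vector, label). Majority vote among the k
    training titles most similar to the query."""
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy word vectors standing in for a trained Word2vec/GloVe/fastText model.
word_vectors = {
    "market": [1.0, 0.1], "economy": [0.9, 0.2], "trade": [0.8, 0.0],
    "school": [0.1, 1.0], "teaching": [0.0, 0.9], "curriculum": [0.2, 0.8],
}
training = [("market economy", "330"), ("trade market", "330"),
            ("school curriculum", "370"), ("teaching school", "370")]
labelled = [(title_vector(t, word_vectors), y) for t, y in training]
predicted = knn_classify(title_vector("economy trade", word_vectors), labelled)
```

With a subword model such as fastText, `title_vector` would rarely return `None`, because unseen words still receive vectors built from their character n-grams.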

Image Warping Using Vector Field Based Deformation and Its Application to Texture Mapping (벡터장 기반 변형기술을 이용한 이미지 와핑 방법 : 텍스쳐 매핑에의 응용을 중심으로)

  • Seo, Hye-Won;Cordier, Frederic
    • Journal of KIISE:Computer Systems and Theory / Vol. 36, No. 5 / pp.404-411 / 2009
  • In this paper we introduce a new method for smooth, foldover-free warping of images, based on the vector field deformation technique proposed by Von Funck et al. Users can specify constraints in two ways: positional constraints, which fix the position of a point in the image, and gradient constraints, which fix the orientation and scaling of parts of the image. From the user-specified constraints, the method computes a C^1-continuous velocity vector field in the image domain, along which each pixel moves progressively from its original position to its target. The target positions of the pixels are obtained by solving a set of differential equations with the 4th-order Runge-Kutta method. We show how the method can be used for texture mapping with hard constraints. We start with an unconstrained planar embedding of a target mesh computed with a previously known method (Least Squares Conformal Map). Then, to obtain a texture map that satisfies the given constraints, we use the proposed warping method to align the features of the texture image with those on the unconstrained embedding. Compared to previous work, our method generates a smoother texture mapping, offers a higher level of control for defining the constraints, and is simpler to implement.
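
Moving a pixel along a velocity field amounts to integrating dx/dt = v(x, t), which the 4th-order Runge-Kutta method handles as sketched below. The rotation field, the unit time interval, and the step count are assumptions chosen so the result is easy to check; the paper's field is instead derived from the user's constraints.

```python
import math

def rk4_step(field, x, y, t, h):
    """One 4th-order Runge-Kutta step for d(x, y)/dt = field(x, y, t)."""
    k1x, k1y = field(x, y, t)
    k2x, k2y = field(x + 0.5 * h * k1x, y + 0.5 * h * k1y, t + 0.5 * h)
    k3x, k3y = field(x + 0.5 * h * k2x, y + 0.5 * h * k2y, t + 0.5 * h)
    k4x, k4y = field(x + h * k3x, y + h * k3y, t + h)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def advect(field, x, y, steps=100):
    """Carry a pixel along the field from t = 0 to t = 1."""
    h = 1.0 / steps
    for i in range(steps):
        x, y = rk4_step(field, x, y, i * h, h)
    return x, y

def rotation_field(x, y, t):
    """Hypothetical test field: rigid rotation about the origin that
    turns by a quarter circle over the unit time interval."""
    w = math.pi / 2
    return (-w * y, w * x)

# The pixel at (1, 0) should end up close to (0, 1).
target = advect(rotation_field, 1.0, 0.0)
```

Because every pixel follows the same smooth field, nearby trajectories cannot cross, which is the intuition behind the foldover-free property.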

Class Language Model based on Word Embedding and POS Tagging (워드 임베딩과 품사 태깅을 이용한 클래스 언어모델 연구)

  • Chung, Euisok;Park, Jeon-Gue
    • KIISE Transactions on Computing Practices / Vol. 22, No. 7 / pp.315-319 / 2016
  • Recurrent neural network based language models (RNN LMs) have shown improved results in language modeling research. However, RNN LMs are limited to post-processing stages, such as the N-best rescoring step of WFST-based speech recognition, and they have considerable vocabulary problems that require large computing power for LM training. In this paper, we seek a first-pass N-gram model that uses word embedding, a simplified deep neural network. A class-based language model (LM) is one way to approach this issue. We built a class-based vocabulary through word embedding and combined the class LM with a word N-gram LM to evaluate the performance of the LMs. In addition, we report that a part-of-speech (POS) tagging based LM improves perplexity in all types of LM tests.
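
For orientation, the standard class-based factorization can be written as below. The abstract gives no formulas, so the bigram order, the hard assignment of each word $w_i$ to a single class $c_i$, and the linear interpolation weight $\lambda$ are assumptions rather than the authors' stated formulation.

```latex
\begin{align*}
P_{\text{class}}(w_i \mid w_{i-1}) &= P(c_i \mid c_{i-1}) \, P(w_i \mid c_i) \\
P(w_i \mid w_{i-1}) &= \lambda \, P_{\text{word}}(w_i \mid w_{i-1})
  + (1 - \lambda) \, P_{\text{class}}(w_i \mid w_{i-1}),
  \qquad 0 \le \lambda \le 1
\end{align*}
```

Here the classes could come from clustering word embeddings or from POS tags; because there are far fewer classes than words, the class transition table is much smaller than a word bigram table.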

Super Resolution by Learning Sparse-Neighbor Image Representation (Sparse-Neighbor 영상 표현 학습에 의한 초해상도)

  • Eum, Kyoung-Bae;Choi, Young-Hee;Lee, Jong-Chan
    • Journal of the Korea Institute of Information and Communication Engineering / Vol. 18, No. 12 / pp.2946-2952 / 2014
  • Among example-based super resolution (SR) techniques, neighbor embedding (NE) is inspired by manifold learning methods, particularly locally linear embedding. However, the poor generalization of NE degrades its performance, and the local training sets are always too small to improve it. To solve this problem, we propose Learning Sparse-Neighbor Image Representation based on SVR, which has excellent generalization ability. Given a low-resolution image, we first use bicubic interpolation to synthesize its high-resolution version. We extract patches from this synthesized image and determine whether each patch corresponds to a region with high or low spatial frequencies. After the weight of each patch is obtained by our method, the weights are used to learn separate SVR models. Finally, we update the pixel values using the previously learned SVRs. Experimental results confirm, quantitatively and qualitatively, that the proposed algorithm improves on conventional interpolation methods and NE.

A Blind Watermarking Scheme Using Singular Vector Based On DWT/RDWT/SVD (DWT/RDWT/SVD에 기반한 특이벡터를 사용한 블라인드 워터마킹 방안)

  • Luong, Ngoc Thuy Dung;Sohn, Won
    • Journal of Broadcast Engineering / Vol. 21, No. 2 / pp.149-156 / 2016
  • We propose a blind watermarking scheme for copyright protection that uses singular vectors based on the Discrete Wavelet Transform (DWT) and Redundant Discrete Wavelet Transform (RDWT) combined with Singular Value Decomposition (SVD). To overcome the false-positive problem in current SVD-based watermarking systems, we replace the first left and right singular vectors decomposed from the cover image with the corresponding ones from the watermark image. The proposed scheme realizes a watermarking system without the false-positive problem and shows high fidelity and robustness.