• Title/Summary/Keyword: word2vec

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

  • Choi, Semok; Park, Cheong Hee
    • Journal of Korea Multimedia Society / v.23 no.9 / pp.1181-1190 / 2020
  • Detecting an anomaly pattern that deviates from the normal data distribution in streaming data is an important technique in many application areas. In this paper, we propose a method for detecting a newly emerging pattern in text streaming data, an ordered sequence of texts, based on text embedding and anomaly pattern detection. The detection performance of the proposed method is compared across text embedding methods such as BOW (Bag Of Words), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85 and an F1 value of 1 in three of the five test cases.
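
As a rough illustration of embedding-based emerging-topic detection (not the authors' algorithm, whose details the abstract does not give), the sketch below assumes each text has already been mapped to a vector, for example by averaging Word2Vec vectors or taking a BERT sentence embedding. It flags a window of new texts whose average cosine distance from the centroid of a reference window exceeds a threshold. The function names, the 2-D toy vectors, and the threshold of 0.3 are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def mean_vector(vectors: List[Vector]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def is_emerging(reference: List[Vector], window: List[Vector], threshold: float = 0.3) -> bool:
    """Flag `window` when its texts lie, on average, far from the reference centroid.

    `threshold` is a hypothetical value chosen only for illustration.
    """
    centroid = mean_vector(reference)
    avg_distance = sum(cosine_distance(v, centroid) for v in window) / len(window)
    return avg_distance > threshold


# Toy 2-D "embeddings" (hypothetical values).
reference = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
print(is_emerging(reference, [[1.0, 0.05]]))             # False: same topic
print(is_emerging(reference, [[0.0, 1.0], [0.1, 0.9]]))  # True: new direction
```

A fixed threshold keeps the sketch short. A streaming system would more likely calibrate it from the distances observed in normal data.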

Performance Comparison of Recurrent Neural Networks and Conditional Random Fields in Biomedical Named Entity Recognition (의생명 분야의 개체명 인식에서 순환형 신경망과 조건적 임의 필드의 성능 비교)

  • Jo, Byeong-Cheol; Kim, Yu-Seop
    • 한국어정보학회:학술대회논문집 (Korean Language Information Science Society conference proceedings) / 2016.10a / pp.321-323 / 2016
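  • Recent studies perform named entity recognition with supervised learning, one family of machine learning methods. However, supervised learning requires considerable cost and time to build training data. In this study, we apply supervised learning using an annotated corpus. Biomedical named entity recognition is an important foundational task for processing text that contains proteins, RNA, DNA, cell types, cell lines, and similar entities. It is also one of the most basic and essential tasks in biomedical knowledge retrieval. In this study, we compare the performance of recurrent neural networks with that of conditional random fields that use word embeddings as features. A conditional random field using only N-grams as features was set as the baseline, which achieved an F1 score of 70.09%. The Jordan-type RNN achieved an F1 score of 60.75% and the Elman-type RNN 58.80%. Conditional random fields using CCA, GloVe, and Word2Vec features achieved F1 scores of 72.73%, 72.74%, and 72.82%, respectively.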

On Word Embedding Models and Parameters Optimized for Korean (한국어에 적합한 단어 임베딩 모델 및 파라미터 튜닝에 관한 연구)

  • Choi, Sanghyuk; Seol, Jinseok; Lee, Sang-goo
    • Annual Conference on Human and Language Technology / 2016.10a / pp.252-256 / 2016
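  • This paper introduces a method for training word embeddings optimized for Korean. Word embedding maps each word to a vector in a fixed-dimensional vector space so that the word carries a distributed meaning, and it is used in many natural language processing fields such as machine translation and named entity recognition. In this paper, we experimentally search for the training corpus, embedding model, and hyperparameters that give the best performance for Korean, and we analyze the results.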
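
Context window size is one common Word2Vec hyperparameter. The abstract does not list the paper's actual search space, so treat this only as background. The hypothetical helper below sketches how skip-gram (center, context) training pairs are generated from a sentence that has already been tokenized, and it shows how widening the window enlarges the context each word is trained on. For Korean, whether tokens are whitespace-delimited eojeol or morphemes also changes which pairs are produced. The sample simply assumes a given tokenization.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within +/- `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical pre-tokenized Korean sentence ("I ate rice").
tokens = ["나는", "밥을", "먹었다"]
print(len(skipgram_pairs(tokens, window=1)))  # 4
print(len(skipgram_pairs(tokens, window=2)))  # 6
```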

Vocabulary Analysis of Safety Warnings in Construction Site (건설현장 안전 지적 사항 분석)

  • Kang, Kyung-Su; Ryu, Han-Guk
    • Proceedings of the Korean Institute of Building Construction Conference / 2019.11a / pp.40-41 / 2019
  • The purpose of this study is to analyze the vocabulary related to safety accidents, based on reports recording violations of safety rules at construction sites. We used Word2Vec and topic modeling as natural language processing techniques to analyze the safety accidents described in the reports of a large enterprise. Words associated with occupational accident types such as falls and falling objects were derived and visualized. We derived the frequency and similarity of the words and the topics of accidents that occur at construction sites. Future studies could generate text from site photographs, building on these images and reports.

Target and Swear Word Detection Using Sentence Analysis in Real-Time Chatting (실시간 채팅 환경에서 문장 분석을 이용한 대상자 및 비속어 검출)

  • Yeom, Choongseok; Jang, Junyoung; Jang, Yuhwan; Kim, Hyun-chul; Park, Heemin
    • Journal of the Semiconductor & Display Technology / v.20 no.1 / pp.83-87 / 2021
  • With the growth of internet usage, online communication has become part of everyday life, and many people have experienced profanity from anonymous users. Many recent studies have tried to solve this problem with artificial intelligence, but most of these solutions target non-real-time settings. In this paper, we propose a Telegram plugin that detects swear words using word2vec, together with an algorithm that finds the target of a sentence. We vectorized the input sentence to find connections with other similar words, then fed the resulting values into a pre-trained CNN (Convolutional Neural Network) model to detect swearing. For target recognition, we propose a sequential algorithm based on KoNLPy.

Renewable energy trends and relationship structure by SNS big data analysis (SNS 빅데이터 분석을 통한 재생에너지 동향 및 관계구조)

  • Kim, Jong-Min
    • Convergence Security Journal / v.22 no.1 / pp.55-60 / 2022
  • This study analyzes trends and relational structures in the renewable energy sector. To this end, we focused on big data, including SNS data: renewable-energy hashtags were collected from the Instagram platform and analyzed with word embedding and social network analysis. The results derived from this research are expected to be utilized for the development of the renewable energy industry.

Impact of Corporate Personality on the Relationship between Job Satisfaction and Turnover Rate : Based on the Corporate Review of Job-Planet (기업개성이 직원의 직무만족과 기업 이직률의 관계에 미치는 영향 : 잡플래닛 기업 리뷰를 중심으로)

  • An, Byungdae; Choi, Jinwook; Suh, Yongmoo
    • Journal of Information Technology Services / v.19 no.3 / pp.35-56 / 2020
  • The purpose of this study is to measure corporate personality by analyzing internal employees' corporate reviews and to identify how a company's representative corporate personality affects the relationship between internal employees' job satisfaction and the company's turnover rate. To this end, we first used a Word2vec method to create a dictionary of words representing corporate personality, starting from words that describe the five corporate personalities identified in a preceding study: reliability, initiative, practicality, activism, and femininity. Next, we analyzed reviews written by internal employees about their companies to measure corporate personality scores at the review level, aggregated the review-level scores for each company to calculate company-level scores, and assigned each company the personality with the highest of its five scores. Job satisfaction and turnover rate were measured, respectively, from internal employees' corporate evaluation scores and from the percentage of each company's former employees who left a review on the company. This study collected corporate reviews, employee information, and corporate information from Job-Planet for 2014 to 2017, computed descriptive statistics and a correlation analysis to confirm the suitability of the datasets, and performed linear regression analysis to evaluate the research model and verify the hypotheses. The analysis shows that internal employees' job satisfaction has a significant negative impact on the company's turnover rate. In addition, companies with the reliability, initiative, and femininity personalities also showed a significant cause-and-effect relationship between job satisfaction and turnover rate; among them, job satisfaction at companies with the initiative personality had the greatest impact on turnover rate.
In sum, we not only proposed a novel method of measuring corporate personality but also showed that companies need to identify their corporate personality and adopt a different strategy, depending on that personality, to reduce employee turnover.
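
The abstract does not say how the Word2vec dictionary was built or how reviews were scored, so the sketch below rests on stated assumptions: a seed word's dictionary is expanded with every vocabulary word whose cosine similarity to a seed reaches a hypothetical `min_sim`, a review's score is the share of its tokens found in a dictionary, and a company's personality is the one with the highest average review score. The embeddings, seed words, and reviews are invented toy data, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Set

Embeddings = Dict[str, List[float]]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def expand_seeds(seeds: List[str], emb: Embeddings, min_sim: float) -> Set[str]:
    """Collect vocabulary words whose similarity to at least one seed is >= min_sim."""
    known = [s for s in seeds if s in emb]
    return {w for w, vec in emb.items() if any(cosine(vec, emb[s]) >= min_sim for s in known)}


def review_score(tokens: List[str], dictionary: Set[str]) -> float:
    """Share of a review's tokens that appear in one personality dictionary."""
    return sum(t in dictionary for t in tokens) / len(tokens) if tokens else 0.0


def company_personality(reviews: List[List[str]], dictionaries: Dict[str, Set[str]]) -> str:
    """Average review-level scores per personality; return the top-scoring personality."""
    averages = {
        name: sum(review_score(r, d) for r in reviews) / len(reviews)
        for name, d in dictionaries.items()
    }
    return max(averages, key=averages.get)


# Hypothetical 2-D embeddings and seed words.
emb = {
    "trust": [1.0, 0.0], "reliable": [0.95, 0.1], "honest": [0.9, 0.2],
    "bold": [0.0, 1.0], "proactive": [0.1, 0.95], "salary": [0.7, 0.7],
}
dictionaries = {
    "reliability": expand_seeds(["trust"], emb, min_sim=0.9),
    "initiative": expand_seeds(["bold"], emb, min_sim=0.9),
}
reviews = [["trust", "salary", "honest"], ["bold", "salary"]]
print(sorted(dictionaries["reliability"]))         # ['honest', 'reliable', 'trust']
print(company_personality(reviews, dictionaries))  # reliability
```

A top-n nearest-neighbour cutoff would work just as well as the similarity threshold. The threshold is used here only because it keeps the example deterministic.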

Application of Social Big Data Analysis for CosMedical Cosmetics Marketing : H Company Case Study (기능성 화장품 마케팅의 소셜 빅데이터 분석 활용 : H사 사례를 중심으로)

  • Hwang, Sin-Hae; Ku, Dong-Young; Kim, Jeoung-Kun
    • Journal of Digital Convergence / v.17 no.7 / pp.35-41 / 2019
  • This study aims to analyze the cosmedical cosmetics market and the nature of its customers through social big data analysis. More than 80,000 posts were analyzed using R. After data cleansing, keyword frequency analysis and association analysis were performed to understand customer needs and competitor positioning, and several implications were formulated for refining and implementing the marketing strategy. The analysis shows that "prevention" is a new and essential attribute for appealing to target customers. Expanding the product line for the gift market is also suggested. Products that can complement each other were shown to be highly correlated. In addition to traditional marketing techniques, evidence-based social big data analysis proved useful for deriving previously unidentified characteristics of the customers and the market. The Word2vec algorithm will be beneficial to find additional…

Quality Indicator Based Recommendation System of the National Assembly Members for Political Sponsors (품질지표기반 정치 후원금 지원을 위한 국회의원 추천시스템 연구)

  • Jung, Hyun Woo; Yoon, Hyung Jun; Lee, See Eun; Park, Sol Hee; Sohn, So Young
    • Journal of Korean Society for Quality Management / v.49 no.1 / pp.17-29 / 2021
  • Purpose: During 2015-2019, the average amount of political donation to national assembly members in Korea was 1,000 won per person. Despite benefits such as tax credits, the donation system has not been actively used. This paper aims to promote political donations by proposing a system that recommends national assembly members based on an analysis of the bills they proposed. Methods: We propose a recommendation system based on two aspects: how similar newly proposed or amended bills are to the sponsor's interests (similarity index) and how much effort national assembly members put into those bills (intensity index). More than 25,000 bills were used to measure the recommendation quality index, which combines the similarity and intensity indices. Word2vec was used to calculate the similarity index between the bills proposed by a national assembly member and the sponsor's interests. The intensity index is calculated by dividing the number of newly proposed or entirely revised bills by the number of assembly members who took part in those bills. We then multiply the similarity index by the intensity index to obtain the recommendation quality index, which can help sponsors identify potential assembly members for their donations. Results: We apply the proposed recommendation system to personas for illustration. The recommendation system showed an average F1 score of about 0.69. The analysis results provide insights into recommendations for donation. Conclusion: In this study, a recommendation system was proposed to promote political donations to national assembly members by creating a recommendation quality index based on the similarity and intensity indices. We expect the system presented in this paper to lower users' barriers to political information, thereby boosting political sponsorship and increasing political participation.
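
The index arithmetic in this abstract can be sketched as follows, using a literal reading of its definitions: quality = similarity × intensity, where intensity is the number of newly proposed or entirely revised bills divided by the number of members who took part. The sketch makes two assumptions. It assumes bill and interest vectors are already available, for example as averaged Word2vec vectors, and it uses cosine similarity as the similarity index. The member data and function names are hypothetical.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; assumes non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def intensity_index(new_or_revised_bills: int, participating_members: int) -> float:
    """Newly proposed or entirely revised bills divided by the members who took part."""
    return new_or_revised_bills / participating_members


def quality_index(bill_vec: List[float], interest_vec: List[float],
                  new_or_revised_bills: int, participating_members: int) -> float:
    """Recommendation quality index = similarity index x intensity index."""
    similarity = cosine(bill_vec, interest_vec)
    return similarity * intensity_index(new_or_revised_bills, participating_members)


def rank_members(members: Dict[str, dict], interest_vec: List[float]) -> List[str]:
    """Order members by descending recommendation quality index."""
    scores = {
        name: quality_index(m["bill_vec"], interest_vec, m["bills"], m["members"])
        for name, m in members.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical members: bill-topic vector, bill count, participating members.
members = {
    "A": {"bill_vec": [1.0, 0.0], "bills": 4, "members": 10},   # 1.0 x 0.4 = 0.40
    "B": {"bill_vec": [0.6, 0.8], "bills": 6, "members": 10},   # 0.6 x 0.6 = 0.36
    "C": {"bill_vec": [0.0, 1.0], "bills": 10, "members": 5},   # 0.0 x 2.0 = 0.00
}
print(rank_members(members, interest_vec=[1.0, 0.0]))  # ['A', 'B', 'C']
```

Because the two indices are multiplied, a member with no topical overlap scores zero however many bills they sponsored, as member C shows.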

Multi-Dimensional Emotion Recognition Model of Counseling Chatbot (상담 챗봇의 다차원 감정 인식 모델)

  • Lim, Myung Jin; Yi, Moung Ho; Shin, Ju Hyun
    • Smart Media Journal / v.10 no.4 / pp.21-27 / 2021
  • Recently, the importance of counseling has been increasing because of the "Corona Blue" (pandemic-related depression) caused by COVID-19. In addition, with the growth of non-face-to-face services, research on chatbots as a new counseling medium is being actively conducted. In non-face-to-face counseling through a chatbot, it is most important to accurately understand the client's emotions. However, because there is a limit to recognizing emotions from the client's written sentences alone, more accurate emotion recognition requires recognizing the dimensional emotions embedded in those sentences. Therefore, this paper proposes a multi-dimensional emotion recognition model: the original data are corrected according to their characteristics, a Word2Vec model is trained on them, and the resulting vectors together with sentence-level VAD (Valence, Arousal, Dominance) values are learned with a deep learning algorithm. To verify the usefulness of the proposed model, three deep learning models were compared; the attention model performed best, with an R-squared of 0.8484.
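
The evaluation metric reported here, R-squared, is the coefficient of determination, 1 - SS_res / SS_tot. The abstract does not say how a single score was summarized across the three VAD dimensions, so the hypothetical helper below simply reports one value per dimension. The targets and predictions are invented toy numbers.

```python
from typing import Dict, List, Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot (assumes y_true is not constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def vad_r_squared(true_rows: List[Sequence[float]],
                  pred_rows: List[Sequence[float]]) -> Dict[str, float]:
    """R-squared for each VAD dimension; rows are (valence, arousal, dominance)."""
    dims = ("valence", "arousal", "dominance")
    return {
        d: r_squared([row[i] for row in true_rows], [row[i] for row in pred_rows])
        for i, d in enumerate(dims)
    }


# Hypothetical sentence-level VAD targets and model predictions.
true_rows = [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0), (3.0, 4.0, 5.0)]
pred_rows = [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0), (4.0, 4.0, 5.0)]
print(vad_r_squared(true_rows, pred_rows))
# {'valence': 0.5, 'arousal': 1.0, 'dominance': 1.0}
```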