• Title/Summary/Keyword: Similar Word


Parting Lyrics Emotion Classification using Word2Vec and LSTM (Word2Vec과 LSTM을 활용한 이별 가사 감정 분류)

  • Lim, Myung Jin;Park, Won Ho;Shin, Ju Hyun
    • Smart Media Journal / v.9 no.3 / pp.90-97 / 2020
  • With the development of the Internet and smartphones, digital music has become easily accessible, and interest in music search and recommendation is growing accordingly. Music recommendation research has typically classified genres or emotions using melodic features such as pitch, tempo, and beat. However, lyrics are increasingly one of the main means of expressing human emotion in music, so emotion classification based on lyrics is needed. In this paper, we analyze the emotions in parting (farewell) lyrics in order to subdivide parting emotions based on the lyrics. We construct an emotion dictionary by vectorizing the similarity between words appearing in parting lyrics through Word2Vec training, and then propose a method that uses Word2Vec and LSTM to classify parting lyrics by emotion, training an LSTM on the lyrics so that lyrics with similar emotions are grouped together.
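The dictionary-building step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors, the seed words, and the function name `build_emotion_dictionary` are all hypothetical, and real Word2Vec vectors would be learned from the lyrics corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
vectors = {
    "tears": [0.9, 0.1, 0.0],
    "cry": [0.8, 0.2, 0.1],
    "miss": [0.1, 0.9, 0.1],
    "longing": [0.2, 0.8, 0.0],
    "hate": [0.0, 0.1, 0.9],
}

# Hypothetical seed word for each emotion label (an assumption of this sketch).
seeds = {"sadness": "tears", "longing": "miss", "anger": "hate"}

def build_emotion_dictionary(vectors, seeds):
    """Assign every word to the emotion whose seed word is most similar."""
    dictionary = {}
    for word, vec in vectors.items():
        best = max(seeds, key=lambda label: cosine(vec, vectors[seeds[label]]))
        dictionary[word] = best
    return dictionary

emotion_dictionary = build_emotion_dictionary(vectors, seeds)
```

In this toy setup, "cry" lands under "sadness" because its vector points in nearly the same direction as the seed "tears".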

Analysis of Effect of Learning to Solve Word Problems through a Structure-Representation Instruction. (문장제 해결에서 구조-표현을 강조한 학습의 교수학적 효과 분석)

  • 이종희;김부미
    • School Mathematics / v.5 no.3 / pp.361-384 / 2003
  • The purpose of this study was to investigate students' problem-solving processes, based on the IDEAL model, when they learn to solve word problems involving simultaneous linear equations through structure-representation instruction. The IDEAL problem-solving model proceeds through the following stages: identifying problems (I), defining problems (D), exploring alternative approaches (E), and acting on a plan (A). The 160 second-year middle school students who participated were divided into (a) a control group receiving no explicit structure-representation instruction in word problem solving, and (b) a group receiving structure-representation instruction following IDEAL. The results show that structure-representation instruction improved word-problem-solving performance, and that students taught with this approach discriminated more sharply among equivalent, isomorphic, and similar problems than students in the control group. Students in the structure-representation group also made fewer errors than the control group in understanding contexts and using data, in translating the internal relations of a word problem into mathematical symbols, and in setting up equations. In particular, this study identifies the direct-transformation model and the structure-schema model in students' problem-solving processes at the I and D stages.


Expansion of Topic Modeling with Word2Vec and Case Analysis (Word2Vec를 이용한 토픽모델링의 확장 및 분석사례)

  • Yoon, Sang Hun;Kim, Keun Hyung
    • The Journal of Information Systems / v.30 no.1 / pp.45-64 / 2021
  • Purpose: The traditional topic modeling technique makes it difficult to distinguish the meaning of topics, because the key words assigned to one topic are often assigned to other topics as well. This problem can become severe when the number of online reviews is small. This paper proposes an extended topic modeling technique that can be used to analyze a small number of online reviews. Design/methodology/approach: The extended model proposed in this paper combines the traditional topic modeling technique with the Word2Vec technique. The extended model not only allocates main words to the extracted topics but also generates discriminatory words that distinguish between topics. In particular, the Word2Vec technique is applied to extract semantically related words for each discriminatory word. In the extended model, the main words and the discriminatory words, together with their semantically similar words, are used in the semantic classification and naming of extracted topics, so that both can be performed more clearly. As a case study, online reviews of Udo on the Tripadvisor website were analyzed with both the traditional topic modeling technique and the proposed extended model, and the two were compared in the process of semantic classification and naming of the extracted topics. Findings: Because the extended model adds further information to that of existing topic modeling, it was confirmed to be more effective than existing topic modeling at semantically separating topics and at assigning topic names.

A study on the classification of research topics based on COVID-19 academic research using Topic modeling (토픽모델링을 활용한 COVID-19 학술 연구 기반 연구 주제 분류에 관한 연구)

  • Yoo, So-yeon;Lim, Gyoo-gun
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.155-174 / 2022
  • From January 2020 to October 2021, more than 500,000 academic studies related to COVID-19 (Coronavirus-2, a fatal respiratory syndrome) were published. The rapid increase in the number of COVID-19 papers places time and technical constraints on healthcare professionals and policy makers who need to find important research quickly. In this study, we therefore propose a method of extracting useful information from the text of an extensive literature using LDA and the Word2vec algorithm. Papers related to the search keywords were extracted from the COVID-19 papers, and detailed topics were identified. The data come from the CORD-19 data set on Kaggle, a free academic resource prepared by major research groups and the White House to respond to the COVID-19 pandemic and updated weekly. The research method has two main parts. First, 41,062 articles were collected through data filtering and pre-processing of the abstracts of 47,110 academic papers including full text. For this purpose, the number of COVID-19 publications by year was analyzed through exploratory data analysis in Python, and the top 10 journals with active research were identified. LDA and the Word2vec algorithm were used to derive COVID-19 research topics, and after related words were analyzed, similarity was measured. Second, papers containing 'vaccine' and 'treatment' were extracted from among the topics derived from all papers, yielding 4,555 papers related to 'vaccine' and 5,971 papers related to 'treatment'. For each collected set of papers, detailed topics were analyzed using LDA and the Word2vec algorithm, and clustering after PCA dimension reduction was applied to visualize groups of papers with similar themes using the t-SNE algorithm. A noteworthy result is that topics that did not appear in the topic modeling of all COVID-19 papers were derived in the topic modeling results for each research topic. For example, topic modeling of the papers related to 'vaccine' extracted a new topic, Topic 05 'neutralizing antibodies'. A neutralizing antibody is an antibody that protects cells from infection when a virus enters the body, and it is said to play an important role in the production of therapeutic agents and in vaccine development. Likewise, extracting topics from the papers related to 'treatment' revealed a new topic, Topic 05 'cytokine'. A cytokine storm occurs when the body's immune cells attack normal cells instead of defending against attacks. Hidden topics that could not be found across the entire set of papers were thus uncovered by classifying papers according to keywords and performing topic modeling to find detailed topics. In this study, we proposed a method of extracting topics from a large body of literature using the LDA algorithm and extracting similar words using Skip-gram, the Word2vec model that predicts surrounding words from the center word. The combination of the LDA model and the Word2vec model was intended to improve performance by capturing both the document-topic relationships identified by LDA and the relationships identified by Word2vec. In addition, as a clustering method after PCA dimension reduction, we presented a way to classify documents intuitively by using the t-SNE technique to group documents with similar themes into a structured organization. In a situation where the efforts of many researchers to overcome COVID-19 cannot keep up with the rapid publication of COVID-19 papers, this approach is expected to save the precious time and effort of healthcare professionals and policy makers and to help them gain new insights rapidly. It is also expected to serve as basic data for researchers exploring new research directions.
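To make the Skip-gram idea mentioned in this abstract concrete, the sketch below generates (center word, context word) training pairs from a token sequence. It is an illustration only: the function name `skipgram_pairs`, the window size, and the example tokens are assumptions, and it shows just the pair-generation step, not the authors' training pipeline.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each word paired with its neighbours."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

# Hypothetical tokenized fragment of an abstract.
tokens = ["neutralizing", "antibodies", "protect", "cells"]
pairs = skipgram_pairs(tokens, window=1)
```

With a window of 1, "antibodies" is paired with "neutralizing" and "protect" but not with "cells"; a Skip-gram model is then trained to predict the context word from the center word for each such pair.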

    A Study on the Product Planning Model based on Word2Vec using On-offline Comment Analysis: Focused on the Noiseless Vertical Mouse User (온·오프라인 댓글 분석이 활용된 Word2Vec 기반 상품기획 모델연구: 버티컬 무소음마우스 사용자를 중심으로)

    • Ahn, Yeong-Hwi
      • Journal of Digital Convergence / v.19 no.10 / pp.221-227 / 2021
    • In this paper, we used Word2Vec to conduct a word-to-word similarity analysis of a standardized dataset collected through web crawling for 10,000 noiseless vertical mice, had 92 computer engineering students use the presented products for 5 days, and conducted a self-report questionnaire analysis. The questionnaire collected words in free-narrative form and also asked respondents to select from the top 50 words extracted by the word frequency analysis and the word similarity analysis. In the similarity analysis of e-commerce users' product reviews, pain (.985) and design (.963) emerged as advantages for the click keyword, and vertical (.985) and adaptation (.948) as disadvantages. In the descriptive frequency analysis, the most frequently selected items were vertical (123) and pain (118). When respondents selected similar words for merits and demerits, vertical (83) and pain (75) were chosen as advantages, and adaptation (89) and buttons (72) as disadvantages. The method applied in this study is therefore expected to provide decision makers and product planners at small and medium-sized enterprises with important data for decision making when it is incorporated into new product development processes and review strategies for existing products.

    A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness

    • Kim, Sang-Min;Jin, So-Yeon;Lee, Woo-Sin
      • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.33-40 / 2021
    • As strategies become more advanced and complex, the complexity of the information that a commander must analyze is increasing. An intelligent service that can analyze the battlefield is needed to support the commander's timely judgment. This service consists of three steps: extracting knowledge from battlefield information, building a knowledge base, and analyzing the battlefield information from the knowledge base. This paper extracts information similar to an input query by embedding the knowledge base built in the second step. A transformation model is needed to generate the embedded knowledge base, and it uses the random-walk algorithm. The transformed information is embedded using Word2Vec, and similar information is extracted through cosine similarity. In this paper, 980 sentences were generated from an open knowledge base and embedded as 100-dimensional vectors, and it was confirmed that similar entities were extracted through cosine similarity.
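The transformation step described in this abstract, turning a knowledge base into sentences by random walk so that they can be embedded, might look roughly like the sketch below. The triples, entity names, hop limit, and function names are hypothetical illustrations; the paper's actual transformation model is not specified in the abstract.

```python
import random

# Hypothetical (head, relation, tail) triples standing in for a knowledge base.
triples = [
    ("unit_a", "located_in", "sector_1"),
    ("unit_b", "located_in", "sector_1"),
    ("unit_a", "equipped_with", "radar"),
    ("sector_1", "adjacent_to", "sector_2"),
]

def build_graph(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph

def random_walk(graph, start, max_hops, rng):
    """Follow up to max_hops random edges, emitting entity/relation tokens."""
    sentence = [start]
    node = start
    for _ in range(max_hops):
        edges = graph.get(node)
        if not edges:  # dead end: the entity has no outgoing edges
            break
        relation, node = rng.choice(edges)
        sentence.extend([relation, node])
    return sentence

rng = random.Random(0)  # fixed seed so the walks are reproducible
graph = build_graph(triples)
sentences = [random_walk(graph, head, 2, rng) for head in graph for _ in range(3)]
```

Each resulting token list alternates entities and relations and could be passed to a Word2Vec trainer as one sentence, after which entity vectors can be compared by cosine similarity.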

    An Analysis on the Competence and the Methods of Problem Solving of Children at the Before of School Age in Four Operations Word Problems (학령 전 아이들의 사칙연산 문장제 해결 능력과 방법 분석)

    • Lee, Dae-Hyun
      • Journal of the Korean School Mathematics Society / v.13 no.3 / pp.381-395 / 2010
    • The purpose of this paper is to examine five-year-old children's competence and methods in solving word problems on the four operations, based on their informal knowledge. The numbers in the problems were larger than 5 and smaller than 10. The subjects were 21 five-year-old children who had not yet learned the four operations. Interviews with observation were used in this research. The researcher gave the children various materials and allowed them to use the materials for problem solving; the researcher then read the word problems aloud, and the children solved them. The results are as follows: five-year-old children are capable of solving word problems on the four operations. For addition problems, they used mental computation or a strategy of counting all the materials. Their methods for subtraction, multiplication, and division were similar to those for addition, but the success rates differed. The children performed poorly on division word problems. This research suggests that kindergarten educators should take an interest in children's informal knowledge of the four operations, as well as of shapes, patterns, statistics, and probability. To this end, curricula and programs for informal mathematical experiences need to be developed.


    The Role of Phonological Information in Korean Monosyllabic Word Processing (한글 일음절 단어처리에서의 음운정보의 역할)

    • 김연희;이창환
      • Korean Journal of Cognitive Science / v.15 no.1 / pp.35-41 / 2004
    • A letter delay task using monosyllabic words was employed to investigate whether Korean words are processed via the phonological route, and at which stage of word recognition phonological information has its effect. The two main conditions were delaying a sounding letter (→ 향) and delaying a silent letter (→ 양). Experiment 1 was a naming task with SOAs of 150 ms and 250 ms, designed to test whether phonological information affects the early or the later stages of word recognition. The results showed that the interaction between the phonological value condition and the presence or absence of the prime was significant at the 150 ms SOA but not at the 250 ms SOA. Experiment 2 was conducted to generalize the results of Experiment 1 to the lexical decision task, and it showed a pattern similar to that of Experiment 1. These experiments indicate that Korean words are processed via the phonological route and that phonological information plays a role in the early stages of word recognition.


    Profiling and Co-word Analysis of Teaching Korean as a Foreign Language Domain (프로파일링 분석과 동시출현단어 분석을 이용한 한국어교육학의 정체성 분석)

    • Kang, Beomil;Park, Ji-Hong
      • Journal of the Korean Society for Information Management / v.30 no.4 / pp.195-213 / 2013
    • This study aims to establish the identity of the domain of teaching Korean as a Foreign Language (KFL) by using journal profiling and co-word analysis to compare it with relevant and adjacent domains. First, by extracting and comparing topic terms, we calculate the similarity of academic journals in three domains: KFL, teaching Korean as a Native Language (KNL), and Korean Linguistics (KL). The result shows that the KFL journals form a cluster distinct from the others. Profiling analysis and co-word analysis are then conducted to visualize the relationships among the three domains and uncover the characteristics of KFL. The findings show that KFL is more similar to KNL than to KL. Finally, a comparison of the knowledge structures of the three domains based on the co-word analysis demonstrates the uniqueness of KFL as an independent domain in relation to the other relevant domains.
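Co-word analysis, as used in this abstract, rests on counting how often pairs of terms appear together in the same paper. The sketch below shows that counting step only; the keyword lists and the function name `coword_counts` are hypothetical and do not come from the study's data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists, one per paper.
papers = [
    ["grammar", "learner", "textbook"],
    ["learner", "textbook", "culture"],
    ["grammar", "syntax"],
]

def coword_counts(papers):
    """Count how many papers contain each unordered pair of keywords."""
    counts = Counter()
    for keywords in papers:
        # sorted(set(...)) removes duplicates and fixes the pair order
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts

counts = coword_counts(papers)
```

Here the pair ("learner", "textbook") co-occurs in two papers, so it would form a stronger link in a co-word network than pairs that appear together only once.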

    Two-Phase Clustering Method Considering Mobile App Trends (모바일 앱 트렌드를 고려한 2단계 군집화 방법)

    • Heo, Jeong-Man;Park, So-Young
      • Journal of the Korea Society of Computer and Information / v.20 no.4 / pp.17-23 / 2015
    • In this paper, we propose a mobile app clustering method using word clusters. Because mobile app trends change quickly, the proposed method groups semantically similar mobile apps by applying a clustering algorithm to the mobile app set rather than relying on a predefined category system. To alleviate the data sparseness problem in short mobile app description texts, the proposed method additionally uses the unigram, the bigram, the trigram, and the cluster of each word. To cluster mobile apps accurately, the proposed method uses the word clusters to avoid exceedingly small or large mobile app clusters. Experimental results show that using the word clusters improves overall accuracy by 22.18 percentage points, from 57.48% to 79.66%.
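The feature set named in this abstract (unigram, bigram, trigram, and the cluster of each word) can be illustrated with the sketch below. The word-to-cluster mapping, the `cluster=` feature prefix, and the function names are assumptions made for illustration; the abstract does not describe how the authors built their word clusters.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical mapping from words to word-cluster identifiers.
word_cluster = {"photo": "C1", "camera": "C1", "editor": "C2", "filter": "C2"}

def extract_features(description):
    """Collect unigram, bigram, trigram, and word-cluster features."""
    tokens = description.lower().split()
    features = []
    for n in (1, 2, 3):
        features.extend(ngrams(tokens, n))
    # Cluster features let rare words share evidence with similar words.
    features.extend(f"cluster={word_cluster[t]}" for t in tokens if t in word_cluster)
    return features

features = extract_features("Photo editor filter")
```

Even for a three-word description, the cluster features add signal: "editor" and "filter" both contribute `cluster=C2`, which is how word clusters can ease sparseness in short texts.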

