• Title/Summary/Keyword: Seed Word

Search Result 30, Processing Time 0.025 seconds

Extracting Multiword Sentiment Expressions by Using a Domain-Specific Corpus and a Seed Lexicon

  • Lee, Kong-Joo;Kim, Jee-Eun;Yun, Bo-Hyun
    • ETRI Journal
    • /
    • v.35 no.5
    • /
    • pp.838-848
    • /
    • 2013
  • This paper presents a novel approach to automatically generate Korean multiword sentiment expressions by using a seed sentiment lexicon and a large-scale domain-specific corpus. A multiword sentiment expression consists of a seed sentiment word and its contextual words occurring adjacent to the seed word. The multiword sentiment expressions that are the focus of our study have a different polarity from that of the seed sentiment word. The automatically extracted multiword sentiment expressions show that 1) the contextual words should be defined as a part of a multiword sentiment expression in addition to their corresponding seed sentiment word, 2) the identified multiword sentiment expressions contain various indicators for polarity shift that have rarely been recognized before, and 3) the newly recognized shifters contribute to assigning a more accurate polarity value. The empirical result shows that the proposed approach achieves improved performance of the sentiment analysis system that uses an automatically generated lexicon.

Word Sense Disambiguation based on Concept Learning with a focus on the Lowest Frequency Words (저빈도어를 고려한 개념학습 기반 의미 중의성 해소)

  • Kim Dong-Sung;Choe Jae-Woong
    • Language and Information
    • /
    • v.10 no.1
    • /
    • pp.21-46
    • /
    • 2006
  • This study proposes a Word Sense Disambiguation (WSD) algorithm, based on concept learning with special emphasis on statistically meaningful lowest frequency words. Previous works on WSD typically make use of frequency of collocation and its probability. Such probability based WSD approaches tend to ignore the lowest frequency words which could be meaningful in the context. In this paper, we show an algorithm to extract and make use of the meaningful lowest frequency words in WSD. Learning method is adopted from the Find-Specific algorithm of Mitchell (1997), according to which the search proceeds from the specific predefined hypothetical spaces to the general ones. In our model, this algorithm is used to find contexts with the most specific classifiers and then moves to the more general ones. We build up small seed data and apply those data to the relatively large test data. Following the algorithm in Yarowsky (1995), the classified test data are exhaustively included in the seed data, thus expanding the seed data. However, this might result in lots of noise in the seed data. Thus we introduce the 'maximum a posterior hypothesis' based on the Bayes' assumption to validate the noise status of the new seed data. We use the Naive Bayes Classifier and prove that the application of Find-Specific algorithm enhances the correctness of WSD.

  • PDF

A Sentiment Classification System Using Feature Extraction from Seed Words and Support Vector Machine (종자 어휘를 이용한 자질 추출과 지지 벡터 기계(SVM)을 이용한 문서 감정 분류 시스템의 개발)

  • Hwang, Jae-Won;Jeon, Tae-Gyun;Ko, Young-Joong
    • 한국HCI학회:학술대회논문집
    • /
    • 2007.02a
    • /
    • pp.938-942
    • /
    • 2007
  • 신문 기사 및 상품 평은 특정 주제나 상품을 대상으로 하여 글쓴이의 감정과 의견이 잘 나타나 있는 대표적인 문서이다. 최근 여론 조사 및 상품 의견 조사 등 다양한 측면에서 대용량의 문서의 의미적 분류 및 분석이 요구되고 있다. 본 논문에서는 문서에 나타난 내용을 기준으로 문서가 나타내고 있는 감정을 긍정과 부정의 두 가지 범주로 분류하는 시스템을 구현한다. 문서 분류의 시작은 감정을 지닌 대표적인 종자 어휘(seed word)로부터 시작하며, 자질의 선정은 한국어 특징상 감정 및 감각을 표현하는 명사, 형용사, 부사, 동사를 대상으로 한다. 가중치 부여 방법은 한글 유의어 사전을 통해 종자 어휘의 의미를 확장하여 각각의 가중치를 책정한다. 단어 벡터로 표현된 입력 문서를 이진 분류기인 지지벡터 기계를 이용하여 문서에 나타난 감정을 판단하는 시스템을 구현하고 그 성능을 평가한다.

  • PDF

Utilizing Local Bilingual Embeddings on Korean-English Law Data (한국어-영어 법률 말뭉치의 로컬 이중 언어 임베딩)

  • Choi, Soon-Young;Matteson, Andrew Stuart;Lim, Heui-Seok
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.10
    • /
    • pp.45-53
    • /
    • 2018
  • Recently, studies about bilingual word embedding have been gaining much attention. However, bilingual word embedding with Korean is not actively pursued due to the difficulty in obtaining a sizable, high quality corpus. Local embeddings that can be applied to specific domains are relatively rare. Additionally, multi-word vocabulary is problematic due to the lack of one-to-one word-level correspondence in translation pairs. In this paper, we crawl 868,163 paragraphs from a Korean-English law corpus and propose three mapping strategies for word embedding. These strategies address the aforementioned issues including multi-word translation and improve translation pair quality on paragraph-aligned data. We demonstrate a twofold increase in translation pair quality compared to the global bilingual word embedding baseline.

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems
    • /
    • v.29 no.3
    • /
    • pp.237-251
    • /
    • 2020
  • Purpose The research was studied the hierarchical Hangul emotion index by organizing all the emotions which SNS users are thinking. As a preliminary study by the researcher, the English-based Plutchick (1980)'s emotional standard was reinterpreted in Korean, and a hashtag with implicit meaning on SNS was studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, an emotion seed was selected for the composition of seven emotion sets, and an emotion word dictionary was constructed by collecting SNS hashtags derived from each emotion seed. We also want to explore the priority of each Hangul emotion index. Design/methodology/approach In the process of transforming the matrix through the vector process of words constituting the sentence, weights were extracted using TF-IDF (Term Frequency Inverse Document Frequency), and the dimension reduction technique of the matrix in the emotion set was NMF (Nonnegative Matrix Factorization) algorithm. The emotional dimension was solved by using the characteristic value of the emotional word. The cosine distance algorithm was used to measure the distance between vectors by measuring the similarity of emotion words in the emotion set. Findings Customer needs analysis is a force to read changes in emotions, and Korean emotion word research is the customer's needs. In addition, the ranking of the emotion words within the emotion set will be a special criterion for reading the depth of the emotion. The sentiment index study of this research believes that by providing companies with effective information for emotional marketing, new business opportunities will be expanded and valued. In addition, if the emotion dictionary is eventually connected to the emotional DNA of the product, it will be possible to define the "emotional DNA", which is a set of emotions that the product should have.

Bilingual Word Embedding using Subtitle Parallel Corpus (자막 병렬 코퍼스를 이용한 이중 언어 워드 임베딩)

  • Lee, Seolhwa;Lee, Chanhee;Lim, Heuiseok
    • Proceedings of The KACE
    • /
    • 2017.08a
    • /
    • pp.157-160
    • /
    • 2017
  • 최근 자연 언어 처리 분야에서는 단어를 실수벡터로 임베딩하는 워드 임베딩(Word embedding) 기술이 많은 각광을 받고 있다. 최근에는 서로 다른 두 언어를 이용한 이중 언어 위드 임베딩(Bilingual word embedding) 방법을 사용하는 연구가 많이 이루어지고 있는데, 이중 언어 워드 임베딩에서 임베딩 절과의 질은 학습하는 코퍼스의 정렬방식에 따라 많은 영향을 받는다. 본 논문은 자막 병렬 코퍼스를 이용하여 밑바탕 어휘집(Seed lexicon)을 구축하여 번역 연결 강도를 향상시키고, 이중 언어 워드 임베딩의 사천(Vocabulary) 확장을 위한 언어별 연결 함수(Language-specific mapping function)을 학습하는 새로운 방식의 모델을 제안한다. 제안한 모델은 기존 모델과의 성능비교에서 비교할만한 수준의 결과를 얻었다.

  • PDF

The History of Social and Culture with Food in the Bible (The Five of Books of Moses) (성경상에 나타난 식품 사회.문화사 (모세 오경 중심으로))

  • 김영희
    • The Korean Journal of Food And Nutrition
    • /
    • v.8 no.2
    • /
    • pp.128-134
    • /
    • 1995
  • God created the havens and earth and god gave man every seed-bearing plant and every tree that has fruit with seed in it. These food Is all vegetable food that don't take diseases of adult people. But God gave Noah the green plants. Everything that lives and moves will be food for Noah. Just as God gave man the green plants. And then man must not eat meat (animal protein food) that has its lifeblood still in it and God must not eat fat in it. The fat contains much fat (saturated fatty acids) and cholesterol that have susceptibility to disease of coronary heart, hypertension and atheriosclerosis etc. God must not eat these fat before we don't know that It have susceptibility to disease of adult people.

  • PDF

Spread of Negative Word-of-mouth of Manufacturing Companies Via Twitter: From the Supply Chain Risk's Perspective (트위터를 통한 제조 기업의 부정적 구전 확산: 공급사슬 리스크 관점에서)

  • Jeong, EuiBeom;Yoo, Hanna
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.26 no.5
    • /
    • pp.79-94
    • /
    • 2021
  • Despite the importance of the supply chain risk due to the negative word-of-mouth (NWOM) in social media, related research is insufficient. Thus, this study analyzes how the NWOM of the product is distributed through social media and the characteristics of the distributor based on social exchange theory. For this purpose, we collected information on car recalls from four companies using Twitter from the National Highway Traffic Safety Administration (NHTSA). Based on the Seed Tweet, a Re-Tweet (RT) network was constructed to examine the distribution and spread of NWOM, and regression analysis was performed to test the hypothesis. As a result, it was confirmed that NWOM is a small world network structure that spreads around hub users connected to many users. Moreover, it was found that the more interactive and reciprocal relations the first distributor has, the greater the speed and scale of distribution of NWOM.

Modified Feistel Network Block Cipher Algorithm (변형 피스탈 네트워크 블록 암호 알고리즘)

  • Cho, Gyeong-Yeon;Song, Hong-Bok
    • Journal of the Korea Computer Industry Society
    • /
    • v.10 no.3
    • /
    • pp.105-114
    • /
    • 2009
  • In this paper a modified Feistel network 128 bit block cipher algorithm is proposed. The proposed algorithm has a 128, 196 or 256 bit key and it updates a selected 32 bit word from input value whole by deformed Feistel Network structure. Existing of such structural special quality is getting into block cipher algorithms and big distinction. The proposed block cipher algorithm shows much improved software speed compared with international standard block cipher algorithm AES and domestic standard block cipher algorithm SEED and ARIA. It may be utilized much in same field coming smart card that must perform in limited environment if use these special quality.

  • PDF

Design and Implementation of Providing Conditional Access Broadcasting Service System (수신 제한된 방송 서비스 제공 시스템 설계 및 구현)

  • Kim, Dong-Ok;Shin, Ik-Ryong
    • Journal of The Institute of Information and Telecommunication Facilities Engineering
    • /
    • v.8 no.2
    • /
    • pp.64-71
    • /
    • 2009
  • In this paper, This thesis is cell phone for make CAS service be for hand joining broadcasting Create a way CAS Chip. PerSam issue card inside use Seed Key and algorithm make CID Key and record CAS Chip. PerSam member Card inside use Seed Key and algorithm make Subscriber Key after include Subscriber. Key CAS Chip for record CID Key register EMM. make CAS CHIP in accordance with issue CAS Chip. broadcast service entry be for hand treatment so make low bandwidth for joining massage and make increase a member.

  • PDF