• Title/Summary/Keyword: vocabulary distribution (어휘 분포)

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.95-110 / 2013
  • Recently, the amount of unstructured data generated through a variety of social media has been increasing rapidly, resulting in a growing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately with the traditional methodologies used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the contemporary issues dealt with in the literature on unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not otherwise have been solved by existing traditional approaches. One of the most representative attempts using opinion mining may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published in various media is a traditional example of unstructured text data. Every day, a large volume of new content is created, digitized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information.
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to determine the sentiment polarity of words, of sentences in a document, and of the whole document. However, most traditional approaches share a common limitation in that they do not consider the flexibility of sentiment polarity; that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scraping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model.
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
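
The dictionary-based pipeline described in this abstract (tag sentiment words, classify each article, predict the next day's direction) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration: the lexicon entries and values, the whitespace tokenizer, the sign-based classification rule, and the majority vote are not taken from the paper.

```python
from collections import Counter

# Hypothetical domain-specific sentiment dictionary: word -> sentiment value.
# The entries and values are illustrative assumptions, not taken from the paper.
DOMAIN_LEXICON = {
    "surge": 1.0,
    "rally": 0.8,
    "rebound": 0.6,
    "plunge": -1.0,
    "default": -0.7,
    "volatility": -0.3,
}


def score_article(text, lexicon):
    """Sum the sentiment values of dictionary words found in one article."""
    # Naive whitespace tokenization; punctuation stays attached to tokens.
    tokens = text.lower().split()
    counts = Counter(tokens)
    return sum(lexicon[word] * n for word, n in counts.items() if word in lexicon)


def classify_article(text, lexicon):
    """Label an article positive or negative by the sign of its score."""
    return "positive" if score_article(text, lexicon) >= 0 else "negative"


def predict_direction(articles, lexicon):
    """Predict next-day index direction by majority vote over the day's articles."""
    labels = [classify_article(a, lexicon) for a in articles]
    positives = labels.count("positive")
    return "up" if positives * 2 >= len(labels) else "down"
```

Swapping `DOMAIN_LEXICON` for a general-purpose dictionary changes only the word-to-value mapping, which is the point of contrast the abstract draws between the two kinds of dictionary.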

Improving Phoneme Recognition based on Gaussian Model using Bhattacharyya Distance Measurement Method (바타챠랴 거리 측정 기법을 사용한 가우시안 모델 기반 음소 인식 향상)

  • Oh, Sang-Yeob
    • Journal of Korea Multimedia Society / v.14 no.1 / pp.85-93 / 2011
  • Existing vocabulary recognition programs compute general vector values from a database, so they cannot process phonemes that are formed during a search. Because they cannot build a model for such phoneme data, the accuracy of the Gaussian model cannot be guaranteed. In this paper, we therefore propose using the Bhattacharyya distance measurement method based on phoneme features, which improves the recognition rate by selecting accurate phonemes and minimizing the recognition of similar and erroneous phonemes. We tested Gaussian model optimization using shared continuous probability distributions and confirmed an improved recognition rate. The proposed Bhattacharyya distance measurement method showed an average performance improvement of 1.9% over previous methods and an average improvement of 2.9% in recognition-rate reliability.
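
For reference, the Bhattacharyya distance between two Gaussians has a closed form. The sketch below computes it in Python under the assumption of diagonal covariances, a common simplification in acoustic modeling; the abstract does not state which covariance form the paper uses, and the function name is hypothetical.

```python
import math


def bhattacharyya_distance(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians.

    mean1/mean2 are per-dimension means and var1/var2 per-dimension variances.
    """
    if not (len(mean1) == len(var1) == len(mean2) == len(var2)):
        raise ValueError("all inputs must have the same length")
    distance = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        if v1 <= 0 or v2 <= 0:
            raise ValueError("variances must be positive")
        # Mean-separation term: grows as the two means move apart.
        distance += 0.25 * (m1 - m2) ** 2 / (v1 + v2)
        # Variance-mismatch term: zero when the two variances are equal.
        distance += 0.5 * math.log((v1 + v2) / (2.0 * math.sqrt(v1 * v2)))
    return distance
```

A small distance means two phoneme models overlap heavily and are easy to confuse, so one plausible use is ranking candidate phoneme models by their distance to one another.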

A method for morphological correction of ambiguous error (한글 문서에서 형태적 중의 오류의 교정)

  • Kim, Min-Ju;Jeong, Jun-Ho;Lee, Hyeon-Ju;Choe, Jae-Hyeok;Kim, Hang-Jun;Lee, Sang-Jo
    • Annual Conference on Human and Language Technology / 1998.10c / pp.41-48 / 1998
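  • Among the error types that a proofreading system encounters, some account for only a small share of the overall correction rate yet are very likely to be handled wrongly whenever they occur. Existing correction systems deal with these poorly: they are spelling and spacing errors that arise because forms are similar, or because the same form serves different functions. Such errors are hard to tell apart not only for ordinary writers but even for people with some knowledge of Korean orthography. About 30% of errors, excluding compound nouns and unregistered words, fall into this category. This paper therefore classifies these error types, attempts to correct the frequently occurring ones, and examines how the error types are distributed within sentences. Using a corpus of about 6.17 million eojeols, we investigate the relationship between each form and the other constituents to propose a correction method, and perform correction after morphological analysis. In an experiment on a corpus of 6.55 million eojeols, the method achieved a correction rate of 84.6%. The proposed method can be added to existing correction systems to raise their overall correction rate, and it can also serve as baseline material for correcting other vocabulary of similar types.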

On the development of a computational lexical database of idiomatic expressions in the framework of 21st Sejong Project (21세기 세종계획 관용표현 전자사전 구축에 대하여)

  • Pak, Man-Ghyu;Yi, Sun-Woong;Na, Yun-Hee;Lee, Kwang-Ho
    • Annual Conference on Human and Language Technology / 2001.10d / pp.334-340 / 2001
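  • This paper describes the construction of an electronic dictionary of idiomatic expressions for the Sejong Project, attempted for the first time this year. Once completed, the dictionary will be the first work to record comprehensive information on idiomatic expressions (morphological, syntactic, semantic, and pragmatic), and it will be a large-scale dictionary of 40,000 headwords that covers even the conventional expressions commonly found in real language data. The dictionary records the morpho-syntactic composition of idiomatic expressions and their distributional properties, as well as the presence or absence of arguments, their structure, patterns of particle attachment, modifier constraints on fixed nouns, patterns of lexical and syntactic variation, constraints on prefinal endings, constraints on final endings, and constraints on sentence type. It will also contain the semantic role and selectional restrictions of each argument, along with various other semantic and pragmatic information and etymological notation. This paper explains the notation format for each kind of information explicitly, one by one.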

Development of Voice Dialing System based on Keyword Spotting Technique (핵심어 추출 기반 음성 다이얼링 시스템 개발)

  • Park, Jeon-Gue;Suh, Sang-Weon;Han, Mun-Sung
    • Annual Conference on Human and Language Technology / 1996.10a / pp.153-157 / 1996
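  • This paper concerns voice dialing and department guidance based on speaker recognition and on keyword spotting with continuous-density HMMs. The developed system automatically extracts the keywords needed for dialing and guidance from natural spoken sentences in which the other party's name, position, honorific title, and so on are mixed with interjections and commands. For the sake of naturalness, grammatical constraints on keyword use were kept to a minimum. Each word model is a left-to-right model whose number of states is the number of phonemes plus $3{\sim}4$, with about 3 mixture components; the silence model is a 2-state ergodic model. Recognition uses the frame-synchronous One-Pass Viterbi algorithm with beam pruning, and the recognition vocabulary consists of 36 names, 8 positions and honorific titles, about 5 call words, and about 10 verbs expressing requests together with their conjugations. The system recognizes sentences of about $3{\sim}6$ words in real time (within $1{\sim}3$ seconds) and achieves a keyword recognition rate of about 98%.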
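
To make the decoding step concrete, here is a minimal Python sketch of frame-synchronous Viterbi decoding with beam pruning over a single left-to-right HMM. It is only an illustration under stated assumptions: the function names, the self-loop probability, and the beam width are hypothetical, and the per-frame log-likelihoods are passed in precomputed rather than evaluated from continuous mixture densities as in the paper.

```python
import math

NEG_INF = float("-inf")


def left_to_right_transitions(num_states, self_loop=0.6):
    """Build log transition scores for a left-to-right HMM.

    Each state may stay in place or advance to the next state; the final
    state only loops on itself. The self_loop value is an assumption.
    """
    log_trans = [[NEG_INF] * num_states for _ in range(num_states)]
    for s in range(num_states):
        if s + 1 < num_states:
            log_trans[s][s] = math.log(self_loop)
            log_trans[s][s + 1] = math.log(1.0 - self_loop)
        else:
            log_trans[s][s] = 0.0  # log(1.0)
    return log_trans


def viterbi_beam(log_emit, log_trans, beam=10.0):
    """Frame-synchronous Viterbi decoding with beam pruning.

    log_emit[t][s] is the log-likelihood of frame t under state s.
    Decoding starts in state 0; hypotheses scoring more than `beam` below
    the best hypothesis of the current frame are dropped.
    Returns (best_log_score, best_state_path).
    """
    num_states = len(log_trans)
    active = {0: (log_emit[0][0], [0])}  # state -> (score, path)
    for t in range(1, len(log_emit)):
        candidates = {}
        for prev, (score, path) in active.items():
            for nxt in range(num_states):
                step = log_trans[prev][nxt]
                if step == NEG_INF:
                    continue  # transition not allowed by the topology
                new_score = score + step + log_emit[t][nxt]
                if nxt not in candidates or new_score > candidates[nxt][0]:
                    candidates[nxt] = (new_score, path + [nxt])
        best = max(score for score, _ in candidates.values())
        # Beam pruning: keep only hypotheses close to the frame's best score.
        active = {s: h for s, h in candidates.items() if h[0] >= best - beam}
    final_state = max(active, key=lambda s: active[s][0])
    return active[final_state]
```

Following the abstract's sizing rule, a word with `n` phonemes could be given `left_to_right_transitions(n + 3)`. A narrower `beam` reduces the work per frame at the risk of pruning the path that would have won.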

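A Study on the Characteristics of the Dimensional Structure of Emotion Concepts: Focusing on Children, Adolescents, and Clinical Groups (감성개념 차원구조의 특징에 관한 연구 -아동청소년 및 임상집단을 중심으로-)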

  • 문혜신;김진관;오경자
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 1998.11a / pp.59-64 / 1998
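  • In normal adults, the internal dimensional structure of emotion concepts is known to show a systematic circular distribution in a two-dimensional structure made up of a pleasure/displeasure dimension and an arousal dimension. This study examined how universally and consistently this two-dimensional structure appears. To this end, in Study 1 children and adolescents rated the similarity of 105 word pairs formed from 15 emotion-related words on a 7-point scale, and in Study 2 the same procedure was administered to patients with schizophrenia. Multidimensional analysis yielded a first dimension (5th-grade elementary: 74%, 2nd-year middle school: 72%, schizophrenia patients: 60%) and a second dimension (5th-grade elementary: 18%, 2nd-year middle school: 16%, schizophrenia patients: 11%). As with normal adults, the first dimension could be interpreted as pleasure/displeasure and the second as arousal. Pleasure/displeasure and arousal therefore appear to form a relatively stable dimensional structure of emotion concepts, one that emerges very consistently regardless of the stage of cognitive maturity or of cognitive and emotional impairment. However, the weight of each dimension differs somewhat with developmental stage and pathological characteristics: children and adolescents tend to conceptualize emotion mainly through the pleasure/displeasure dimension, while in the schizophrenia patient group both dimensions explain relatively little variance, and for [...] the weight was even lower. Based on these results, the validity and limitations of the internal dimensional structure are discussed.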
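
As a quick check of the rating design, pairing 15 words in every unordered combination gives 15 · 14 / 2 = 105 pairs. The short Python sketch below enumerates them. It assumes each pair was rated once with order ignored, which the abstract does not state, and it uses placeholder labels because the abstract does not list the 15 words.

```python
from itertools import combinations

# Placeholder labels; the 15 emotion words used in the study are not listed in the abstract.
words = [f"word_{i:02d}" for i in range(1, 16)]

# Every unordered pair of distinct words, each appearing exactly once.
pairs = list(combinations(words, 2))
print(len(pairs))  # 15 * 14 / 2 = 105
```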

To Constrain Korean Compound Nouns using Semantic Information for Korean Grammar Checker (한국어 문법검사기에서 의미정보를 이용한 복합명사의 분석제약)

  • Won, Sang-Yun;Kim, Su-Nam;Kim, Kwang-Young;Nam, Hyun-Suk;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 1999.10e / pp.288-293 / 1999
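  • In general, a phrase in which two nouns combine to function as a single noun is called a compound noun. In Korean, the nouns within a compound noun may be written either joined together or separated by a space, which makes the morphological analysis of compound nouns difficult. This study developed a method that constrains the combination of nouns into compound nouns as tightly as possible, in order to minimize compound-noun-related errors in a grammar checker. To constrain compound noun analysis, the paper uses morphological constraints and constraints on the combination relations of compound nouns based on semantic information. Because errors arising from semantic relations are hard to find when compound nouns are analyzed with lexical information alone, the method identifies the structural and semantic combination relations of compound nouns to overcome the problem of incorrect analysis. The combination constraints were built into a rule base by classifying the nouns that can or cannot appear to the left and right of a noun according to their semantic and morphological characteristics and the distribution in which the noun appears. An algorithm for compound noun combination constraints using semantic information was also implemented.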

A Study of Disaster Recognition Based on Disaster-related Place Names (재난과 관련된 지명에 투영된 방재인식에 관한 연구)

  • PARK, Kyeong;KIM, Sunhee
    • Journal of The Geomorphological Association of Korea / v.17 no.2 / pp.15-28 / 2010
  • This study explores the usefulness of the traditional knowledge reflected in place names. It offers a practical basis for establishing disaster prevention measures through analysis of a disaster database that shows the regional distribution and historical changes of disaster characteristics. The construction and categorization of disaster-related place names are based on historical maps and the literature on place names. One hundred twenty-eight disaster-related place-name terms were selected based on disaster causes and possibilities. A design of field structures and category codes for the disaster-related place names is proposed, and the construction of disaster-related place names from the six sources has been completed.

Influence analysis of Internet buzz to corporate performance : Individual stock price prediction using sentiment analysis of online news (온라인 언급이 기업 성과에 미치는 영향 분석 : 뉴스 감성분석을 통한 기업별 주가 예측)

  • Jeong, Ji Seon;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.37-51 / 2015
  • With the development of internet technology and the rapid increase of internet data, various studies are being actively conducted on how to use and analyze internet data for various purposes. In recent years in particular, a number of studies have applied text mining techniques to overcome the limitations of relying on structured data alone. Many of these studies concern sentiment analysis, which scores opinions based on the distribution of polarity, such as the positivity or negativity of the vocabulary or sentences in documents. As part of this line of work, this study tries to predict the ups and downs of companies' stock prices by performing sentiment analysis on internet news content about those companies. A variety of news about companies is produced online by different economic agents, and it spreads quickly and is easily accessed on the internet. So, based on the inefficient market hypothesis, we can expect that news about an individual company can be used to predict fluctuations in that company's stock price if proper data analysis techniques are applied. However, because companies differ in their areas of management activity, machine-learning-based analysis of text data needs to consider the characteristics of each company. In addition, since news containing positive or negative information about certain companies has varying impacts on other companies or industry fields, an analysis for predicting the stock price of each individual company is necessary. This study therefore attempted to predict changes in the stock prices of individual companies by applying sentiment analysis to online news data. Accordingly, the study chose top companies in the KOSPI 200 as the subjects of analysis, and collected and analyzed the online news produced about each company over two years on Naver, a representative domestic search portal service.
In addition, considering that the meaning of vocabulary differs for each economic subject, the study aims to improve performance by building a lexicon for each individual company and applying it to the analysis. The analysis showed that prediction accuracy differs by company and that the average prediction accuracy was 56%. Comparing prediction accuracy across industry sectors, 'energy/chemical', 'consumer goods for living', and 'consumer discretionary' showed relatively higher accuracy than other industries, while sectors such as 'information technology' and 'shipbuilding/transportation' had lower accuracy. Only five representative companies were collected for each industry, so it is somewhat difficult to generalize, but the results confirm that prediction accuracy differs by industry sector. At the individual company level, companies such as 'Kangwon Land', 'KT & G', and 'SK Innovation' showed relatively higher prediction accuracy than other companies, while companies such as 'Young Poong', 'LG', 'Samsung Life Insurance', and 'Doosan' had a low prediction accuracy of less than 50%. In this paper, we analyzed stock price prediction performance for individual companies using pre-built company-specific vocabularies in order to take advantage of online news information, with the aim of improving stock price prediction. Building on this, future work may increase prediction accuracy by addressing the problem of unnecessary words being added to the sentiment dictionary.

A Study on the Color Grouping System to Fashion (섬유컬러 그루핑 체계에 관한 연구)

  • 이재정;정재우
    • Archives of design research / v.17 no.3 / pp.27-38 / 2004
  • Designers need support in making decisions about colours, which are often based on personal distinction rather than logical dialogue, and this can lead to confusion when communicating with others. To address these problems and to improve productivity, we propose a way to define a colour grouping method. In other words, the purpose of this study is to help improve communication and productivity in design and among designers. The grouping was based on and inspired by the studies of Kobayashi, Hideaki Chijiawa, Allis Westgate, and Martha Gill. The grouping is based on the "tones" of each group, as they seem to best reflect a designer's sentiment about the chosen colours. The groups are named Bright, Pastel, Deep, and Neutral. The general concepts of the groups are: Bright, high quality of pixels of primary colour; Pastel, primary colour with white; Deep, primary colour with gray or black; Neutral, colours that do not include any of the above. Each colour group has been allocated to Si-Hwa Jung's colour charts and colour prism to visualize the relationships between the colour groups. These four groups and the colours included in them will be broken down into smaller groups in order to make colour palettes. This would break the barrier and result in colours being used in groups as well as in crossover coordination. This study would result in new ways of using colours for designers.
