• Title/Abstract/Keywords: word-form

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding

  • 박현정;송민채;신경식
    • 지능정보연구
    • /
    • Vol. 24, No. 2
    • /
    • pp.59-83
    • /
    • 2018
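  • As sentiment analysis becomes increasingly important for understanding the needs of customers and the public, a variety of deep learning models have recently been introduced for English text. Focusing on the linguistic differences between English and Korean, this study empirically examines basic issues that arise when deep learning models are applied to sentiment analysis of Korean product review text. Specifically, the word vectors used as input to a deep learning model are derived at the morpheme level, and a non-static CNN (Convolutional Neural Network) model is used to verify how sentiment analysis accuracy varies across alternative ways of deriving morpheme vectors. All alternatives apply CBOW (Continuous Bag-Of-Words) by default and differ according to criteria such as the type of input data, sentence splitting and spelling and spacing correction, part-of-speech selection, part-of-speech tag attachment, and the minimum frequency of the morphemes considered. The results show that classification accuracy improves when the morpheme vectors are derived from text in the same domain as the sentiment analysis target, even if its grammatical conformity is low; when spelling and spacing are corrected in addition to sentence splitting; and when all parts of speech, including the unanalyzable category, are considered. Attaching part-of-speech tags, which was considered because of the high proportion of homonyms in Korean, and the minimum frequency threshold for included morphemes showed no clear effect.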

Wh-movement in the L2 Learner's Initial Syntax

  • Kim, Jung-Tae
    • 영어어문교육
    • /
    • Vol. 10, No. 2
    • /
    • pp.1-23
    • /
    • 2004
  • This article reports a bi-directional interlanguage study designed to investigate the initial state of L2 acquisition with regard to English and Korean wh-questions. Based on the UG system in line with the minimalist theory, it was hypothesized that the L2 initial state is characterized by the most economical form of syntax, in which no overt wh-movement to Spec-CP is assumed. Results of the early interlanguage study showed that 1) L1 Korean learners of L2 English predominantly produced wh-questions with the fronted wh-word, but without productive wh-movement to the Spec-CP position; and 2) L1 English learners of L2 Korean overwhelmingly produced wh-questions with the wh-word remaining in situ. These results were interpreted as supporting the minimalist account of the L2 initial grammar in that no overt syntactic wh-movement was adopted in the early interlanguages of either English or Korean, regardless of the learner's L1.

A Study for Success Factors in On-line Games

  • Jung, Jai-Jin
    • 한국멀티미디어학회논문지
    • /
    • Vol. 9, No. 12
    • /
    • pp.1657-1668
    • /
    • 2006
  • The last few years have represented a boom for the online gaming industry. Internet-based online games have been an increasingly popular form of entertainment. The gaming industry estimates there will be over 26 million online gaming participants in 2002. The rapid development of online game content and related information technology will increase the size of the industry and have a profound impact on many aspects of our lives and our society. This paper develops an exploratory LISREL model for identifying the factors affecting players' loyalty to a specific brand of online game. The concepts of flow, word of mouth, feedback, challenge, social norms, online community activities, and others are introduced into the model as independent variables that directly or indirectly affect loyalty. Based on data collected from an online survey, the validity of the model has been tested, and conclusions have been drawn concerning the relationships between loyalty and flow, word of mouth, and the other independent variables. It is hoped that this result might provide useful guidelines for developing successful online game content.

THE FRACTIONAL TOTIENT FUNCTION AND STURMIAN DIRICHLET SERIES

  • Kwon, DoYong
    • 호남수학학술지
    • /
    • Vol. 39, No. 2
    • /
    • pp.297-305
    • /
    • 2017
  • Let $\alpha > 0$ be a real number and $(s_{\alpha}(n))_{n \geq 1}$ be the lexicographically greatest Sturmian word of slope $\alpha$. We investigate Dirichlet series of the form $\sum_{n=1}^{\infty} s_{\alpha}(n) n^{-s}$. To do this, a generalization of Euler's totient function is required. For a real $\alpha > 0$ and a positive integer $n$, an arithmetic function $\varphi_{\alpha}(n)$ is defined to be the number of positive integers $m$ for which $\gcd(m, n) = 1$ and $0 < m/n < \alpha$. Under a condition $\operatorname{Re}(s) > 1$, this paper establishes an identity $\sum_{n=1}^{\infty} s_{\alpha}(n) n^{-s} = 1 + \sum_{n=1}^{\infty} \varphi_{\alpha}(n) (\zeta(s) - \zeta(s, 1+n^{-1})) n^{-s}$.
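
    The fractional totient defined in this abstract can be sketched directly from its definition. The following is a minimal illustration, not code from the paper: the function name `fractional_totient` is hypothetical, and the slope is assumed to be given as an exact rational (`fractions.Fraction`) so that the comparison m/n < alpha carries no floating-point error.

```python
from fractions import Fraction
from math import gcd


def fractional_totient(alpha: Fraction, n: int) -> int:
    """Count positive integers m with gcd(m, n) == 1 and 0 < m/n < alpha.

    Hypothetical helper: a brute-force reading of the definition in the
    abstract, assuming a rational alpha.
    """
    if alpha <= 0 or n < 1:
        raise ValueError("alpha must be positive and n a positive integer")
    count = 0
    m = 1
    # m/n < alpha is checked with exact rational arithmetic.
    while Fraction(m, n) < alpha:
        if gcd(m, n) == 1:
            count += 1
        m += 1
    return count


if __name__ == "__main__":
    # With alpha = 1/2 and n = 10, the candidates are m = 1..4.
    print(fractional_totient(Fraction(1, 2), 10))
```

    For alpha = 1 and n >= 2 the count coincides with Euler's totient of n, since the condition m/n < 1 restricts m to 1, ..., n - 1.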

An Analysis of English Reduplicative Compounds

  • 김형엽
    • 인문언어
    • /
    • Vol. 2, No. 1
    • /
    • pp.303-314
    • /
    • 2002
  • The main purpose of this paper is to show how Jespersen analyzed the data on English compounds related to reduplication. In dealing with compound words, he classified the examples related to reduplication as a separate group and attempted to account for the patterns based on the structure of the first syllable constituting the initial part of the second element in a compound word. I tried to explain the peculiar shape of the reduplicative pattern in English based on Optimality Theory, especially the method of 'melodic overwriting' of McCarthy (1997). According to the analysis, the initial part of the second element of a compound has to be stipulated before reduplication occurs. When the reduplicant has to be decided at the first syllable of the second element, the form that is stipulated to take that position appears there instead of repeating the morphemic shape of the first syllable of the first element of the word.

A Study on Speech Recognition by HMM Based on Multi-Observation Sequence

  • 정의봉
    • 전자공학회논문지S
    • /
    • Vol. 34S, No. 4
    • /
    • pp.57-65
    • /
    • 1997
  • The purpose of this paper is to propose an HMM (hidden Markov model) based on multi-observation sequences for isolated word recognition. The proposed model generates the MSVQ codebook by dividing each word into several sections and then dividing the training data into several sections. We then obtain the sequence of multiple observations for each section by weighting the distance vectors from lower values to higher ones. Thereafter, during recognition, the sequence with the highest probability value is selected. 146 DDD area names are selected as the target recognition vocabulary, and 10 LPC cepstrum coefficients are used as the feature parameters. In addition to the speech recognition experiments with the proposed model, experiments with DP, MSVQ, and general HMM are carried out for comparison, using the same data under the same conditions. The experimental results show that the HMM based on multi-observation sequences proposed in this paper is superior to the methods using DP, MSVQ, and general HMM models in both recognition rate and recognition time.

An improved spectrum mapping applied to speaker adaptive Korean word recognition

  • Matsumoto, Hiroshi;Lee, Yong-Ju;Kim, Hoi-Rim;Kido, Ken'iti
    • 한국음향학회:학술대회논문집
    • /
    • 한국음향학회, 1994, Fifth Western Pacific Regional Acoustics Conference, Seoul, Korea
    • /
    • pp.1009-1014
    • /
    • 1994
  • This paper improves the previously proposed spectral mapping method for supervised speaker adaptation, in which a mapped spectrum is interpolated from speaker difference vectors at typical spectra based on a minimized distortion criterion. In estimating these difference vectors, it is important to find an appropriate number of typical points. The previous method adjusts the number of typical points empirically, while the present method optimizes the effective number by rank reduction of the normal equation. This algorithm was applied to supervised speaker adaptation for Korean word recognition using the templates from a prototype male speaker. The results showed that the rank reduction technique not only can automatically determine an optimal number of code vectors, but also slightly improves the recognition scores compared with those obtained by the previous method.

Influence Maximization Scheme against Various Social Adversaries

  • Noh, Giseop;Oh, Hayoung;Lee, Jaehoon
    • Journal of information and communication convergence engineering
    • /
    • Vol. 16, No. 4
    • /
    • pp.213-220
    • /
    • 2018
  • With the exponential development of social networks, their fundamental role as a medium for spreading information, ideas, and influence has gained importance. This role can be expressed through the relationships and interactions within a group of individuals. Accordingly, models and studies from various domains have addressed the influence maximization problem for the "word of mouth" effects of new products. In reality, for example, more than two related social groups, such as commercial companies and service providers, exist within the same market. In such a scenario, these groups, called social adversaries, compete to extend their market influence against each other. To address the influence maximization (IM) problem between them, we propose a novel IM problem for social adversarial players (IM-SA), who exploit social network attributes to infer the unknown adversary's network configuration. We define a mathematical closed form to demonstrate that the proposed scheme can achieve a near-optimal solution for a player.

Historic Status and Grammatical Characteristics of Korean Language in the Early 20th Century

  • 홍종선
    • 한국어학
    • /
    • Vol. 71
    • /
    • pp.1-22
    • /
    • 2016
  • The early 20th century is a period when Korea was confronted with the surging waves of modernization and made a variety of internal reactions. The Korean language, not immune to the upheaval, also experienced new changes and gradually gained the characteristics of today's Korean. Although scholars have not yet fully agreed upon the periodization of Korean, the Gabo reformation (1896) is usually considered to be the beginning of modern Korean. Thus, the early 20th century was also the beginning of modern Korean. Phonological, lexical, and grammatical characteristics of modern-day Korean began to appear during this period. Phonologically, the 10-vowel system was established, glottal and aspirated sounds increased, and vowel harmony declined. Phenomena such as vowel raising, front-vowelization, monophthongization, and the word-initial rule appeared. Meanwhile, mixed Hangul-Chinese writing became common practice, Hangul-only writing also started to appear in narrative writing, and elements of spoken language began to be reflected in written language. All of these pointed to the unification of written and spoken language. Under the influence of modernization, a great number of new words appeared. In particular, Japanese and other foreign words flowed in in great quantities. Grammatically, the '-eos-(-엇-), -neun-(-는-), -ges-(-겟-)' trichotomy system of tenses was established, and the hearer-oriented honorific system also formed a binary system of 'hasoseo(하소서), hasibsio(하십시오), hao(하오), hage(하게), haera(해라)' and 'hae(해), haeyo(해요)'. In word formation and sentence construction, the use of '-gi(-기)' became more frequent than '-eum(-음)', while '~geot(~것)' also significantly increased. In negative, causative, and passive expressions, the use of the long form, which has fewer restrictions than the short form, became more frequent. A tendency towards simplicity appeared. In the same vein, long and complex sentences with several clauses tended to be avoided. Instead, short simple sentences became more favored. Korean linguistics scholars should pay closer attention to the modernization period, which includes the early 20th century. In order to fully understand today's Korean language, more thorough research on this immediately preceding period is necessary.

Rule-Based Document Conversion and Information Extraction on the Word Document

  • 주원균;양명석;김태현;이민호;최기석
    • 한국콘텐츠학회:학술대회논문집
    • /
    • 한국콘텐츠학회, Proceedings of the 2006 Fall Conference
    • /
    • pp.555-559
    • /
    • 2006
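  • This paper presents a method for effectively extracting (converting) and storing information in various forms (tables, lists, etc.) from a user's area of interest in word-processor documents such as HWP and DOC, using structural rules written by the user and an XML-based word document conversion technique. The proposed system consists of three key components: 1) a method for converting a word document into a raw XML document; 2) a method for writing XML-based structural rules and using them to extract (convert) information from the raw XML document; and 3) a method for generating the final XML from the extracted information or storing it in a database. To convert word documents, we developed an independently running, OCX-based word document conversion daemon, and to support the user's information extraction (conversion) process, we developed a script language that extends XSLT. The script language has a relatively simple syntax and uses its own defined functions and variables for data processing. The extracted information can be generated as a structured document in the desired form or stored in a database. The developed system (PPE) can be useful for building databases of, and providing services on, the full-text information of word documents, or in fields that use the resulting databases for various kinds of processing or for status statistics. It has been applied on a pilot basis to a research project management system and a performance information system.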
