• Title/Summary/Keyword: sentence feature (문장 자질)


Competitor Extraction based on Machine Learning Methods (기계학습 기반 경쟁자 자동추출 방법)

  • Lee, Chung-Hee;Kim, Hyun-Jin;Ryu, Pum-Mo;Kim, Hyun-Ki;Seo, Young-Hoon
    • Annual Conference on Human and Language Technology / 2012.10a / pp.107-112 / 2012
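  • This paper addresses the automatic extraction of competitors, that is, proper nouns that stand in a competitive relationship in general text, and proposes and compares a rule-based method and a machine-learning-based method. The proposed systems target news articles and aim to extract competitors only when a sentence contains explicit evidence of a competitive relationship. The rule-based system extracts competitors using clue words which indicate that two proper nouns are in competition; 620 such clue words were collected and used. The machine-learning-based system treats competitor extraction as a binary classification problem: deciding whether each competitor candidate is a competitor or not. Support Vector Machines were used as the classification algorithm, and the model was trained on five language-independent features that represent the context surrounding a competitor. For evaluation, a test set was built by collecting 623 competitors from news articles for 54 trending hot keywords. As a baseline for comparison, a system that extracts competitors based on related words was implemented; it achieved Recall/Precision/F1 of 0.119/0.214/0.153. In the experiments, the rule-based system achieved 0.793/0.207/0.328 and the machine-learning-based system achieved 0.578/0.730/0.645. The rule-based system had the best Recall at 0.793, a 67.4% improvement over the baseline. The machine-learning-based system had the best Precision and F1 at 0.730 and 0.645, improvements of 61.6% and 49.2%, respectively, over the baseline. Since Recall, Precision, and F1 all improved substantially over the baseline, the proposed method is shown to be effective.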
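The Recall/Precision/F1 triples reported in this abstract can be cross-checked with the standard F1 definition (the harmonic mean of precision and recall). The sketch below is only a sanity check on the reported figures; the function name `f1_score` and the dictionary keys are illustrative and do not come from the paper.

```python
def f1_score(recall: float, precision: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if recall + precision == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (recall, precision, reported F1) exactly as stated in the abstract.
reported = {
    "baseline": (0.119, 0.214, 0.153),
    "rule_based": (0.793, 0.207, 0.328),
    "machine_learning": (0.578, 0.730, 0.645),
}

for name, (recall, precision, f1) in reported.items():
    computed = f1_score(recall, precision)
    print(f"{name}: computed F1 = {computed:.3f}, reported F1 = {f1:.3f}")
```

For all three systems the computed F1 agrees with the reported value to three decimal places.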


Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform on which users communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. Similarly, microblog users interact with each other while watching television or movies, or visiting a new place. When measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model that captures this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features should therefore be key when measuring TV ratings through microblogs. We show that capturing the time dependency of features is necessary for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis.
    The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that the correlation of features is time-dependent, differing between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or their disappointment over not being able to watch it. The features that are highly correlated before the broadcast differ from those that are highly correlated after it. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning. Understanding the time dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter.
    More research is needed to refine the methodology for predicting or measuring TV ratings.
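The before/after comparison described in this abstract can be sketched as computing, for each candidate word, one correlation with the ratings using tweets posted before the broadcast and another using tweets posted after it. The abstract does not say which correlation measure was used, so Pearson correlation is an assumption here; the record layout (`before`, `after`, `rating`), the function names, and the numbers are made up purely for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def time_dependent_correlation(episodes, word):
    """Correlate a word's per-episode count with the rating, per time window."""
    ratings = [episode["rating"] for episode in episodes]
    return {
        window: pearson([episode[window].get(word, 0) for episode in episodes], ratings)
        for window in ("before", "after")
    }


# Hypothetical toy data: word counts per episode, split by posting time.
episodes = [
    {"before": {"waiting": 1}, "after": {"waiting": 4}, "rating": 5.0},
    {"before": {"waiting": 2}, "after": {"waiting": 3}, "rating": 6.0},
    {"before": {"waiting": 3}, "after": {"waiting": 2}, "rating": 7.0},
    {"before": {"waiting": 4}, "after": {"waiting": 1}, "rating": 8.0},
]
print(time_dependent_correlation(episodes, "waiting"))
```

In this toy data the same word correlates positively with ratings before the broadcast and negatively after it, which is the kind of time dependency the abstract describes.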

An Attention Method-based Deep Learning Encoder for the Sentiment Classification of Documents (문서의 감정 분류를 위한 주목 방법 기반의 딥러닝 인코더)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • KIISE Transactions on Computing Practices / v.23 no.4 / pp.268-273 / 2017
  • Recently, deep learning encoder-based approaches have been actively applied in the field of sentiment classification. However, the commonly used Long Short-Term Memory network encoder produces lower-quality vector representations as documents become longer. In this study, for effective classification of sentiment documents, we suggest an attention method-based deep learning encoder that generates the document vector representation as a weighted sum of the outputs of the Long Short-Term Memory network, with weights based on importance. In addition, we propose modifications that adapt the attention method-based encoder to sentiment classification, consisting of a window attention part and an attention weight adjustment part. In the window attention part, the weights are obtained in window units to effectively recognize sentiment features that consist of more than one word. In the attention weight adjustment part, the learned weights are smoothed. Experimental results show that the proposed method outperforms the Long Short-Term Memory network encoder, achieving an accuracy of 89.67%.
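The core idea in this abstract, building a document vector as an importance-weighted sum of LSTM outputs, can be sketched in a few lines. This is a minimal illustration only: the scoring function (a dot product with a query vector) is an assumption, the names `attention_pool` and `query` are hypothetical, and the paper's window attention and weight smoothing steps are not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def attention_pool(hidden_states, query):
    """Return (document vector, attention weights).

    hidden_states: one output vector per time step (e.g. LSTM outputs).
    query: vector used to score each time step (assumed dot-product scoring).
    """
    scores = [sum(h * q for h, q in zip(state, query)) for state in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    doc_vector = [
        sum(weight * state[d] for weight, state in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return doc_vector, weights


# Toy example: three time steps with 2-dimensional outputs.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
doc_vector, weights = attention_pool(states, query=[0.5, 2.0])
print(doc_vector, weights)
```

Time steps whose outputs score higher against the query receive larger weights, so they contribute more to the final document vector than they would under plain averaging.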

Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

  • Ahn, Hee-Kook;Han, Uk-Pyo;Shin, Seung-Ho;Yang, Dong-Il;Roh, Hee-Young
    • Journal of Advanced Navigation Technology / v.15 no.2 / pp.287-297 / 2011
  • As electronic mail has become widely used for its convenience and speed of communication, the volume of malicious and advertising spam mail has increased and causes many social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, which are based on words or sentences. However, spammers alter those characteristics in the messages they send in order to evade filtering systems. In Korean, far more abnormal variants are possible than in other languages because each syllable is composed of chosung, jungsung, and jongsung. Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can respond to such changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method is based on syllables rather than words and on the Smith-Waterman algorithm. Through experiments on filter keywords and e-mail extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within the permitted limits.
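A syllable-level Smith-Waterman comparison of the kind this abstract mentions can be sketched as follows. Each precomposed Hangul character in a Python string is one syllable, so aligning strings character by character is a syllable-level alignment. The scoring parameters (`match=2`, `mismatch=-1`, `gap=-1`), the normalization in `similarity`, and the example keyword are assumptions for illustration; the paper's actual parameters and threshold are not given in the abstract.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diagonal = table[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            table[i][j] = max(
                0,                      # start a new local alignment
                diagonal,               # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,  # gap in b
                table[i][j - 1] + gap,  # gap in a
            )
            best = max(best, table[i][j])
    return best


def similarity(keyword, text, match=2):
    """Alignment score normalized by the keyword's perfect-match score."""
    if not keyword:
        return 0.0
    return smith_waterman(keyword, text, match=match) / (match * len(keyword))


# A filter keyword and a variant with a character inserted to evade exact matching.
print(similarity("대출", "대.출 상담"))
```

A filter built this way would flag a message when the similarity of any keyword exceeds a chosen threshold, so inserted symbols lower the score but do not hide the keyword entirely.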

Molecular Biological Studies on Heat-Shock Responses in Amoeba proteus: I. Detection of Heat-Shock Proteins (아메바(Amoeba proteus)의 열충격 대응에 관한 분자생물학적 연구: 1. 열충격 대응 단백질의 탐색)

  • 홍혜경;최지영;안태인
    • The Korean Journal of Zoology / v.37 no.4 / pp.554-564 / 1994
  • To examine differences in the heat-shock response between the xD strain of Amoeba proteus, which harbors intracellular symbiotic bacteria, and its parental tD strain, amoebae were allowed to take up radioisotope-labeled amino acids via pinocytosis for 90 minutes in Ca2+-free Chalkley's solution, and the patterns of stress proteins newly synthesized under cold and heat stress were compared by one- and two-dimensional electrophoresis and autoradiography. In response to cold shock (10°C), both strains strongly expressed a 56.0 kDa, pI 6.0 protein, and in the xD strain, unlike the tD strain, expression of a 66.0 kDa, pI 5.5 protein ceased early in the cold shock. Under heat shock (33°C), about ten proteins were newly synthesized in both strains; synthesis of these proteins proceeded gradually in tD amoebae, whereas in xD amoebae the 66.0 kDa protein among them was rapidly synthesized as a heat-responsive protein. Two-dimensional electrophoresis additionally detected many proteins whose expression was enhanced by heat shock. By molecular weight, the detected heat-shock proteins could be classified into two members of the hsp100 family, three of the hsp90 family, one each of the hsp70 and hsp60 families, and four of the small csp family. Taken together, the two analyses show that heat-shock protein synthesis rose gradually in tD amoebae in response to cold and heat shock, whereas it occurred rapidly in the xD strain. These results suggest that the intracellular symbiotic bacteria of the amoeba have altered the host's heat-shock response mechanism.


Using Film Music for Second Language, Target Culture, and Ethics Education: With Reference to the OST of The Lion King (제 2언어, 문화 및 윤리 교육 자료로서의 영화 음악 활용: 라이온 킹 OST를 중심으로)

  • Kim, Hye-Jeong
    • The Journal of the Korea Contents Association / v.17 no.5 / pp.509-519 / 2017
  • This study addresses the effective use of film music as learning material for language, target culture, and ethics education. Music is intertwined with language and culture, and even with ethics. This study focuses on the potential of film music in classroom teaching and learning. For this purpose, five songs are selected from the soundtrack of Disney's famous animation The Lion King: "Circle of life", "I just can't wait to be king", "Be prepared", "Hakuna Matata", and "Can you feel the love tonight?", and concrete learning activities based on them are suggested. Using these five songs, gap-filling and singing-recording tasks are proposed as listening and speaking activities, respectively. Film music is also very useful for learning vocabulary, sentence structure, and grammar. Learners participate in a writing activity in which they create their own lyrics for the tunes, reflecting their experiences. Next, for culture education, a teacher asks students to discuss and become aware of food culture using a specific character's song. Finally, for ethics education, a philosophy of life, natural logic, leadership qualities, and the motto Hakuna Matata ("no worries") are explored and discussed through an analysis of the lyrics. An open-ended questionnaire survey was conducted. The results show that music has a positive effect on culture and ethics education. Film music can be effective in learning a second language, target culture, and ethics.

A Study of Relationship Derivation Technique using object extraction Technique (개체추출기법을 이용한 관계성 도출기법)

  • Kim, Jong-hee;Lee, Eun-seok;Kim, Jeong-su;Park, Jong-kook;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.309-311 / 2014
  • Despite increasing demand for big data applications based on the analysis of scattered unstructured data, few relevant studies have been reported. Accordingly, the present study suggests a technique that enables sentence-based semantic analysis by extracting objects from collected web information and automatically analyzing the relationships between those objects using collective intelligence and language processing technology. Specifically, collected information is stored in a DBMS in structured form, and then morpheme and feature information is analyzed. The obtained morphemes are classified into objects of interest, marginal objects, and objects of non-interest. Then, with an inter-object attribute recognition technique, the relationships between objects are analyzed in terms of their degree, scope, and nature. As a result, the relevance between pieces of information was analyzed on the basis of certain keywords, using an inter-object relationship extraction technique that can determine positivity and negativity. The present study also suggests a method for designing a system suited to real-time, large-capacity processing and applicable to high value-added services.


Recontextualizing geography curriculum: society, student and discipline of geography (地理 敎育課程의 再脈絡化)

  • Seo, Tae Yeol
    • Journal of the Korean Geographical Society / v.29 no.4 / pp.438-449 / 1994
  • This paper focuses on recontextualizing the geography curriculum, i.e., examining recent changes in three geography curriculum locators (society, student, and the discipline of geography) and seeking future directions for the geography curriculum in light of those changes. To reconcile and reflect the changes in each locator, this paper deals with social issues and societal changes for the society locator, increased concern for students and the development of cognitive science for the student locator, and challenging views on science and the meaning of epistemological changes in geography for the discipline locator. As a result, three future directions for the geography curriculum are identified: an issue-based geography curriculum, a thinking geography curriculum, and a geography curriculum oriented toward equity and accessibility.
