• Title/Abstract/Keywords: lexicons

Search results: 41 items (processing time: 0.021 s)

Morphological Processing with LR Techniques

  • 이강혁
    • 인지과학 / Vol. 4, No. 2 / pp.115-143 / 1994
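  • This paper presents an extended two-level morphological analysis model that uses LR parsing techniques. The two-level model with LR techniques achieves not only efficient morphological analysis but also greater descriptive adequacy for morphological phenomena than the model of Koskenniemi (1983). To this end, the two-level model is extended with an independent morphosyntactic module based on a feature-based context-free (CF) grammar. By adopting a word grammar based on a context-free grammar, the extended model can handle compound words and other forms with discontinuous dependencies while avoiding duplication across sublexicons. In addition, the LR predictions specified in the parsing table help the morphological analyzer reduce lexicon lookup time.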

Lexical and Semantic Incongruities between the Lexicons of English and Korean

  • Lee, Yae-Sheik
    • 한국언어정보학회지:언어와정보 / Vol. 5, No. 2 / pp.21-37 / 2001
  • Pustejovsky (1995) rekindled debate on the dual problems of how to represent lexical meaning and what information is to be encoded in a lexicon. For natural language processing tasks such as machine translation, these are important issues. When a lexical-conceptual mismatch occurs in the translation of corresponding words from two different languages, the appropriate representation of their meanings is very important. This paper proposes a new formalism for representing lexical entries by first analysing observable mismatches in comparable pairs of nouns, verbs, and adjectives in English and Korean. Inherent misinterpretations and misreadings in each pair are identified. Then, concept theories such as those presented by Ganter and Wille (1996) and Priss (1998) are extended in order to reflect the cognitivist view that meaning resides in concept, and also to incorporate the propositions of the so-called ‘multiple inheritance’ system. An alternative to the formalisms of Pustejovsky (1995) and Pollard & Sag (1994) is then proposed. Finally, representative examples of lexical mismatches are analysed using the new model.

Rated Recall: Evaluation Method for Constructing Bilingual Lexicons

  • 서형원;권홍석;김재훈
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리) / 한국정보과학회언어공학연구회 2013년도 제25회 한글 및 한국어 정보처리 학술대회 / pp.146-151 / 2013
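  • Measures for evaluating bilingual lexicon construction methods include precision, recall, and MRR (Mean Reciprocal Rank). These measures focus on whether the translation equivalents in the evaluation set are found accurately. However, they do not consider at all how often each translation equivalent is actually used, even though a method that finds frequently used translation equivalents early can be considered a good method. To address this problem, this paper proposes rated recall, a new evaluation measure for bilingual lexicon construction. Rated recall is a recall measure that reflects the degree to which a translation equivalent appears in the training corpus, and it is a good measure of how accurately frequently used translation equivalents are found. The paper evaluates a bilingual lexicon construction system based on context vectors and a pivot language, and compares it with existing methods.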
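
The contrast between plain recall, MRR, and a frequency-aware recall can be illustrated with a minimal sketch. The abstract does not state the formula for rated recall, so the weighting used here (each gold translation weighted by its corpus frequency) is an assumption, and the function names and toy data are hypothetical.

```python
from typing import Dict, List


def recall_at_k(candidates: List[str], gold: Dict[str, int], k: int) -> float:
    """Plain recall: fraction of gold translations found in the top-k candidates."""
    top_k = set(candidates[:k])
    return sum(1 for t in gold if t in top_k) / len(gold)


def rated_recall_at_k(candidates: List[str], gold: Dict[str, int], k: int) -> float:
    """Frequency-weighted recall (assumed form, not the paper's exact definition).

    Each gold translation counts in proportion to its corpus frequency, so
    finding a frequently used translation contributes more than a rare one.
    """
    top_k = set(candidates[:k])
    total = sum(gold.values())
    return sum(freq for t, freq in gold.items() if t in top_k) / total


def reciprocal_rank(candidates: List[str], gold: Dict[str, int]) -> float:
    """Reciprocal rank of the first correct candidate; 0.0 if none is correct."""
    for rank, cand in enumerate(candidates, start=1):
        if cand in gold:
            return 1.0 / rank
    return 0.0


# Hypothetical gold translations of one source word, with corpus frequencies.
gold = {"house": 90, "home": 9, "dwelling": 1}
# Hypothetical ranked output of a lexicon extraction system.
ranked = ["dwelling", "building", "house", "home"]

print(recall_at_k(ranked, gold, 1))        # one of three gold translations found
print(rated_recall_at_k(ranked, gold, 1))  # but it is the rarely used one
print(reciprocal_rank(ranked, gold))       # MRR averages this value over all source words
```

In this toy case the system ranks a rare translation first, so the reciprocal rank and plain recall look acceptable while the frequency-weighted score stays low, which is the kind of difference a frequency-aware measure is meant to expose.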

Study on the Standardization of Korean Distribution Terminology through its Usage Survey

  • 한규철;이상윤
    • 유통과학연구 / Vol. 13, No. 4 / pp.77-87 / 2015
  • Purpose - This study investigates the current state of distribution terminology usage by retailers and consumers nationwide and suggests a practical improvement plan for its standardization. The Korean distribution industry is closely related to consumers' daily lives. In reality, however, there is a gap among producers, distributors, and consumers in the definition, understanding, and perception of the terminology. Standardizing this terminology is therefore essential for smoother communication. This paper argues for comprehensive research and survey activities on how Korean distribution terminology is actually used by organizations and in their respective management situations, and for examining the problem and its remedies in line with the objective and mission of the "Fundamental Law of the Korean Language." Research design, data, and methodology - This study's scope is limited to wholesale and retail, including some information systems. First, the study covers most written material, including lexicons and glossaries of distribution terminology, university textbooks and teaching material for national qualification certificates, and related laws and ordinances. Second, the survey covers retailers' management situations by store format. The retailers sampled for the survey include department stores, discount stores, SSM, and convenience stores. Altogether, 20 specialists were interviewed in their respective sectors or retail formats. Finally, the project team surveyed a sample of 1,300 consumers nationwide on 50 distribution terms mainly used by consumers, covering awareness, understanding, usage, and attitude. Results - In total, 1,249 terms were drawn through literature research, including distribution terminology used in the related literature, glossaries and lexicons, distribution terminology in textbooks, and legal terminology. A classification table comprises four large categories: general distribution, distribution marketing, distribution information, and merchandise. The results of the three-step research (literature survey, field survey of retailers, and consumer survey) were advised to be screened by academia (retail associations, faculty, etc.), retailers (major retail management by store format), retail specialists and consultants, consumers, and Korean linguists. In total, 1,300 questionnaires on 50 distribution terms closely associated with consumers were distributed to subjects nationwide. Conclusions - The desired and expected results of this study are summarized from three perspectives. First, from the retailers' perspective, new concepts and newly coined terms in the distribution industry stem from advanced countries such as America and Europe. However, the original meanings and definitions are diluted and distorted as the language users' situations and contexts change. This study provides basic guidelines for standardizing the distribution terms used among various retail formats in the daily life situations that consumers encounter. Second, from the nation's perspective, this study suggests optimal choices of distribution terminology in the context of the laws and ordinances of the ministries concerned. Last, from the consumers' perspective, this paper enables consumers to understand and use distribution terms properly in their daily lives.

Speech Coarticulation Database of Korean and English

  • 김종미
    • 한국음향학회지 / Vol. 18, No. 3 / pp.17-26 / 1999
  • We present the first speech coarticulation database of Korean, English, and Konglish, named "SORIDA", which is designed to cover the maximum number of representations of coarticulation in these languages [1]. SORIDA is a compact database designed to contain a maximum number of triphones in a minimum number of prompts. SORIDA contains all consonantal triphones and vowel allophones in 682 Korean prompts of word length and in 717 English prompt words, spoken five times by speakers of balanced genders, dialects, and ages. Korean prompts are synthesized lexicons which maximize their coarticulation variation disregarding any stress phenomena, while English prompts are natural words that fully reflect their stress effects with respect to the coarticulation variation. The prompts are designed differently because English phonology has stress while Korean does not. An intermediate language, Konglish, has also been modeled by two Korean speakers reading the 717 English prompt words. Recording was done in a controlled laboratory environment with an AKG Model C-100 microphone and a Fostex D-5 digital-audio-tape (DAT) recorder. The total recording time was four hours. The SORIDA CD-ROM is available as one disk at a 22.05 kHz sampling rate with a 16-bit sample size. SORIDA digital audio tapes are available as four 124-minute tapes at a 48 kHz sampling rate. SORIDA's list of phonetically rich words is also available in English and Korean.

A Method for User Sentiment Classification using Instagram Hashtags

  • 남민지;이은지;신주현
    • 한국멀티미디어학회논문지 / Vol. 18, No. 11 / pp.1391-1399 / 2015
  • In recent times, studies on sentiment analysis are being actively conducted by applying natural language processing technologies to analyze subjective data such as the opinions and attitudes of users expressed on the Web, blogs, and social networking services (SNSs). Conventionally, to classify the sentiments in texts, most studies determine positive/negative/neutral sentiments by assigning polarity values to sentiment vocabulary using sentiment lexicons. In this study, however, sentiments are classified based on Thayer's model, which is psychologically defined, unlike the polarity classification used in opinion mining. As a method for classifying sentiments, this paper proposes sentiment categories built by extracting sentiment keywords for the major sentiments from hashtags, which are essential elements of Instagram. By applying the sentiment categories to user posts, sentiments can be determined through similarity measurement between the sentiment adjective candidates and the sentiment keywords. The test results of the proposed method show that the average accuracy rate over all the sentiment categories was 90.7%, which indicates good performance. If a large-scale sentiment classification system is prepared using the proposed method, sentiment analysis is expected to become possible in various fields, such as determining social phenomena through SNSs.
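
The conventional lexicon-based polarity classification that the abstract contrasts itself with can be sketched in a few lines. This is only an illustration of that baseline, not the paper's Thayer-model method; the toy lexicon, its polarity values, and the function name are hypothetical.

```python
from typing import Dict, List

# Hypothetical toy sentiment lexicon: word -> polarity value in [-1, 1].
LEXICON: Dict[str, float] = {
    "happy": 1.0,
    "great": 0.8,
    "sad": -1.0,
    "boring": -0.6,
}


def polarity(tokens: List[str], lexicon: Dict[str, float] = LEXICON) -> str:
    """Sum the polarity values of known words and map the total to a label.

    Words missing from the lexicon contribute 0, so a text with no
    sentiment vocabulary is labeled neutral.
    """
    score = sum(lexicon.get(tok.lower(), 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(polarity(["Great", "day"]))
print(polarity(["sad", "and", "boring"]))
print(polarity(["the", "table"]))
```

A scheme like this yields only a three-way polarity label, which is the limitation that motivates classifying into psychologically defined sentiment categories instead.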

Extended pivot-based approach for bilingual lexicon extraction

  • Seo, Hyeong-Won;Kwon, Hong-Seok;Kim, Jae-Hoon
    • Journal of Advanced Marine Engineering and Technology / Vol. 38, No. 5 / pp.557-565 / 2014
  • This paper describes an extended pivot-based approach for bilingual lexicon extraction. The basic features of the approach are as follows. First, the approach builds context vectors between the source language and a pivot language such as English, and likewise between the target language and the pivot language. This is the same as the standard pivot-based approach, which is useful for extracting bilingual lexicons between low-resource language pairs such as Korean-French. Second, unlike the standard pivot-based approach, the approach looks for similar context vectors in the source language. This helps to extract translation candidates for polysemous words and makes the translations more reliable. Third, the approach extracts translation candidates from the target context vectors based on the similarity between source and target context vectors. Based on these features, this paper describes the extended pivot-based approach and reports various experiments on one language pair, Korean-French (KR-FR). We observed that the approach is useful for extracting the most appropriate translation candidate as well as for a low-resource language pair.
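
The core comparison step of a pivot-based approach can be illustrated with a minimal sketch: source and target words are both represented as context vectors over pivot-language (here English) words, and target words are ranked by their similarity to the source vector. Cosine similarity, the toy vectors, and the function names are assumptions for illustration; the abstract does not specify the similarity measure, and the paper's extension (searching for similar vectors within the source language) is not shown.

```python
import math
from typing import Dict, List

# A context vector maps pivot-language words to association weights.
Vector = Dict[str, float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_translations(source_vec: Vector, target_vecs: Dict[str, Vector]) -> List[str]:
    """Rank target words by similarity to the source word's context vector."""
    scored = [(cosine(source_vec, vec), word) for word, vec in target_vecs.items()]
    # Highest similarity first; ties are broken alphabetically.
    return [word for _, word in sorted(scored, key=lambda p: (-p[0], p[1]))]


# Hypothetical context vector of a Korean word over English pivot words.
ko_vec = {"water": 0.9, "drink": 0.4}
# Hypothetical context vectors of French candidates over the same pivot words.
fr_vecs = {
    "eau": {"water": 0.8, "drink": 0.5},
    "feu": {"fire": 0.9, "burn": 0.3},
    "vin": {"drink": 0.7, "grape": 0.6},
}

print(rank_translations(ko_vec, fr_vecs))
```

Because both sides share the pivot vocabulary as vector dimensions, no direct Korean-French parallel corpus is needed, which is why the approach suits low-resource pairs.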

Evaluating English Loanwords and Their Usage for Professional Translation, Focusing on News Texts

  • Bokyung Noh
    • International Journal of Advanced Culture Technology / Vol. 12, No. 2 / pp.161-166 / 2024
  • As globalization has accelerated, the use of English loanwords has been increasing in South Korea. In this paper, we analyze news stories from four Korean quality newspapers (Chosun Ilbo, Dong-A Ilbo, KyungHyang Sinmun, and Chung-Ang Ilbo) to investigate the usage of English loanwords in news texts. Thirty-eight news stories on life, politics, business, and IT were collected from the four newspapers and then analyzed based on five types of loanwords (Direct, Mixed Code Combination, Clipping, Neologism, and Double Notation), partly following Lee's and Rudiger's classification. The analysis revealed the following: first, the Direct category overwhelmed the others at 90%, indicating that English loanwords were not translated from their source language but introduced into Korean directly with little modification; second, the use of English loanwords was significantly higher in the business and IT sections than in other sections, implying that English loanwords function much as a lingua franca does within those fields. Furthermore, these linguistic trends can provide a basic guide for translators making an informed decision between an English loanword and its translated Korean version in English-to-Korean translation.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon

  • 박상민;나철원;최민성;이다희;온병원
    • 지능정보연구 / Vol. 24, No. 4 / pp.219-240 / 2018
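  • A sentiment lexicon is a dictionary of sentiment vocabulary and serves as a basic resource for sentiment analysis. The sentiment words that make up such a lexicon can vary in the type and degree of sentiment depending on the domain. For example, the sentiment word '슬프다' ('sad') generally carries a negative meaning, but it does not carry a negative meaning when applied to the movie domain. To perform accurate sentiment analysis, it is therefore important to build a sentiment lexicon suited to the specific domain. Recent studies have used general-purpose sentiment lexicons such as OpenHangul and SentiWordNet to build domain-specific sentiment lexicons. However, OpenHangul can no longer be used because its service has been discontinued, and SentiWordNet does not reflect the characteristics of Korean sentiment words well after translation, which limits its use as a basic resource for building domain-specific sentiment lexicons. To address these problems of existing general-purpose sentiment lexicons, this paper builds a new Korean-based general-purpose sentiment lexicon and names it the KNU Korean Sentiment Lexicon. To build it, the sentiment of the definitions in the Standard Korean Language Dictionary (표준국어대사전) is classified with a Bi-LSTM at an accuracy of 89.45%; positive sentiment words are then extracted from definitions classified as positive, and negative sentiment words from definitions classified as negative, in various forms such as 1-grams, 2-grams, phrases, and sentence patterns. The sentiment vocabulary is further expanded using various external sources (SentiWordNet, SenticNet, 감정동사, 감성사전0603), and the lexicon also includes sentiment words for neologisms and emoticons used in online text data. The KNU Korean Sentiment Lexicon built in this paper consists of 14,843 sentiment words that are not affected by any specific domain, and it can serve as a basic resource for building domain-specific sentiment lexicons efficiently and quickly. It can also be used as an input feature to improve deep learning performance, to perform basic sentiment analysis, and to quickly build large training data sets for machine learning.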

A Study on the Meaning and Interpretation of Urban Landscape in Architecture of Robert Venturi and Aldo Rossi

  • 박형진;이종석;이상연
    • 한국실내디자인학회논문집 / Vol. 21, No. 2 / pp.23-34 / 2012
  • After the modern age, rapid urbanization had a big impact on the architecture of the time. R. Venturi and A. Rossi are two of the leading architects who developed architecture in cities in the US and Europe, respectively. This study sheds light on the tangible and intangible meaning and interpretation of urban landscapes through their architectural thought and works. The physical and intangible meanings and interpretations found in the architectural thought and works of the two architects are as follows. Venturi understood that the iconological landscapes at the roadside in large cities are the nature of physical landscapes. To Venturi, the façades of roadside buildings are a part of signage, like traffic lights and road signs, and those façades have the meaning of symbolic systems beyond simple physical landscapes. To A. Rossi, building types as physical townscapes play a key role in supporting the raw data of classification in architecture. Those types are also significant as basic data that shed light on the principles and history of cities. As for the intangible factors in R. Venturi's architecture, daily routine, function and use, time, the use of a building, and others form complex architecture. Those factors also describe the shared values of the same period as the façades of buildings, complex symbols, and formative lexicons in metaphorical terms. As for A. Rossi's intangible factors, 'collective memory' is buried in the inhabitants of the city, and with that, the city is a place of memory for its inhabitants. What is more, cities' monuments have intangible landscapes such as 'sustainability' and 'permanence'. With the many events happening throughout cities, those monuments are the whole images of cities, giving value to the urban buildings that reside in them. Finally, R. Venturi's all-encompassing concept of complex architecture was extended to the tangible and intangible aspects of townscapes. It was found that A. Rossi's tangible thought was formed from the whole landscape of the historic cities of Italy at the time, as the background of time and place. Also, with types of urban buildings and 'collective memory', A. Rossi drew architectural norms and formats of unchangeable types.
