• Search in title/abstract/keywords: Language Resources

424 results found

A Dictionary Constructing System based on a Web-based Object Model of Distributed Language Resources

  • 황도삼
    • 인지과학 / Vol. 12, No. 1-2 / pp. 1-9 / 2001
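  • This paper proposes a model that turns language resources, which are distributed in diverse forms across different locations, into web-based objects. Once they are objectified on the web, these resources can easily be reused to develop a variety of application systems, making it possible to build powerful natural language processing applications. In addition, improvements made to any language resource after initial development are reflected automatically in every application system without extra processing, which makes maintenance effective. To verify the suitability of the proposed model, we designed and implemented YDK2000, a dictionary construction system. YDK2000 can integrate the diverse information contained in existing dictionaries. Through Internet connections to various natural language processing systems, it can also easily integrate the dictionary information needed for language processing, which makes it possible to develop high-quality dictionaries.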
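
The model's central idea is that applications hold references to live resource objects rather than copies of their data, so that later improvements propagate automatically. The minimal Python sketch below illustrates that idea. It is not the YDK2000 implementation. All class and method names (`LanguageResource`, `InMemoryResource`, `DictionaryBuilder`, `lookup`, `update`) are hypothetical, and in-memory stand-ins replace the web endpoints the paper describes.

```python
from abc import ABC, abstractmethod


class LanguageResource(ABC):
    """Hypothetical common interface for one distributed language resource."""

    name: str

    @abstractmethod
    def lookup(self, word: str) -> dict[str, str]:
        """Return the dictionary information this resource holds for `word`."""


class InMemoryResource(LanguageResource):
    """Stand-in for a resource that would normally sit behind a web endpoint."""

    def __init__(self, name: str, entries: dict[str, dict[str, str]]):
        self.name = name
        self._entries = entries

    def lookup(self, word: str) -> dict[str, str]:
        return dict(self._entries.get(word, {}))

    def update(self, word: str, info: dict[str, str]) -> None:
        # An improvement made by the resource's maintainer after deployment.
        self._entries.setdefault(word, {}).update(info)


class DictionaryBuilder:
    """Merges entries from registered resources at query time.

    Only references to the resource objects are stored, so later updates
    to a resource are visible without re-importing any data.
    """

    def __init__(self, resources: list[LanguageResource]):
        self.resources = resources

    def build_entry(self, word: str) -> dict[str, dict[str, str]]:
        entry: dict[str, dict[str, str]] = {}
        for resource in self.resources:
            info = resource.lookup(word)
            if info:
                entry[resource.name] = info
        return entry


if __name__ == "__main__":
    morph = InMemoryResource("morph", {"사과": {"pos": "NNG"}})
    sense = InMemoryResource("sense", {})
    builder = DictionaryBuilder([morph, sense])
    print(builder.build_entry("사과"))  # {'morph': {'pos': 'NNG'}}
    sense.update("사과", {"gloss": "apple"})
    print(builder.build_entry("사과"))  # {'morph': {'pos': 'NNG'}, 'sense': {'gloss': 'apple'}}
```

The second call picks up the new gloss with no change on the builder's side. The same holds in a networked version, where `lookup` would issue a remote request.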

International Marriage Immigrant Women's Resources for Life Adjustment in Korea

  • 홍성희
    • 가족자원경영과 정책 / Vol. 17, No. 2 / pp. 121-145 / 2013
  • The purpose of this study is to understand how married female immigrants adjust to life in Korea by examining the resources they can access and how they use them. Data were collected through in-depth interviews with ten female participants living in Daegu. All of them have more than one child, have participated in programs at a multicultural family support center, have work experience, and can communicate with Koreans. The major findings are as follows. The participants' personal resources differed. English skills were highly useful resources for earning money and for gaining the respect of family members and others. Participants without English skills, in contrast, had made sincere and active efforts to learn Korean and gain bilingual competence. The participants obtained diverse family resources from their husbands and parents-in-law after adapting to their expected gender roles. Further, they used the social resources offered by public support systems as a starting point for learning Korean early in their adaptation, and formed personal networks with staff members at the multicultural family support center. The results show that the participants drew on many kinds of resources for acculturation by interacting positively with their environment. They also show that resources from diverse levels of that environment affected the acculturation process.

Automatic Mapping Between Large-Scale Heterogeneous Language Resources for NLP Applications: A Case of Sejong Semantic Classes and KorLexNoun for Korean

  • Park, Heum;Yoon, Ae-Sun
    • 한국언어정보학회지:언어와정보 / Vol. 15, No. 2 / pp. 23-45 / 2011
  • This paper proposes a statistically based linguistic methodology for automatically mapping between large-scale heterogeneous language resources for NLP applications in general. As a particular case, it addresses automatic mapping between two large-scale heterogeneous Korean language resources: the Sejong Semantic Classes (SJSC) in the Sejong Electronic Dictionary (SJD) and the nouns in KorLex. KorLex is a large-scale Korean WordNet but lacks syntactic information. SJD contains refined semantic-syntactic information, with semantic labels based on SJSC, but its list of entry words is much smaller than that of KorLex. The goal of the study is to build a rich language resource by integrating useful information from SJD into KorLex. The methodology combines linguistic and statistical methods. The linguistic component relies on three clues: the monosemy or polysemy of word forms, instances (example words), and semantically related words. The statistical component uses three formulae ($\chi^2$, mutual information, and information gain) to obtain candidate synsets. Evaluated against manual mapping, the proposed automatic mapping performs well, achieving a recall of 0.838, a precision of 0.718, and an F1 of 0.774.
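
The abstract names $\chi^2$, mutual information, and information gain but does not say how they are applied. The hedged sketch below shows one plausible setup. Each candidate (Sejong semantic class, KorLex synset) pair is scored from a 2×2 table of word counts: words in the class and under the synset, in the class only, under the synset only, and in neither. The counting scheme, the choice of pointwise MI, and all names and counts are assumptions. The final helper recomputes F1 from the reported precision and recall.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy of a Bernoulli(p) variable in bits, with 0 * log 0 taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def association_scores(n11: int, n10: int, n01: int, n00: int) -> dict[str, float]:
    """Score one (semantic class, synset) pair from a 2x2 contingency table.

    n11: words in the Sejong class and under the KorLex synset
    n10: words in the class only
    n01: words under the synset only
    n00: words in neither
    """
    n = n11 + n10 + n01 + n00
    in_class, out_class = n11 + n10, n01 + n00
    in_syn, out_syn = n11 + n01, n10 + n00

    # Pearson chi-square for a 2x2 table (no continuity correction).
    denom = in_class * out_class * in_syn * out_syn
    chi2 = n * (n11 * n00 - n10 * n01) ** 2 / denom if denom else 0.0

    # Pointwise mutual information of class and synset co-occurrence.
    pmi = math.log2(n11 * n / (in_class * in_syn)) if n11 else float("-inf")

    # Information gain: H(class) - H(class | synset membership).
    h_cond = 0.0
    if in_syn:
        h_cond += in_syn / n * binary_entropy(n11 / in_syn)
    if out_syn:
        h_cond += out_syn / n * binary_entropy(n10 / out_syn)
    ig = binary_entropy(in_class / n) - h_cond

    return {"chi2": chi2, "pmi": pmi, "ig": ig}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical counts for two candidate synsets of one semantic class.
    candidates = {"synset_A": (8, 2, 1, 89), "synset_B": (3, 7, 20, 70)}
    ranked = sorted(candidates, key=lambda s: association_scores(*candidates[s])["chi2"], reverse=True)
    print(ranked)                      # ['synset_A', 'synset_B']
    print(round(f1(0.718, 0.838), 3))  # 0.773
```

F1 recomputed from the rounded figures comes to about 0.773. The reported 0.774 presumably reflects the unrounded precision and recall.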

Development of Sensibility Vocabulary Classification System for Sensibility Evaluation of Visitors According to Forest Environment

  • Lee, Jeong-Do;Joung, Dawou;Hong, Sung-Jun;Kim, Da-Young;Park, Bum-Jin
    • 인간식물환경학회지 / Vol. 22, No. 2 / pp. 209-217 / 2019
  • Human sensibility is generally expressed through language. To understand visitors' sensibilities toward the forest environment, it is first necessary to determine the exact meanings of the words they use. These words must then be sorted by meaning within an appropriate classification system. This study developed a classification system for forest sensibility vocabulary by extracting the Korean words that forest visitors use to express their sensibilities toward the forest environment, and established a structure for classifying the accumulated vocabulary. To this end, we extracted forest sensibility words from a literature review of previously reported experiences and from interviews with forest visitors. We then categorized the words by meaning using the Standard Korean Language Dictionary maintained by the National Institute of the Korean Language. Next, we built a classification system for these words with reference to the vocabulary classification systems examined in previous studies of Korean language and literature. As a result, 137 forest sensibility words were collected through a documentary survey and grouped into four types: emotion, sense, evaluation, and existence. Categorizing the collected words under this Korean-language classification system yielded 40 representative sensibility words. The study made it possible to identify the sensory source of the sensibilities expressed in the forest (sight, hearing, smell, taste, or touch). It also revealed other aspects of how these sensibilities are expressed, such as whether the subject of a word is person-centered or object-centered. We believe these results can serve as foundational data on forest sensibility.

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling

  • 천주룡;고영중
    • 정보과학회 논문지 / Vol. 42, No. 7 / pp. 901-909 / 2015
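  • This study builds a Korean-English parallel corpus from Wikipedia. To this end, we propose a method for computing sentence similarity based on sequential matching of language resources and topic models. For language-resource matching, three resources are applied to word matching in sequence: a wiki dictionary built from Wikipedia titles, numbers, and the Daum online dictionary. To exploit the characteristics of Wikipedia, translation probabilities estimated from the wiki dictionary are also applied to word matching. Accuracy is further improved by applying word distributions extracted from a topic model to the similarity computation. In the experiments, similarity computed from a linear combination of the language resources used in previous work achieved an F1-score of 48.4%. Combining the language resources with a topic model that considered all word distributions achieved 51.6%. The proposed method, which adds translation probabilities to the language resources and applies sequential matching, achieved 58.3%, an improvement of 9.9 percentage points over the 48.4% baseline. Adding a topic model that considers only important word distributions raised this to 59.1%, 7.5 percentage points above the 51.6% combination.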
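
The sequential matching step can be sketched as follows, with several loud assumptions. The two dictionaries are tiny hypothetical stand-ins for the wiki dictionary and the Daum online dictionary. The translation probabilities are invented for illustration. Tokens are assumed to be already segmented. Similarity is taken as the average match weight over Korean tokens. The topic-model component is omitted.

```python
import re

# Hypothetical stand-ins: a wiki dictionary mapping Korean titles to English
# titles with estimated translation probabilities, and an online dictionary.
WIKI_DICT: dict[str, dict[str, float]] = {
    "서울": {"seoul": 0.9},
    "대학교": {"university": 0.7, "college": 0.3},
}
ONLINE_DICT: dict[str, set[str]] = {"대학교": {"university"}, "도시": {"city"}}

NUMBER = re.compile(r"\d+(?:\.\d+)?")


def match_weight(ko: str, en_tokens: set[str]) -> float:
    """Try each resource in a fixed order and stop at the first one that matches."""
    # 1) Wiki dictionary, weighted by the estimated translation probability.
    for en, prob in WIKI_DICT.get(ko, {}).items():
        if en in en_tokens:
            return prob
    # 2) Numbers are matched literally.
    if NUMBER.fullmatch(ko) and ko in en_tokens:
        return 1.0
    # 3) Online dictionary as the last resort (unweighted).
    if ONLINE_DICT.get(ko, set()) & en_tokens:
        return 1.0
    return 0.0


def sentence_similarity(ko_tokens: list[str], en_tokens: list[str]) -> float:
    """Average match weight over Korean tokens (an assumed scoring scheme)."""
    if not ko_tokens:
        return 0.0
    en_set = {t.lower() for t in en_tokens}
    return sum(match_weight(t, en_set) for t in ko_tokens) / len(ko_tokens)


if __name__ == "__main__":
    ko = ["서울", "대학교", "1946", "설립"]
    en = ["Seoul", "National", "University", "was", "founded", "in", "1946"]
    print(round(sentence_similarity(ko, en), 3))  # 0.65
```

Because the resources are tried in order, a word covered by the wiki dictionary is scored with its translation probability even when the online dictionary also lists it.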

Automatic Acquisition of Lexical-Functional Grammar Resources from a Japanese Dependency Corpus

  • Oya, Masanori;Genabith, Josef Van
    • 한국언어정보학회:학술대회논문집 / 2007 Annual Conference of the Korean Society for Language and Information / pp. 375-384 / 2007
  • This paper describes a method for the automatic acquisition of wide-coverage, treebank-based deep linguistic resources for Japanese, as part of a project on treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG). We automatically annotate LFG f-structure functional equations (i.e., labelled dependencies) onto two sources. The first is the Kyoto Text Corpus version 4.0 (KTC4) (Kurohashi and Nagao 1997). The second is the output of the Kurohashi-Nagao Parser (KNP) (Kurohashi and Nagao 1998), a dependency parser for Japanese. The original KTC4 and KNP provide only unlabelled dependencies. Our method also includes zero-pronoun identification. The f-structure annotation algorithm with zero-pronoun identification was evaluated on KTC4 against a manually corrected gold standard of 500 sentences randomly chosen from KTC4, and achieves a pred-only dependency f-score of 94.72%. Parsing experiments on KNP output yield a pred-only dependency f-score of 82.08%.
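
A pred-only dependency f-score of this kind is typically computed by comparing sets of (relation, head, dependent) triples between the gold standard and the system output. The sketch below assumes micro-averaging over sentences. It treats a hypothetical list of atomic-feature relations (`ATOMIC_FEATURES`) as the triples excluded from the pred-only score. The abstract does not describe the paper's actual evaluation script, and the romanized example triples are invented.

```python
from collections.abc import Iterable

# (relation, head PRED, dependent PRED or atomic value)
Triple = tuple[str, str, str]

# Hypothetical atomic-feature relations left out of a pred-only evaluation.
ATOMIC_FEATURES = {"tense", "num", "pers", "case"}


def preds_only(triples: Iterable[Triple]) -> set[Triple]:
    """Keep only dependencies between PRED values."""
    return {t for t in triples if t[0] not in ATOMIC_FEATURES}


def dependency_f_score(gold: list[set[Triple]], system: list[set[Triple]]) -> tuple[float, float, float]:
    """Micro-averaged precision, recall and F-score over aligned sentences."""
    matched = gold_total = system_total = 0
    for g, s in zip(gold, system):
        g, s = preds_only(g), preds_only(s)
        matched += len(g & s)
        gold_total += len(g)
        system_total += len(s)
    precision = matched / system_total if system_total else 0.0
    recall = matched / gold_total if gold_total else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


if __name__ == "__main__":
    gold = [{("subj", "taberu", "Taro"), ("obj", "taberu", "ringo"), ("tense", "taberu", "past")}]
    system = [{("subj", "taberu", "Taro"), ("obj", "taberu", "mikan")}]
    print(dependency_f_score(gold, system))  # (0.5, 0.5, 0.5)
```

The `tense` triple in the gold standard is filtered out, so only the two predicate dependencies count. One of them matches, giving 0.5 for precision, recall, and F-score.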

Collaborative Social Tagging for eBook using External DSL Approach

  • 유환수;김성환
    • 한국정보처리학회:학술대회논문집 / 2014 Fall Conference of the Korea Information Processing Society / pp. 1068-1072 / 2014
  • We propose a collaborative social tagging approach for eBooks based on an external DSL. The goals of this paper are (1) to provide a DSL with which authors can write HTML5 rich-content eBooks and tag resources, (2) to let users enhance books by tagging resources easily, (3) to let readers read rich books easily regardless of device type, and (4) to expose eBook resources through RESTful-style addresses so that other systems can identify the book's self-descriptive resources. To achieve these goals, we provide the Bukle DSL, with which authors and users can create and enhance eBooks with ease. As a domain-specific language, Bukle offers a simple yet expressive way to author and tag books, which would otherwise be more difficult to express in a general-purpose language. Future work includes a visual DSL approach and tools that let unskilled users tag books easily, as well as a text-to-visual DSL transformation engine. UX research on tagging and authoring books is also required. To address these questions, we are investigating a visual notation focused on visual syntax.

Korean Nominal Bank, Using Language Resources of Sejong Project

  • 김동성
    • 한국언어정보학회지:언어와정보 / Vol. 17, No. 2 / pp. 67-91 / 2013
  • This paper describes Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We build on the language resources of the Sejong project so that the same data set is enriched with successive levels of annotation. A separate, new resource-building project would instead yield information processed in isolation. Our annotation scheme is based on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. The work also draws on deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternations of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain diverse sentence types, including empty categories, raising, and left (or right) dislocation. Idiosyncratic lexical features, such as collocations, also require explanation.

Teaching Chinese through Drama to University Students for Language Skills

  • 최태훈
    • 비교문화연구 / Vol. 31 / pp. 415-438 / 2013
  • This paper explores how to teach Chinese using multimedia resources such as Chinese dramas, focusing on The Return of the Condor Heroes, a drama adapted from Jin Yong's novel. The purpose of this study is to develop methods for teaching Chinese to university students through drama in a way that integrates language skills, enhancing communicative competence as well as understanding of Chinese culture. First, a review of previous studies presents several cases of foreign language education using drama. Teaching Chinese through drama can be an integrative form of education because students develop communicative competence while also coming to understand the culture of the target language. In other words, dramatic contexts can offer rich material on Chinese history, Han Chinese ethnocentrism, Chinese literature, and geography. Second, this study applies Tomlinson's (2010) principles of materials development for language teaching to Chinese drama. It concentrates on The Return of the Condor Heroes, which the author has used in Chinese language courses for three years. It examines the characteristics of the drama to develop effective ways of teaching and learning Chinese language and culture. It also discusses how using the drama changed students' widespread perception that classical Chinese literature is unnecessary. Third, the paper presents sample lessons that may help teachers understand how to organize lessons around drama. Finally, it reports university students' opinions about using drama to learn Chinese.

Construction of Text Summarization Corpus in Economics Domain and Baseline Models

  • Sawittree Jumpathong;Akkharawoot Takhom;Prachya Boonkwan;Vipas Sutantayawalee;Peerachet Porkaew;Sitthaa Phaholphinyo;Charun Phrombut;Khemarath Choke-mangmi;Saran Yamasathien;Nattachai Tretasayuth;Kasidis Kanwatchara;Atiwat Aiemleuk;Thepchai Supnithi
    • Journal of information and communication convergence engineering / Vol. 22, No. 1 / pp. 33-43 / 2024
  • Automated text summarization (ATS) condenses large volumes of text into shorter summaries, reducing the time required to extract information from extensive textual data. ATS systems rely on language resources as datasets, but creating these datasets is a complex and labor-intensive task that requires linguists to annotate the data extensively. Consequently, publicly accessible ATS datasets for languages such as Thai are relatively scarce compared with those for widely used languages. This study introduces ThEconSum, an ATS architecture designed specifically for Thai and built on economics-related data. The evaluation revealed significant remaining tasks and limitations for Thai-language summarization.