• Title/Summary/Keyword: 문장 유형 (sentence type)

Using Syntactic Unit of Morpheme for Reducing Morphological and Syntactic Ambiguity (형태소 및 구문 모호성 축소를 위한 구문단위 형태소의 이용)

  • Hwang, Yi-Gyu;Lee, Hyun-Young;Lee, Yong-Seok
    • Journal of KIISE: Software and Applications / v.27 no.7 / pp.784-793 / 2000
  • Conventional morphological analysis of Korean produces many morphological ambiguities because of the language's agglutinative nature. These ambiguities in turn cause syntactic ambiguities and make it difficult to select the correct parse tree. The problem arises mainly with auxiliary predicates and bound nouns, which are closely tied to the surrounding morphemes, mostly functional morphemes that cannot stand alone. The combined morphemes play a syntactic or semantic role in the sentence. We extracted these morphemes from 0.2 million tagged words and classified them into three types. We call them syntactic morphemes and treat them as the input unit of syntactic analysis. This paper shows that the syntactic morpheme is an efficient means of addressing the following problems: 1) reducing morphological ambiguity, 2) eliminating unnecessary partial parse trees during parsing, and 3) reducing syntactic ambiguity. The experimental results show that the syntactic morpheme is an essential unit for reducing morphological and syntactic ambiguity.

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval, question answering, machine translation, and similar systems. Its targets are usually PLOs (persons, locations, and organizations). These are typically proper nouns or unregistered words, and traditional named entity recognizers exploit these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities: they may consist of multiple phrases, a whole sentence, or special characters, which makes candidates difficult to find. In this paper we propose a method that quickly extracts title named entities from news articles and automatically builds a named entity dictionary for the titles. For candidate identification, word phrases enclosed in special symbols in a sentence are first extracted and then verified by an SVM using feature words and their distances. For classification of the extracted title candidates, an SVM is used with the mutual information of word contexts.
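The first step described in this abstract, pulling out phrases enclosed in special symbols, might be sketched as follows. The symbol pairs in `BRACKET_PAIRS` and the function name are assumptions for illustration; the abstract does not list the symbols it uses, and the SVM verification and classification steps are not shown.

```python
import re

# Assumed symbol pairs that often enclose titles in news text.
# The abstract does not specify which symbols the authors used.
BRACKET_PAIRS = [("《", "》"), ("〈", "〉"), ("「", "」"), ("『", "』"), ("'", "'"), ('"', '"')]


def extract_title_candidates(sentence: str) -> list[str]:
    """Return phrases enclosed by any of the assumed symbol pairs."""
    candidates = []
    for open_sym, close_sym in BRACKET_PAIRS:
        # Non-greedy match so that each enclosed phrase is captured separately.
        pattern = re.escape(open_sym) + r"(.+?)" + re.escape(close_sym)
        candidates.extend(m.strip() for m in re.findall(pattern, sentence))
    return candidates


if __name__ == "__main__":
    print(extract_title_candidates("He read 《Demian》 and watched 'Parasite' on TV."))
```

Candidates found this way would still need the verification step, since quotation marks also enclose ordinary quoted speech.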

A Method for Detection and Correction of Pseudo-Semantic Errors Due to Typographical Errors (철자오류에 기인한 가의미 오류의 검출 및 교정 방법)

  • Kim, Dong-Joo
    • Journal of the Korea Society of Computer and Information / v.18 no.10 / pp.173-182 / 2013
  • Typographical mistakes made while drafting electronic documents are more common than any other type of error. Most of these mistyping errors remain simple typos, but a considerable number develop into grammatical or semantic errors. Among them, pseudo-semantic errors caused by typographical errors stand out more than pure semantic errors, because they clash more sharply with the senses of the surrounding context words in the sentence. Because of this prominent contextual discrepancy, such errors can be detected and corrected by a simple algorithm based on co-occurrence frequency. I propose a method based on co-occurrence frequency for detecting and correcting semantic errors caused by typos. In the proposed method, co-occurrence frequency is counted only for words in an immediate dependency relation, and the cosine similarity measure is used to detect pseudo-semantic errors. According to the experimental results presented, the proposed method is expected to improve the detection rate of an overall proofreading system by about 2-3%.
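As a rough illustration of the idea in this abstract, the sketch below counts co-occurrences over (head, dependent) pairs and uses cosine similarity to judge how well a word fits its context. The function names, the threshold value, and the way correction candidates are supplied are all assumptions; the abstract does not describe the author's actual scoring or candidate generation.

```python
import math
from collections import Counter, defaultdict


def build_cooccurrence(pairs):
    """Count co-occurrences for (head, dependent) pairs, in both directions."""
    vectors = defaultdict(Counter)
    for head, dep in pairs:
        vectors[head][dep] += 1
        vectors[dep][head] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def context_fit(word, context_words, vectors):
    """Similarity between a word's co-occurrence vector and its observed context."""
    return cosine(vectors.get(word, Counter()), Counter(context_words))


def best_correction(word, candidates, context_words, vectors, threshold=0.1):
    """Keep the word if it fits its context; otherwise pick the best-fitting candidate.

    The threshold is an arbitrary illustrative value, not one from the paper.
    """
    if context_fit(word, context_words, vectors) >= threshold:
        return word
    return max([word, *candidates], key=lambda w: context_fit(w, context_words, vectors))


if __name__ == "__main__":
    corpus_pairs = [("drink", "coffee"), ("drink", "water"), ("drink", "coffee"), ("copy", "file")]
    vecs = build_cooccurrence(corpus_pairs)
    # "copy" typed where "coffee" was intended, as a dependent of "drink".
    print(best_correction("copy", ["coffee", "cozy"], ["drink"], vecs))
```

In practice the dependency pairs would come from a parser and the candidates from a spelling-variant generator; both are outside this sketch.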

Wortschatzarbeit in der Wortbildung und ihre didaktische Vorschläge (조어론에 있어서의 어휘연습과 교수법 제언)

  • Jang Ki-Sung;Jung Hyun-Sook
    • Koreanische Zeitschrift für Deutsche Sprachwissenschaft / v.3 / pp.233-252 / 2001
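  • Since 1970, vocabulary-related issues in foreign language learning and teaching have attracted growing interest, and their importance has been increasingly recognized. This importance has been emphasized not only in research in the field, such as Fleischer/Buz (1992), but also in specialist literature and language teaching materials (textbooks). Fleischer et al. regard productivity (Produktivität), acceptability (Akzeptabilität), and activity in word formation (Aktivität) as key factors in defining the concept of word-formation rules and word-formation models. Scholars such as Götze/Hess-Lüttich (1999) have argued that two or more constituents in the lexical system combine to create new words that reflect the spirit and circumstances of their time, contribute to the informatization and technologization of society, and thereby serve as a channel that further raises the productivity of technical language. Building on these basic principles and concepts of word formation, this paper analyzes and describes in detail, at the level of adjectives and nouns, the word-formation models that play a relevant role in acquiring the target language in German classes, namely compounds and derivatives. For example, the following were discussed for their applicability to teaching from the perspective of vocabulary generation and vocabulary expansion: prefixes and suffixes in compounds; abbreviations among the types of word formation; foreign bases with native suffixes and prefixes; native bases with foreign suffixes (prefixes); the forms of compounds from a semantic point of view; the forms of linking elements in compounds and the use of symbols; native suffixes (prefixes) in nominal derivatives; abbreviated formations and the formation of abbreviated words; the characteristics of adjective formation; and native (foreign) suffixes (prefixes) in explicit derivation. In the conclusion, text, sentence, and vocabulary domains were brought in and applied, drawing on cognitive theory, so that foreign language learners can move away from one-sided, uniform rote memorization and acquire the new vocabulary required by the target language relatively easily and systematically, and several corresponding concrete proposals were presented. Because learners read texts, find the main content, divide paragraphs, and grasp structure, vocabulary practice is considered a highly relevant and timely task for foreign language teaching as well.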

Extracting Alternative Word Candidates for Patent Information Search (특허 정보 검색을 위한 대체어 후보 추출 방법)

  • Baik, Jong-Bum;Kim, Seong-Min;Lee, Soo-Won
    • Journal of KIISE: Computing Practices and Letters / v.15 no.4 / pp.299-303 / 2009
  • Patent information search is used to check for the existence of earlier work. Such searches can fail to retrieve appropriate information for many reasons. This research proposes a method for extracting alternative word candidates in order to minimize search failures caused by keyword mismatch. Assuming that two words have similar meanings if they have similar co-occurring words, the proposed method uses the concept of concentration, association word sets, cosine similarity between association word sets, and a ranking modification technique. The performance of the proposed method is evaluated against a manually extracted list of alternative word candidates. The evaluation results show that the proposed method outperforms the document vector space model in recall.
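The core assumption in this abstract, that words with similar co-occurring words are likely alternatives, can be sketched with association word sets and cosine similarity. This is a minimal illustration with hypothetical function names and document-level co-occurrence as an assumed window; the paper's "concentration" measure and ranking modification technique are not described in the abstract and are omitted here.

```python
import math
from collections import Counter, defaultdict
from itertools import permutations


def association_sets(documents):
    """Map each word to a Counter of the words that share a document with it."""
    assoc = defaultdict(Counter)
    for doc in documents:
        for w1, w2 in permutations(set(doc), 2):
            assoc[w1][w2] += 1
    return assoc


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def alternative_candidates(word, assoc, top_k=3):
    """Rank other words by similarity of their association sets to that of `word`."""
    target = assoc.get(word, Counter())
    scores = {other: cosine(target, vec) for other, vec in assoc.items() if other != word}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    docs = [
        ["display", "screen", "pixel"],
        ["monitor", "screen", "pixel"],
        ["engine", "fuel"],
    ]
    print(alternative_candidates("display", association_sets(docs)))
```

Here "display" and "monitor" never appear together, yet they rank as mutual alternatives because they share the same associated words.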

English Conversation System Using Artificial Intelligent of based on Virtual Reality (가상현실 기반의 인공지능 영어회화 시스템)

  • Cheon, EunYoung
    • Journal of the Korea Convergence Society / v.10 no.11 / pp.55-61 / 2019
  • Various educational media have been provided for foreign language education, but they have the disadvantages that teaching aids and media programs are costly and real-time responsiveness is poor. In this paper, we propose an artificial intelligence English conversation system based on VR and speech recognition. We used Google CardBoard VR and the Google Speech API to build the system, and developed artificial intelligence algorithms for providing the virtual reality environment and for conversation. In the proposed speech recognition server system, a sentence spoken by the user is divided into word units and compared with the words stored in the database to return the match with the highest probability. Users can communicate with and respond to people in virtual reality. The conversation function is independent of conversational context and theme, and conversations with the AI assistant are implemented in real time so that the user system can be checked in real time. The system proposed in this paper, which combines virtual reality with voice recognition, is expected to contribute to the expansion of virtual education content services related to the Fourth Industrial Revolution.
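The matching step described in this abstract (split the recognized sentence into words, compare against stored data, return the best match) might look roughly like the sketch below. The Jaccard word overlap is a stand-in score chosen for illustration, and the `database` layout and function names are hypothetical; the abstract does not say how the authors compute their probability.

```python
def tokenize(sentence):
    """Split a sentence into lowercase words, dropping surrounding punctuation."""
    words = (w.strip(".,!?").lower() for w in sentence.split())
    return [w for w in words if w]


def match_score(spoken_words, stored_words):
    """Jaccard word overlap, used here as a stand-in for the paper's probability."""
    a, b = set(spoken_words), set(stored_words)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_response(spoken, database):
    """Return (reply, score) for the stored prompt that best matches the spoken sentence.

    `database` is a hypothetical dict mapping a stored prompt sentence to a reply.
    """
    words = tokenize(spoken)
    prompt = max(database, key=lambda p: match_score(words, tokenize(p)))
    return database[prompt], match_score(words, tokenize(prompt))


if __name__ == "__main__":
    db = {
        "How are you today": "I am fine, thank you.",
        "What is your name": "My name is Amy.",
    }
    print(best_response("how are you?", db))
```

The spoken text would come from a speech recognition service before reaching this step; that call is not shown.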

A Study on Phenomenon 'Play of Words' in Modern Russian Advertising Language (현대 러시아 광고언어에 있어서의 '언어유희' 현상에 대한 연구)

  • Kim, Sung Wan
    • Cross-Cultural Studies / v.42 / pp.241-260 / 2016
  • The purpose of this article is to present the types of advertising in modern Russian as 'Play of Words' (игра слов). The cause of this phenomenon is examined as a result of certain characteristics of advertising. To this end, the definition and characteristics of advertising language are analyzed, as these factors reveal how language is used to maximize the effectiveness of advertising. Academic research is needed in the collaborative fields of linguistics, psychology, economics, sociology, marketing, literature, art, and music. Modern advertising is a mixture of semiotic objects consisting of display, sound, and text. While this study is not complete, the recognition of the 'Play of Words' phenomenon between the creators of advertising and the consumer is undeniable. Advertising is regarded by linguists as the main factor that destroys the literary language. It represents the distortion of a standard language norm, as opposed to formal linguistic means used in advertising. In this research, we pay attention to the frequent use of foreign borrowings and the incorrect representation of foreign words, slang, and jargon, which occur as misspelled departures from literary norms. The features revealed in this article help in understanding the purpose of advertising.

Understanding the Categories and Characteristics of Depressive Moods in Chatbot Data (챗봇 데이터에 나타난 우울 담론의 범주와 특성의 이해)

  • Chin, HyoJin;Jung, Chani;Baek, Gumhee;Cha, Chiyoung;Choi, Jeonghoi;Cha, Meeyoung
    • KIPS Transactions on Software and Data Engineering / v.11 no.9 / pp.381-390 / 2022
  • Influenced by a culture that prefers non-face-to-face activity during the COVID-19 pandemic, chatbot usage is accelerating. Chatbots have been used for various purposes: not only for customer service in businesses and social conversations for fun, but also for mental health. Chatbots are a platform where users can easily talk about their depressed moods because anonymity is guaranteed. However, most relevant research has been on social media data, especially Twitter data, and few studies have analyzed data from commercially used chatbots. In this study, we identified the characteristics of depressive discourse in user-chatbot interaction data by analyzing chats that include the word 'depress,' using a topic modeling algorithm and a text-mining technique. Moreover, we compared these characteristics with those of depressive moods in Twitter data. Finally, we draw several design guidelines and suggest avenues for future research based on the study findings.

Re-defining Named Entity Type for Personal Information De-identification and A Generation method of Training Data (개인정보 비식별화를 위한 개체명 유형 재정의와 학습데이터 생성 방법)

  • Choi, Jae-hoon;Cho, Sang-hyun;Kim, Min-ho;Kwon, Hyuk-chul
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.206-208 / 2022
  • As the big data industry has grown significantly in recent years, concern about privacy violations caused by personal information leakage has increased. There have been attempts to automate de-identification through named entity recognition in natural language processing. In this paper, named entity recognition data is constructed semi-automatically by identifying sentences that contain de-identification information in Korean Wikipedia. Compared with using general named entity recognition data, this can reduce the cost of learning about information that is not subject to de-identification. In addition, it has the advantage of minimizing the additional rule-based and statistical systems needed to classify de-identification information in the output. The named entity recognition data proposed in this paper is classified into twelve categories, which include de-identification information such as medical records and family relationships. In the experiment using the generated dataset, KoELECTRA achieved a performance of 0.87796 and RoBERTa 0.88.

Research on Utilization of AI in the Media Industry: Focusing on Social Consensus of Pros and Cons in the Journalism Sector (미디어 산업 AI 활용성에 관한 고찰 : 저널리즘 분야 적용의 주요 쟁점을 중심으로)

  • Jeonghyeon Han;Hajin Yoo;Minjun Kang;Hanjin Lee
    • The Journal of the Convergence on Culture Technology / v.10 no.3 / pp.713-722 / 2024
  • This study highlights the impact of Artificial Intelligence (AI) technology on journalism, discussing its utility and addressing major ethical concerns. Broadcasting companies and media institutions around the world, such as Bloomberg, the Guardian, WSJ, WP, and NYT, are using AI for innovation in news production, data analysis, and content generation. Accordingly, the ecosystem of AI journalism is analyzed in terms of the scale, economic feasibility, diversity, and value enhancement of major media AI service types. Through a review of the previous literature, this study also identifies key ethical and social issues in AI journalism. It aims to bridge societal and technological concerns by exploring mutual development directions for AI technology and the media industry. Additionally, it argues that integrated guidelines and advanced AI literacy, reached through social consensus, are necessary to address these issues.