• Title/Summary/Keyword: Korean nouns

A Study on Named Entity Recognition for Effective Dialogue Information Prediction (효율적 대화 정보 예측을 위한 개체명 인식 연구)

  • Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Jee, Minkyu;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.58-66 / 2019
  • Recognizing named entities such as proper nouns in conversational sentences is the most fundamental and important area of study for efficient conversational information prediction. The most important part of a task-oriented dialogue system is recognizing what attributes an object in a conversation has. The named entity recognition model recognizes named entities through preprocessing, word embedding, and prediction steps applied to the dialogue sentence. This study aims to use a user-defined dictionary in the preprocessing stage and to find optimal parameters in the word embedding stage for efficient dialogue information prediction. To test the designed named entity recognition model, we selected the field of daily chemical products and constructed a named entity recognition model that can be applied in a task-oriented dialogue system in the related domain.

A Comparative Study on the Korean and English Genderlect: Focused on Polite Expressions (한국어와 영어 성별어 비교연구: 공손표현과 관련하여)

  • Kim, Hyun Hyo
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.10 / pp.6527-6533 / 2015
  • It is generally accepted that men and women differ in linguistic communication style. Genderlect is a sociolinguistic term for the linguistic differences in the speech of a specific gender. Several linguistic features are offered as evidence of genderlects: pitch, lexicon, intonation, grammar, and style. The purpose of this paper is to compare the characteristics of genderlect in English and Korean. To do so, I analyzed the scripts of an English movie, 'Mrs. Doubtfire', and a Korean TV drama, 'Oohlala couple'. In 'Mrs. Doubtfire', tension and laughter arose from the discrepancy between the way he looked (as a woman) and the way he spoke (like a man). The same is true of 'Oohlala couple'. In the language of 'Mrs. Doubtfire', male speech characteristics were salient in nouns, while in 'Oohlala couple' they were salient in verb forms, especially in honorific style, which shows a difference between Korean and English genderlect. Korean has special genderlect characteristics, with honorific speech style realized in verb endings. In Korean, the highest honorific speech style, 'Habsho-che', is used in official situations, and men are more accustomed to it than women. When women have to use polite expressions, they must choose between the highest honorific style, 'Habsho-che', losing the female characteristics, and the second-highest honorific style, 'Haeyo-che', keeping the female characteristics.

The Effects of Aging on Retrieval of Phonological Knowledge in Korean: The Tip-of-the-Tongue Phenomenon in Young and Older Adults (한국어 음운 정보 산출에서 노화의 영향: 청년과 노인의 설단현상)

  • Park, Jiyoon;Lee, Ko Eun;Lee, Hye-Won
    • Korean Journal of Cognitive Science / v.24 no.2 / pp.111-132 / 2013
  • Previous research has shown that aging affects various language functions asymmetrically. Older adults are known to show deficits in language production compared to young adults, while performance in semantic processing is similar between the two groups. The tip-of-the-tongue (TOT) phenomenon effectively reflects failure to retrieve phonological knowledge. Older adults report TOTs more often than young adults, and the cause of this phenomenon has been explained by two frameworks: the 'blocking hypothesis' and the 'transmission deficit hypothesis'. This study examines the effect of aging on the retrieval of phonological knowledge by inducing TOTs in the laboratory. Two variables were manipulated: age and word category. Participants were young and older adults, and stimuli were selected from 5 categories of words. After reading a definition of a target word, the participants reported one of three states: 'know', 'don't know', or 'TOT'. The results were as follows. First, the older adults reported TOTs more often than the young adults. Second, TOTs occurred more often for proper nouns such as names of persons and places. Third, the age difference was larger in the categories in which TOTs occurred more often. Fourth, older adults reported fewer alternative words during TOTs than young adults. Fifth, participants tended to report partial information during TOTs in units of characters. These results show age-related difficulty in the retrieval of phonological knowledge in Korean, which is explained by the transmission deficit hypothesis and the characteristics of Korean orthography and phonology.

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has produced many kinds of online news media, the volume of news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data therefore helps users survey many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, kept only meaningful words through preprocessing, and merged synonyms using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, and events were detected by finding the peaks of that distribution. A total of 32 topics were extracted by the topic modeling, and the time of occurrence of each event was inferred from the point at which each topic distribution surged. As a result, a total of 85 events were detected, of which 16 final events were filtered and presented using the Gaussian smoothing technique. We also calculated the relevance score between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them. Finally, we set up the event network by assigning each event to a vertex and the relevance score between events to the edges connecting the vertices. The event network constructed by our method helped us sort the major events in the political and social fields in Korea over the past year in chronological order and, at the same time, identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relevance between events that was difficult to detect with existing methods. We applied various text mining techniques and Word2Vec in text preprocessing to improve the accuracy of extracting proper nouns and synthetic (compound) nouns, which have been difficult to handle in existing Korean text analysis. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract topics, topic words, and their distributions from huge amounts of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores from the co-occurrence of topics and constructing an event network, we can present connections between events, which are difficult to grasp with existing event detection, in a summarized form. This can be seen from the fact that the relevance-based event network proposed in this study was in fact constructed in order of occurrence time. The event network also makes it possible to identify which event was the starting point of a series of events. One limitation of this study is that LDA topic modeling gives different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher. Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
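
The network-construction step described in this abstract (cosine coefficient between events, events as vertices, relevance scores on the edges) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the event names, the toy vectors (imagined here as topic weights per time window), and the 0.5 threshold are all assumptions made for the example.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine coefficient between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction; treat as unrelated
    return dot / (norm_u * norm_v)


def build_event_network(event_vectors, threshold=0.5):
    """Return {(event_a, event_b): relevance} for pairs at or above threshold.

    Each event is a vertex; each kept pair is an edge weighted by relevance.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    edges = {}
    for a, b in combinations(sorted(event_vectors), 2):
        score = cosine(event_vectors[a], event_vectors[b])
        if score >= threshold:
            edges[(a, b)] = score
    return edges


# Hypothetical events described by toy weight vectors.
events = {
    "event_A": [0.9, 0.1, 0.0, 0.0],
    "event_B": [0.8, 0.2, 0.0, 0.0],
    "event_C": [0.0, 0.0, 0.7, 0.3],
}
network = build_event_network(events)
for (a, b), score in network.items():
    print(f"{a} -- {b}: relevance {score:.3f}")
```

With these toy vectors only `event_A` and `event_B` are connected, because `event_C` shares no active dimensions with the other two. A dictionary keyed by vertex pairs is used here only to keep the sketch dependency-free; a graph library would serve equally well.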

Preliminary Research about Semantic Relations and Linguistic Features in Middle School Students' Writings about Phase Transitions of Water in Air (대기 중 물의 상태변화에 관한 중학생의 글에서 나타나는 의미관계 및 과학 언어적 특성에 관한 예비연구)

  • Jung, Eun-Sook;Kim, Chan-Jong
    • Journal of the Korean earth science society / v.31 no.3 / pp.288-299 / 2010
  • Scientific literacy has recently come to mean not only the acquisition of scientific knowledge but also the linguistic ability to participate in a scientific discourse community. With this in mind, this study investigated middle school students' writings about phase transitions of water in air. Sixty-seven 9th-grade students (age 15) participated in this study and each wrote two short texts. The results of the text analysis can be summarized as follows: (1) Students had problems with familiar scientific terms such as 'water vapor' and 'steam' as well as unfamiliar ones like 'dew point'. (2) Students described more semantic relations, both correct and incorrect, for ideas formed from everyday experience than for those from school instruction. (3) While students' writing about everyday phenomena centered on actions and processes, their writing about school science showed a stronger preference for technical words and nouns. This study suggests that students could develop the linguistic ability of science through both a spontaneous process based on experience and formal, theoretical learning: the former in forming various semantic relations, the latter in the technical and abstract aspects of scientific writing.

Interpretation of the Forest Therapy Process and Effect Verification through KeyWord Analysis of Literature on Forest Therapy (산림치유 효과 검증 연구의 주요어 분석을 통한 치유 발현과정 해석)

  • Park, Kyeong-Ja;Shin, Chang-Seob;Kim, Dongsoo
    • Journal of Korean Society of Forest Science / v.110 no.1 / pp.82-90 / 2021
  • In this study, the validity of the forest therapy process, in which forest activities using forest therapy factors lead to immunity promotion and health promotion, was analyzed theoretically and qualitatively to refine and systemize the forest therapy concept. Research and analysis data were collected from the websites of institutions related to forest therapy; 33 theses and 33 original research articles from 2000 to March 2020 were searched for forest therapy keywords, as well as the prize-winning work of the 2016 forest therapy experience essay. A word cloud was generated from the frequency of nouns and adjectives and from the keywords in the web pages, theses, articles, and the forest therapy experience essay. Through interpretation of word frequency, the systemic flow of forest therapy was defined. The results suggest that the source of forest therapy's power was a positive experience of the forest and an improved attitude toward nature as well as forest therapeutic factors. The therapeutic effect is maximized through the forest healing program, leading to physical and mental resilience and resistance; consequently, health and immunity are promoted. From this study, forest therapy is proposed as "a health promotion activity for the psychological, physical, and spiritual resilience of the subjects through various environmental factors of the forest, positive experiences, and attitudes toward the forest."

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.309-328 / 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs, BART and GPT, on the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types, containing six types of informational content and creation content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). Upon more detailed examination of the effect of inference data characteristics, the performance of GPT was found to be proportional to the length of the input text. However, even for the longest documents (with optimal GPT performance), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or the number of PLM parameters but the structural suitability of the PLM for the applied downstream task. The performance of the different PLMs was also compared by analyzing part-of-speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task.

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
    • Journal of KIISE:Software and Applications / v.33 no.12 / pp.1073-1089 / 2006
  • The goal of Natural Language Processing (NLP) is to make a computer understand natural language and deliver the meanings of natural language to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving the goal of NLP. In this paper, we describe a technology for automatic homonym disambiguation using both Mutual Information (MI) and a Sense-Tagged Compound Noun Dictionary. Previous research using word definitions in a dictionary suffered from data sparseness because it relied on exact word matching. Our work overcomes this problem by using MI, which is an association measure between words. To reflect language features, the rate of word-pairs with MI values, sense frequency and site of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high-frequency compound nouns and used it to disambiguate homonym senses. The experimental data for testing and evaluating our system was constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed 4 types of experiments. When only MI was used, the experiment showed a precision of 65.06%. When we used the weighted values, we achieved a precision of 85.35%, and when we also used the Sense-Tagged Compound Noun Dictionary, we achieved a precision of 88.82%.
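
To illustrate how MI can act as an association measure between words, the sketch below computes pointwise mutual information from sentence-level co-occurrence counts. The abstract does not give the exact MI formulation, so the pointwise form, the sentence-level counting, and the toy English corpus are all assumptions made for illustration; the weighting scheme and the compound noun dictionary from the paper are not modeled.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(sentences):
    """Pointwise mutual information for word pairs that co-occur in a sentence.

    PMI(a, b) = log2( p(a, b) / (p(a) * p(b)) ), with probabilities estimated
    as the fraction of sentences containing the word or the pair.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # count each word once per sentence
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    n = len(sentences)
    return {
        (a, b): math.log2(joint * n / (word_counts[a] * word_counts[b]))
        for (a, b), joint in pair_counts.items()
    }


# Hypothetical toy corpus around the ambiguous word "bank".
corpus = [
    ["bank", "money"],
    ["bank", "river"],
    ["bank", "money", "loan"],
    ["river", "water"],
]
scores = pmi_table(corpus)
print(scores[("bank", "money")] > scores[("bank", "river")])
```

In this toy corpus "bank" is more strongly associated with "money" than with "river", so a context containing "money" would favor the financial sense. Because the score is built from co-occurrence statistics rather than exact matches against a definition, it can still relate words that never appear together in a dictionary entry, which is the sparseness problem the abstract mentions.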

An Analysis on Suitability of Words and Sentences in Mathematics Textbooks for Elementary First Grade (초등학교 1학년 수학 교과서의 어휘 및 문장 적합성 분석)

  • Chang, Hyewon;Lim, Miin
    • Journal of Educational Research in Mathematics / v.26 no.2 / pp.247-267 / 2016
  • It has been pointed out that the mathematics textbooks based on the 2009 revised national curriculum cause difficulty not by mathematical knowledge but by the accompanying words and sentences for first graders who have just started learning the Korean alphabet. This study focused on the suitability of words and sentences in mathematics textbooks for the first grade of elementary school. We analyzed the degree of difficulty and familiarity of words and the structure, length, and expression of sentences. The results show some causes of the difficulty that first graders experience. In more detail, we found 108 difficult words and 6 unfamiliar words for first graders. The textbooks contain 37 compound sentences, 727 complex sentences, and 38 compound-complex sentences. They also contain 237 long sentences composed of 9 or more words, 168 sentences that assign two or more activities, and 52 sentences that contain three or more nouns or adjectives in succession. Based on these results and discussions, we suggested several implications for writing mathematics textbooks for the lower grades of elementary school.

Improvements of an English Pronunciation Dictionary Generator Using DP-based Lexicon Pre-processing and Context-dependent Grapheme-to-phoneme MLP (DP 알고리즘에 의한 발음사전 전처리와 문맥종속 자소별 MLP를 이용한 영어 발음사전 생성기의 개선)

  • 김회린;문광식;이영직;정재호
    • The Journal of the Acoustical Society of Korea / v.18 no.5 / pp.21-27 / 1999
  • In this paper, we propose an improved MLP-based English pronunciation dictionary generator for use with a variable vocabulary word recognizer. The variable vocabulary word recognizer can process any words specified in a Korean word lexicon that is determined dynamically according to the current recognition task. To extend the system to tasks involving English words, it is necessary to build a pronunciation dictionary generator that can process words not included in a predefined lexicon, such as proper nouns. To build the English pronunciation dictionary generator, we use a context-dependent grapheme-to-phoneme multi-layer perceptron (MLP) architecture for each grapheme. Training each MLP requires grapheme-to-phoneme training data obtained from a general pronunciation dictionary. To automate this process, we use a dynamic programming (DP) algorithm with some distance metrics. For training and testing the grapheme-to-phoneme MLPs, we use a general English pronunciation dictionary with about 110 thousand words. With 26 MLPs, each having 30 to 50 hidden nodes, and the exception grapheme lexicon, we obtained a word accuracy of 72.8% for the 110 thousand words, which is superior to the rule-based method's word accuracy of 24.0%.
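
The DP preprocessing step, which pairs each grapheme of a dictionary word with a phoneme so that per-grapheme training data can be generated, can be sketched as a minimum-cost alignment. The abstract does not specify the distance metrics, so everything concrete below is an assumption: the `PLAUSIBLE` letter-to-phoneme table, the substitution costs of 0 and 1, the silent-letter cost of 0.6, and the `_` symbol for a silent grapheme.

```python
def align_graphemes(graphemes, phonemes, plausible, skip_cost=0.6):
    """Align each grapheme with one phoneme or with "_" (silent) at minimum cost.

    cost[i][j] is the best cost of aligning the first i graphemes with the
    first j phonemes. Matching costs 0 if the phoneme is listed as plausible
    for the grapheme and 1 otherwise; a silent grapheme costs skip_cost.
    """
    n, m = len(graphemes), len(phonemes)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        g = graphemes[i - 1]
        for j in range(0, min(i, m) + 1):
            # Option 1: this grapheme is silent and consumes no phoneme.
            if cost[i - 1][j] + skip_cost < cost[i][j]:
                cost[i][j] = cost[i - 1][j] + skip_cost
                back[i][j] = "skip"
            # Option 2: this grapheme is paired with phoneme j.
            if j > 0:
                sub = 0.0 if phonemes[j - 1] in plausible.get(g, ()) else 1.0
                if cost[i - 1][j - 1] + sub < cost[i][j]:
                    cost[i][j] = cost[i - 1][j - 1] + sub
                    back[i][j] = "match"
    if cost[n][m] == inf:
        raise ValueError("more phonemes than graphemes; cannot align one-to-one")
    # Trace back from the end to recover the grapheme/phoneme pairs.
    pairs = []
    i, j = n, m
    while i > 0:
        if back[i][j] == "match":
            pairs.append((graphemes[i - 1], phonemes[j - 1]))
            j -= 1
        else:
            pairs.append((graphemes[i - 1], "_"))
        i -= 1
    pairs.reverse()
    return pairs


# Hypothetical plausibility table for a handful of letters.
PLAUSIBLE = {"n": {"N"}, "i": {"AY", "IH"}, "g": {"G"}, "h": {"HH"}, "t": {"T"}}

pairs = align_graphemes("night", ["N", "AY", "T"], PLAUSIBLE)
print(pairs)
# → [('n', 'N'), ('i', 'AY'), ('g', '_'), ('h', '_'), ('t', 'T')]
```

Each resulting pair could serve as one training example for the MLP belonging to that grapheme. This sketch only handles words with at least as many letters as phonemes; words where one letter yields several phonemes would need extra handling, which may be one role of the exception grapheme lexicon mentioned in the abstract.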
