• Title/Summary/Keyword: 기초 어휘 (basic vocabulary)


Korean Semantic Role Labeling Based on Suffix Structure Analysis and Machine Learning (접사 구조 분석과 기계 학습에 기반한 한국어 의미 역 결정)

  • Seok, Miran;Kim, Yu-Seop
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.11
    • /
    • pp.555-562
    • /
    • 2016
  • Semantic Role Labeling (SRL) is the task of determining the semantic relations between a predicate and its arguments in a sentence. Korean semantic role labeling has faced difficulties because the structure of Korean differs from that of English, which makes it very hard to apply the approaches developed so far; methods proposed to date have not shown satisfactory performance compared to English and Chinese. To address these problems, we focus on suffix information analysis, such as josa (case suffix) and eomi (verbal ending) analysis. Korean is an agglutinative language, like Japanese, with a well-defined suffix structure in its words; agglutinative languages can have free word order because of this developed suffix structure. Arguments consisting of a single morpheme are then labeled using statistics. In addition, machine learning algorithms such as Support Vector Machines (SVM) and Conditional Random Fields (CRF) are used to model the SRL problem for arguments that are not labeled in the suffix analysis phase. The proposed method is intended to reduce the range of argument instances to which machine learning approaches must be applied, since applying them broadly results in uncertain and inaccurate role labeling. In experiments with 15,224 arguments, we obtain an F1-score of approximately 83.24%, an improvement of about 4.85 percentage points over the state-of-the-art Korean SRL research.
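The two-stage idea described in the abstract can be sketched as follows: arguments whose case suffix (josa) determines the role unambiguously are labeled by rule, and only the remaining ambiguous arguments would be handed to a trained SVM/CRF model. The josa-to-role table and role labels below are illustrative assumptions, not the paper's actual mapping.

```python
# Hypothetical josa -> semantic role table (illustrative, not the paper's).
JOSA_ROLE_TABLE = {
    "가": "ARG0",    # nominative marker -> agent-like argument (assumed)
    "이": "ARG0",
    "를": "ARG1",    # accusative marker -> patient-like argument (assumed)
    "을": "ARG1",
    "에게": "ARG2",  # dative marker (assumed)
}

def label_argument(word, josa, ml_model=None):
    """Label one argument: suffix rule first, machine-learning fallback second."""
    role = JOSA_ROLE_TABLE.get(josa)
    if role is not None:
        return role, "rule"
    # Only ambiguous arguments reach the (hypothetical) SVM/CRF model,
    # shrinking the instance space the classifier must handle.
    if ml_model is not None:
        return ml_model(word, josa), "ml"
    return "UNKNOWN", "none"

role, source = label_argument("학생", "가")
```

Because the rule stage absorbs the unambiguous cases, the classifier only sees the hard ones, which is the instance-reduction effect the abstract claims.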

A Study on the Considerations in Rules for Authorized Access points of Music Work (음악 저작의 전거형접근점 규칙 마련시 고려사항에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society
    • /
    • v.49 no.4
    • /
    • pp.147-166
    • /
    • 2018
  • This study suggests considerations for rules on authorized access points for the collocation of music works, by examining the directions of authorized access points in FRBR, LRM, ICP 2016, RDA, and BIBFRAME, and by analyzing the RDA rules for the attributes and authorized access points of music works and expressions together with VIAF examples. First, aggregated authorized access points were suggested as the direction for authorized access points: the original title may be selected as the preferred title, and if the original title is not normally suited, the authorized access point may be based on a form in one of the languages suited to the users. Second, the authorized access point of a music work consists of the composer's authorized access point and the preferred title, or of the adapter's authorized access point and the preferred title when the composer's responsibility is lacking. Also, the authorized access point of a Korean traditional music work must be reviewed according to work type, considering the responsibility of the composer. Third, controlled vocabularies for the name of the music type, medium of performance, and key could be considered for describing the attributes of works and expressions. This study can serve as a foundation for work on the authorized access points of music works, and additional research should be completed by surveying music users' needs.
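The construction rule described above (composer plus preferred title, falling back to the adapter when the composer's responsibility is lacking) can be sketched as a small string-building function. The "Name. Title" punctuation pattern is an assumption for illustration, not a rule taken from the paper.

```python
def authorized_access_point(composer, preferred_title, adapter=None):
    """Build an authorized access point string for a music work.

    Follows the pattern described above: the composer's access point plus the
    preferred title, or the adapter's access point plus the preferred title
    when no composer is responsible. The "Name. Title" punctuation is an
    assumed convention for this sketch.
    """
    responsible = composer if composer else adapter
    if responsible is None:
        return preferred_title
    return f"{responsible}. {preferred_title}"

ap = authorized_access_point(
    "Beethoven, Ludwig van, 1770-1827",
    "Symphonies, no. 9, op. 125, D minor",
)
```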

The narrative inquiry on Korean Language Learners' Korean proficiency and Academic adjustment in College Life (학문 목적 한국어 학습자의 한국어 능력과 학업 적응에 관한 연구)

  • Cheong Yeun Sook
    • Journal of the International Relations & Interdisciplinary Education
    • /
    • v.4 no.1
    • /
    • pp.57-83
    • /
    • 2024
  • This study investigated the impact of Test of Proficiency in Korean (TOPIK) scores on the academic adaptation of foreign exchange students. Seven students, recruited with Institutional Review Board (IRB) approval, participated, and their interview contents were analyzed in six stages using a comprehensive analysis procedure based on pragmatic eclecticism (Lee & Kim, 2014). As a result, the factors influencing the academic adaptation of Korean language learners for academic purposes were categorized into three dimensions: academic, daily life, and psychological-emotional. On the academic front, interviewees pointed out difficulties in adapting to specialized terminology and studying in their majors, as well as significant challenges with Chinese characters and Sino-Korean words. Next, from a daily-life perspective, even participants holding advanced TOPIK scores faced difficulties in adapting to university life, emphasizing the necessity of practical expressions and an extensive vocabulary for proper adjustment to life in Korea. Lastly, within the psychological-emotional dimension, despite being advanced TOPIK holders, they were found to experience considerable stress in conversations or presentations with Koreans. Their lack of socio-cultural and everyday-life cultural knowledge also led to linguistic errors and contributed to psychological-emotional difficulties despite their proficiency in Korean. Based on these narratives, the study concluded that promoting the academic adaptation of Korean language learners requires providing opportunities for Korean language learning. With this goal in mind, efforts should be directed toward enhancing learners' academic proficiency in their majors, improving Korean language fluency, and fostering interpersonal relationships within the academic community. Furthermore, the researchers suggested implementing various extracurricular activities tailored to foreign learners as a solution.

A study on Korean tourism trends using social big data -Focusing on sentiment analysis- (소셜 빅데이터를 활용한 한국관광 트렌드에 관한연구 -감성분석을 중심으로-)

  • Youn-hee Choi;Kyoung-mi Yoo
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.97-109
    • /
    • 2024
  • In the field of domestic tourism, analyzing the tourism trends of both international and domestic tourists is essential not only for the Korean tourism market but also for local and national tourism policy makers. We explore keywords and sentiment on social media to establish a marketing strategy and revitalize the domestic tourism industry through communication with, and information from, tourism consumers. This study used TEXTOM 6.0 to analyze recent trends in Korean tourism. Data was collected from September 31, 2022, to August 31, 2023, using 'Korean tourism' and 'domestic tourism' as keywords, targeting blogs, cafes, and news provided by Naver, Daum, and Google. Through text mining, the top 100 keywords were extracted in order of frequency along with their TF-IDF values, and then CONCOR analysis and sentiment analysis were conducted. Among the Korean tourism keywords, words related to tourist destinations, travel companions and behaviors, tourism motivations and experiences, accommodation types, tourist information, and emotional connections ranked high. The CONCOR analysis yielded five clusters related to tourist destinations, tourist information, tourist activities/experiences, tourism motivation/content, and inbound tourism. Finally, the sentiment analysis showed a high proportion of positive documents and vocabulary. By text-mining Korean tourism, this study analyzes its rapidly changing trends and is expected to provide meaningful data for promoting domestic tourism not only to Koreans but also to foreigners visiting Korea.
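The frequency and TF-IDF step described above can be sketched over a toy corpus. TEXTOM's exact weighting scheme is not specified in the abstract, so the classic tf × log(N/df) form is assumed here; the sample documents are invented stand-ins for collected posts.

```python
import math
from collections import Counter

# Toy stand-ins for tokenized social media posts (invented examples).
docs = [
    ["한국관광", "제주도", "여행", "맛집"],
    ["국내여행", "제주도", "호텔", "여행"],
    ["한국관광", "서울", "축제", "여행"],
]

def tf_idf(docs):
    """Return (doc_index, term, score) triples using tf * log(N/df)."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)               # raw term frequency in this doc
        for term, count in tf.items():
            scores.append((i, term, count * math.log(n / df[term])))
    return scores

scores = tf_idf(docs)
# A term appearing in every document has IDF log(3/3) = 0, so its score is 0.
zero_terms = {term for _, term, s in scores if s == 0.0}
```

This is why raw frequency and TF-IDF produce different rankings: ubiquitous words such as "여행" ("travel") top the frequency list but carry no discriminative weight.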

A Processing of Progressive Aspect "te-iru" in Japanese-Korean Machine Translation (일한기계번역에서 진행형 "ている"의 번역처리)

  • Kim, Jeong-In;Mun, Gyeong-Hui;Lee, Jong-Hyeok
    • The KIPS Transactions:PartB
    • /
    • v.8B no.6
    • /
    • pp.685-692
    • /
    • 2001
  • This paper describes how to disambiguate the aspectual meaning of the Japanese expression "-te iru" in Japanese-Korean machine translation. Due to the grammatical similarities of the two languages, almost all Japanese-Korean MT systems have been developed under the direct MT strategy, in which lexical disambiguation is essential to high-quality translation. Japanese has a progressive aspectual marker "-te iru" that is difficult to translate into Korean equivalents, because Korean has two different progressive aspectual markers: "-ko issta" for the action progressive and "-e issta" for the state progressive. Moreover, the aspectual systems of the two languages do not quite coincide, so the Korean progressive aspect cannot be determined from the Japanese meaning of "-te iru" alone. The progressive aspectual meaning may be partially determined by the meaning of the predicate, and the semantics of predicates may in turn be partially restricted by adverbials, so all Japanese predicates are classified into five classes: the 1st class of verbs is used only for the action progressive; the 2nd generally for the action progressive but occasionally for the state progressive; the 3rd only for the state progressive; the 4th generally for the state progressive but occasionally for the action progressive; and the 5th for the others. Some heuristic rules are defined for disambiguating the 2nd and 4th classes on the basis of adverbs and adverbial phrases. In an experimental evaluation using more than 15,000 sentences from the "Asahi" newspaper, the proposed method improved the translation quality by about 5%, which proves that it is effective in disambiguating "-te iru" for Japanese-Korean machine translation.
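The five-class scheme with adverbial heuristics described above can be sketched as a lookup plus override. The verb-class assignments and adverb cues below are invented examples for illustration; the paper's actual lexicon and rules are not reproduced here.

```python
# Hypothetical verb classes (1-5) as described in the abstract; the
# assignments below are invented examples, not the paper's lexicon.
VERB_CLASS = {
    "taberu": 1,   # class 1: always action progressive -> "-ko issta"
    "yomu": 2,     # class 2: usually action, occasionally state
    "sinu": 3,     # class 3: always state progressive -> "-e issta"
    "suwaru": 4,   # class 4: usually state, occasionally action
}

STATE_CUE_ADVERBS = {"zutto", "mada"}          # assumed cues toward state
ACTION_CUE_ADVERBS = {"ima", "isshoukenmei"}   # assumed cues toward action

def translate_te_iru(verb, adverbs=()):
    """Choose the Korean progressive marker for Japanese '-te iru'."""
    cls = VERB_CLASS.get(verb, 5)
    adverbs = set(adverbs)
    if cls == 1:
        return "-ko issta"
    if cls == 3:
        return "-e issta"
    if cls == 2:   # default action; flip only on a state cue
        return "-e issta" if adverbs & STATE_CUE_ADVERBS else "-ko issta"
    if cls == 4:   # default state; flip only on an action cue
        return "-ko issta" if adverbs & ACTION_CUE_ADVERBS else "-e issta"
    return "-ko issta"  # class 5: fall back to the more common action reading

m1 = translate_te_iru("taberu")          # class 1, no cues needed
m2 = translate_te_iru("suwaru", ["ima"])  # class 4 overridden by action cue
```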


An Analysis of the 20th National Congress Report through Text-mining Methods (텍스트 마이닝을 활용한 중국공산당 20차 당대회 보고문 분석)

  • Kwon, Dokyung;Kim, Jungsoo;Park, Jihyun
    • Analyses & Alternatives
    • /
    • v.7 no.1
    • /
    • pp.115-145
    • /
    • 2023
  • The 20th National Congress of the Chinese Communist Party (hereafter "the 20th National Congress") was under the global spotlight long before it was held for seven days from 16 to 22 October 2022. People wondered whether Xi Jinping would secure a third term as China's leader or whether he would lay the foundations, during a third term, to remain in power indefinitely. In Korea, the press and media questioned whether the event would become the "crowning of Emperor Xi (Xi Huangdi)," whose power rivaled that of the first emperor of China, Shi Huangdi, and featured the scene in which Hu Jintao was forced to leave the venue during the Congress. Many Korean academics, on the other hand, focused more on how Xi would organize the Politburo and its Standing Committee and on whether the outline of his heirs would appear during the event. This tendency in academia in turn amplified the media's concerns. This paper presents a quantitative analysis of the 20th National Congress Report, as opposed to an analysis of Xi's political intentions at the event. The National Congress Report outlines the Party's visions, goals, and strategies for the next five years in politics, the economy, society, culture, foreign affairs, and relations with Taiwan. The authoritative document is rich in narrative and logic and deserves academic study. This research analyzes the 18th, 19th, and 20th Reports by identifying their keywords and regular expressions and checking their frequency and percentage through text-mining methods. This approach enables the quantification and visualization of significant changes in the Party's governing vision over the fifteen years of Xi's rule from 2013 to 2027.
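The frequency-and-percentage comparison across the three Reports can be sketched as a simple token count per document. The token lists below are invented English stand-ins; the actual analysis would tokenize the full Chinese texts and match keywords or regular expressions.

```python
from collections import Counter

# Toy stand-ins for the tokenized 18th-20th Report texts (invented).
reports = {
    "18th": ["development", "reform", "party", "development"],
    "19th": ["era", "party", "development", "security"],
    "20th": ["security", "party", "security", "modernization"],
}

def keyword_share(reports, keyword):
    """Percentage of tokens matching `keyword` in each report."""
    return {
        name: 100.0 * Counter(tokens)[keyword] / len(tokens)
        for name, tokens in reports.items()
    }

# Comparing shares across congresses surfaces shifts in emphasis, e.g. the
# rising weight of "security" in this toy data.
security_share = keyword_share(reports, "security")
```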

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between the objects entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts must be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that comprises Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm as well as more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras, based on Theano.
After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed input vectors of 20 consecutive characters, each paired with the following 21st character as output. In total, 1,023,411 input-output pairs were included in the dataset, which we divided into training, validation, and test sets in a 70:15:15 proportion. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the time taken to train each model. As a result, all optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-layer LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity did not improve significantly and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for Korean in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
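The dataset construction described above (20-character inputs paired with the following 21st character) can be sketched with a sliding window. The toy sentence below stands in for the Old Testament corpus; the real pipeline would also decompose syllable blocks into phonemes and one-hot encode the character inventory.

```python
WINDOW = 20  # input length used in the study: 20 consecutive characters

def make_pairs(text, window=WINDOW):
    """Slide a window over `text`, pairing each window with the next char."""
    return [(text[i:i + window], text[i + window])
            for i in range(len(text) - window)]

# Toy stand-in for the corpus (opening of Genesis in Korean).
text = "태초에 하나님이 천지를 창조하시니라 그 땅이 혼돈하고"
pairs = make_pairs(text)

# Character inventory for one-hot encoding (74 symbols in the paper's data).
vocab = sorted(set(text))
```

Applied to the full corpus, this windowing is what yields the 1,023,411 input-output pairs reported above, since every position with 20 characters of left context contributes one training example.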