• Title/Summary/Keyword: Korean Language Model

Language-based Classification of Words using Deep Learning (딥러닝을 이용한 언어별 단어 분류 기법)

  • Zacharia, Nyambegera Duke;Dahouda, Mwamba Kasongo;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.411-414 / 2021
  • Deep learning has become one of the most critical technologies in the field of education today. It has been used especially in natural language processing (NLP), where word-representation vectors play a critical role. However, some low-resource languages, such as Swahili, which is spoken in East and Central Africa, do not benefit from such representations. NLP is a field of artificial intelligence in which systems and computational algorithms are built to automatically understand, analyze, manipulate, and potentially generate human language. After discovering that some African languages lack proper representation in language processing, to the point of being described as low-resource languages because of inadequate data for NLP, we decided to study Swahili. Currently, language modeling with neural networks requires adequate data to guarantee quality word representations, which are important for NLP tasks, yet most African languages have no data for such processing. The main aim of this project is to classify words in English, Swahili, and Korean, with a particular emphasis on the low-resource Swahili language. To this end, we create our own dataset, preprocess the data using a Python script, formulate the syllabic alphabet, and develop an English, Swahili, and Korean word analogy dataset.
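The paper's own classifier is a deep-learning model whose architecture is not described in the abstract. As a much simpler point of comparison, the sketch below assigns a word to English, Swahili, or Korean by cosine similarity between character-bigram profiles. The training words and the function names (`char_bigrams`, `build_profiles`, `classify`) are hypothetical toy choices, not the authors' dataset or code.

```python
import math
from collections import Counter

# Hypothetical toy training words; NOT the dataset built in the paper.
TRAINING = {
    "english": ["house", "water", "school", "language", "the", "which", "through"],
    "swahili": ["nyumba", "maji", "shule", "lugha", "habari", "kitabu", "watoto"],
    "korean": ["집", "물", "학교", "언어", "책", "아이들"],
}


def char_bigrams(word: str) -> Counter:
    """Count character bigrams, padding with ^ and $ to mark word boundaries."""
    padded = f"^{word.lower()}$"
    return Counter(padded[i:i + 2] for i in range(len(padded) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bigram-count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profiles(training: dict) -> dict:
    """Sum the bigram counts of all training words for each language."""
    profiles = {}
    for lang, words in training.items():
        profile = Counter()
        for word in words:
            profile.update(char_bigrams(word))
        profiles[lang] = profile
    return profiles


def classify(word: str, profiles: dict):
    """Return the most similar language, or None if no bigram overlaps any profile."""
    query = char_bigrams(word)
    scores = {lang: cosine(query, prof) for lang, prof in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


PROFILES = build_profiles(TRAINING)
print(classify("thing", PROFILES))  # english
print(classify("kitu", PROFILES))   # swahili
print(classify("학생", PROFILES))    # korean
```

Unseen bigrams contribute nothing to the dot product, so a word that shares no bigram with any profile returns `None` rather than a forced guess.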

A Longitudinal Analysis of Factors Affecting Language Development in Infants (영아의 언어발달 영향요인에 관한 종단 분석)

  • Kim, Minseok;Hu, Yunyun;Wang, Wenhui
    • The Journal of the Korea Contents Association / v.19 no.3 / pp.457-465 / 2019
  • The purpose of this study is to identify factors that affect the language development of infants. For the analysis, three years of data, from the first year (2008) to the third year (2010) of the Panel Study on Korean Children (PSKC), were assembled and a panel analysis was conducted. The subjects were 2,150 infants who participated in the survey, and their language development was measured with the communication scores of the K-ASQ test provided by the Korean children's panel. Factors influencing infant language development identified in previous studies were also included in the model. The Hausman test indicated that a fixed-effects model, which treats each individual's panel error as fixed, was appropriate. Infants with higher cognitive development levels and those who received more positive parenting showed higher language development. Conclusions and suggestions concerning the parental characteristics that affect language development are presented.
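The choice of a fixed-effects model above rests on the Hausman test. The sketch below computes the standard Hausman statistic `H = d' (V_FE - V_RE)^-1 d` with `d = b_FE - b_RE`, which is compared against a chi-square distribution with `df` equal to the number of compared coefficients. The coefficient and covariance values are made-up numbers for illustration, not the study's estimates, and the helper names `solve` and `hausman` are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("V_FE - V_RE is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def hausman(b_fe, b_re, v_fe, v_re):
    """Return (H, df) comparing fixed-effects and random-effects estimates."""
    d = [f - r for f, r in zip(b_fe, b_re)]
    k = len(d)
    v_diff = [[v_fe[i][j] - v_re[i][j] for j in range(k)] for i in range(k)]
    x = solve(v_diff, d)  # x = (V_FE - V_RE)^-1 d
    h = sum(di * xi for di, xi in zip(d, x))
    return h, k


# Illustrative numbers only.
h, df = hausman([1.0, 2.0], [0.5, 1.5],
                [[0.5, 0.0], [0.0, 0.5]], [[0.25, 0.0], [0.0, 0.25]])
print(h, df)  # 2.0 2
```

A large H relative to the chi-square critical value for `df` rejects the random-effects assumption and favors the fixed-effects model, which is the outcome the study reports. In finite samples `V_FE - V_RE` is not always positive definite, so statistical packages offer variants of this test.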

Enhancing Recommender Systems by Fusing Diverse Information Sources through Data Transformation and Feature Selection

  • Thi-Linh Ho;Anh-Cuong Le;Dinh-Hong Vu
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1413-1432 / 2023
  • Recommender systems aim to recommend items to users by taking into account their probable interests. This study focuses on creating a model that uses multiple sources of information about users and items through a multimodal approach. It addresses how to gather information from different sources (modalities) and transform it into a uniform format, yielding a multimodal feature description of users and items. The work also aims to transform and represent the features extracted from different modalities so that they are in a compatible format for integration and carry important, useful information for the prediction model. To this end, we propose a novel multimodal recommendation model that extracts latent features of users and items from a utility matrix using matrix factorization techniques. Various transformation techniques are used to extract features from other sources of information, such as user reviews, item descriptions, and item categories. We also propose using Principal Component Analysis (PCA) and feature selection techniques to reduce the data dimension, extract important features, and remove noisy features, thereby increasing the accuracy of the model. We evaluated several model variants built on different subsets of modalities on the MovieLens and Amazon sub-category datasets. The experimental results show that the proposed model significantly improves recommendation accuracy compared to SVD, which is acknowledged as one of the most effective models for recommender systems. Specifically, on the Amazon datasets the proposed model reduces RMSE by 4.8% to 21.43% and increases Precision by 2.07% to 26.49%. On the MovieLens dataset, it reduces RMSE by 45.61% and increases Precision by 14.06%. The results on both datasets also show that combining information from multiple modalities in the proposed model yields better outcomes than relying on a single type of information.
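The matrix-factorization step can be sketched as a generic biased matrix factorization trained with stochastic gradient descent: each rating is predicted as a global mean plus a user bias, an item bias, and the dot product of latent user and item vectors. This textbook formulation is chosen for illustration only; the paper's exact factorization technique, its hyperparameters, and the multimodal features are not reproduced, and the ratings, `train_mf`, and `rmse` are hypothetical.

```python
import math
import random


def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Biased MF via SGD; ratings is a list of (user, item, rating) triples."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, bu, bi, P, Q


def rmse(ratings, model):
    """Root-mean-square error of the model on (user, item, rating) triples."""
    mu, bu, bi, P, Q = model
    se = 0.0
    for u, i, r in ratings:
        pred = mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))
        se += (r - pred) ** 2
    return math.sqrt(se / len(ratings))


# Made-up (user, item, rating) triples on a 1-5 scale.
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 2, 1), (2, 1, 1),
           (2, 2, 5), (2, 3, 4), (3, 0, 1), (3, 3, 5), (4, 1, 4), (4, 2, 2)]
baseline = train_mf(ratings, n_users=5, n_items=4, epochs=0)  # untrained initial model
trained = train_mf(ratings, n_users=5, n_items=4)
print(rmse(ratings, baseline), rmse(ratings, trained))
```

This compares training error only; the study's RMSE and Precision figures come from held-out evaluation. The learned rows of `P` and `Q` are the kind of latent user and item features that the abstract says are combined with features from reviews, descriptions, and categories.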

Korean Morphological Analysis Method Based on BERT-Fused Transformer Model (BERT-Fused Transformer 모델에 기반한 한국어 형태소 분석 기법)

  • Lee, Changjae;Ra, Dongyul
    • KIPS Transactions on Software and Data Engineering / v.11 no.4 / pp.169-178 / 2022
  • Morphemes are the most primitive units of a language; they lose their original meaning when segmented into smaller parts. In Korean, a sentence is a sequence of eojeols (words) separated by spaces, and each eojeol comprises one or more morphemes. Korean morphological analysis (KMA) divides the eojeols of a given Korean sentence into morpheme units and assigns appropriate part-of-speech (POS) tags to the resulting morphemes. KMA is one of the most important tasks in Korean natural language processing (NLP), and improving its performance is closely tied to improving the performance of other Korean NLP tasks. Recent research on KMA has begun to adopt machine translation (MT) models. MT converts a sequence (sentence) of units from one domain into a sequence (sentence) of units from another domain, and neural machine translation (NMT) refers to MT approaches that use neural network models. From an MT perspective, KMA transforms an input sequence of units in the eojeol domain into a sequence of units in the morpheme domain. In this paper, we propose a deep learning model for KMA. The backbone of our model is the BERT-fused model, which has been shown to achieve high performance on NMT. The BERT-fused model combines the Transformer, a representative NMT model, with BERT, a language representation model that has enabled significant advances in NLP. Experimental results show that our model achieves an F1 score of 98.24.
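The 98.24 F1 score is a morpheme-level measure, but the abstract does not spell out the evaluation protocol. The sketch below assumes a common one: gold and predicted analyses are compared as multisets of (morpheme, POS-tag) pairs, which ignores position. The example sentence and its Sejong-style tags (NP, JX, NNG, JKB, VV, EP, EF) are illustrative, not taken from the paper's test data, and `morpheme_f1` is a hypothetical helper.

```python
from collections import Counter


def morpheme_f1(gold, pred):
    """Precision, recall, and F1 over (morpheme, tag) pairs, counted as multisets."""
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    correct = sum((gold_counts & pred_counts).values())  # multiset intersection
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# "나는 학교에 갔다" (I went to school): gold analysis vs. a prediction
# that fails to split 갔다 into 가 + 았 + 다.
gold = [("나", "NP"), ("는", "JX"), ("학교", "NNG"), ("에", "JKB"),
        ("가", "VV"), ("았", "EP"), ("다", "EF")]
pred = [("나", "NP"), ("는", "JX"), ("학교", "NNG"), ("에", "JKB"),
        ("갔", "VV"), ("다", "EF")]
p, r, f1 = morpheme_f1(gold, pred)
print(round(p, 4), round(r, 4), round(f1, 4))  # 0.8333 0.7143 0.7692
```

The example also shows why KMA fits the MT framing: the two eojeol-domain tokens 학교에 and 갔다 map to five morpheme-domain tokens, so input and output sequences differ in length.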

A Study on the Korean University Students' Usage of Foreign Language Queries in Scholarly Information Retrieval (학술정보검색을 위한 국내 대학생의 외국어 탐색문 활용에 관한 연구)

  • Lee, Bo Eun;Lee, Jee Yeon
    • Journal of the Korean Society for Information Management / v.36 no.1 / pp.95-116 / 2019
  • This study examined how Korean university students (both undergraduate and graduate students) use foreign-language queries in scholarly information retrieval, focusing on how their search strategies differ by user characteristics. A new model was developed based on Ellis's behavioral model of information-seeking strategies, and both quantitative and qualitative methods were applied to analyze the data. The students used a variety of foreign-language information-seeking strategies at different stages of academic information retrieval, depending on their field of study or level of education. Liberal arts and social science students had more difficulty than science and technology students in selecting proper search terms in the foreign language, which made them less inclined to use foreign-language queries. When searching with foreign-language queries, students relied more on bibliographic and citation information than when searching with Korean queries. The findings can provide guidelines for how Korean university libraries offer information literacy programs and other services tailored to patrons' characteristics.

1-Pass Semi-Dynamic Network Decoding Using a Subnetwork-Based Representation for Large Vocabulary Continuous Speech Recognition (대어휘 연속음성인식을 위한 서브네트워크 기반의 1-패스 세미다이나믹 네트워크 디코딩)

  • Chung Minhwa;Ahn Dong-Hoon
    • MALSORI / no.50 / pp.51-69 / 2004
  • In this paper, we present a one-pass semi-dynamic network decoding framework that combines the fast decoding speed of static network decoders with the memory efficiency of dynamic network decoders. Our method is based on a novel language model network representation that is essentially a finite state machine (FSM). The static network derived from the language model network [1][2] is partitioned into smaller subnetworks, which are either static by nature or self-structured. The whole network is managed dynamically so that the subnetworks required for decoding are cached in memory. The network is near-minimized by applying the tail-sharing algorithm. Our decoder is evaluated on a 25k-word Korean broadcast news transcription task. The tail-sharing algorithm alone reduces the size of the search network by 73.4%. Compared with an equivalent static network decoder, the semi-dynamic network decoder increases decoding time by at most 6%, while it can be flexibly adapted to various memory configurations, with a minimum memory usage of 37.6% of the complete network size.
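Tail sharing merges states whose futures are identical, so that common suffixes are stored only once. The sketch below illustrates the idea on a simple word trie rather than on the paper's language-model network: each node is summarized by its finality and its labeled transitions to already-canonicalized children, and nodes with the same signature are counted once. This bottom-up suffix merging is one way to realize tail sharing; the word list and the helpers `build_trie`, `count_nodes`, and `tail_share` are hypothetical.

```python
def build_trie(words):
    """Prefix tree: each node is {"final": bool, "next": {char: node}}."""
    root = {"final": False, "next": {}}
    for word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"final": False, "next": {}})
        node["final"] = True
    return root


def count_nodes(node):
    """Number of nodes in the unshared trie."""
    return 1 + sum(count_nodes(child) for child in node["next"].values())


def tail_share(root):
    """Number of nodes after merging all nodes with identical suffix structure."""
    registry = {}  # signature -> canonical node id

    def visit(node):
        # A node's signature depends only on its finality and its children's ids,
        # so identical tails receive identical ids.
        sig = (node["final"],
               tuple(sorted((ch, visit(child)) for ch, child in node["next"].items())))
        return registry.setdefault(sig, len(registry))

    visit(root)
    return len(registry)


trie = build_trie(["cats", "dogs", "rats"])
print(count_nodes(trie), tail_share(trie))  # 13 7
```

Here the shared tails "s", "ts"/"gs", and "ats" collapse the 13 trie nodes into 7 states, the same kind of reduction the abstract reports (73.4%) on a much larger network.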

A Meta-Analysis on the Effects of Activities Using Picture Books on Language Development in Young Children (그림책을 활용한 활동이 유아의 언어발달에 미치는 효과에 대한 메타분석)

  • Shim, Gyeong-Hwa;Lim, Yangmi;Park, Eun-Young
    • Korean Journal of Childcare and Education / v.15 no.4 / pp.115-134 / 2019
  • Objective: This study aimed to analyze the effects of activities using picture books on young children's language development and, through meta-analysis, to identify factors that cause differences in these effects. Methods: We conducted a homogeneity test of effect sizes on 21 Korean studies published in academic journals from 1990 to February 2018 and calculated the overall effect size using a random-effects model. We also conducted a meta-ANOVA to investigate whether effect sizes differed by type of language development, type of picture book activity, and environmental variables such as place, time, and agent. Results: The effect sizes of the 21 studies were heterogeneous, and the overall effect size was 0.90, which is large by Cohen's criteria. The effect sizes also varied by type of language development, picture book activity, and environmental variables. Conclusion/Implications: To increase the effects of picture book activities on young children's language development, this study emphasizes integrating picture book activities with other play areas, teaching methods, and other print materials for the development of literacy abilities, as well as linking the home with early childhood education institutions.
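A random-effects pooled estimate can be sketched with the DerSimonian-Laird method: Cochran's Q (the homogeneity statistic) yields an estimate of the between-study variance tau^2, and each study is then weighted by 1 / (v_i + tau^2). The abstract does not say which tau^2 estimator the authors used, so DerSimonian-Laird is an assumption here, and the three effect sizes and variances are made-up numbers, not the 21 reviewed studies.

```python
def random_effects(effects, variances):
    """DerSimonian-Laird random-effects meta-analysis.

    Returns (pooled effect, tau^2, Cochran's Q).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # truncated at zero
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2, q


pooled, tau2, q = random_effects([0.5, 1.0, 1.5], [0.1, 0.1, 0.1])
print(round(pooled, 3), round(tau2, 3), round(q, 3))  # 1.0 0.15 5.0
```

Q above its k - 1 degrees of freedom signals heterogeneity, matching the study's finding that the 21 effect sizes were heterogeneous; a positive tau^2 then widens each study's variance before pooling.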

Extending SQL for Moving Objects Databases

  • Nam, Kwang-Woo;Lee, Jai-Ho;Kim, Min-Soo
    • Proceedings of the KSRS Conference / 2002.10a / pp.138-143 / 2002
  • This paper describes a framework for extending GIS databases to support a moving-object data type and query language. Rapid progress in wireless communications, positioning systems, and mobile computing devices has made location-aware applications essential components of commercial and industrial systems. Location-aware applications require GIS database systems that can represent moving objects and support queries on their motion properties. For example, fleet management applications may need to store information about moving vehicles, and advanced CRM (Customer Relationship Management) applications may need to store and query the trajectories of mobile phone users. In this context, maintaining consistent information about the locations of continuously moving objects and processing motion-specific queries is a challenging problem. We formally define a data model and query language for moving objects that include complex, evolving spatial structures, and we propose a core algebra for processing the moving-object query language. The main benefit of the proposed query language and algebra is that the model can be built on top of GIS databases.
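The abstract does not list the paper's operators, but a typical moving-object primitive is evaluating a trajectory at a given instant, which supports queries such as the fleet-management example above. The sketch below stores a moving point as time-stamped samples and linearly interpolates between them; the class and function names (`MovingPoint`, `at_instant`, `inside_at`) and the linear-motion assumption are illustrative, not the paper's algebra.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class MovingPoint:
    """(t, x, y) samples sorted by time; motion is assumed linear between samples."""
    samples: list = field(default_factory=list)

    def at_instant(self, t):
        """Interpolated (x, y) at time t, or None outside the sampled lifespan."""
        if not self.samples:
            return None
        times = [s[0] for s in self.samples]
        if t < times[0] or t > times[-1]:
            return None
        i = bisect.bisect_left(times, t)
        if times[i] == t:
            return (float(self.samples[i][1]), float(self.samples[i][2]))
        t0, x0, y0 = self.samples[i - 1]
        t1, x1, y1 = self.samples[i]
        frac = (t - t0) / (t1 - t0)
        return (x0 + frac * (x1 - x0), y0 + frac * (y1 - y0))


def inside_at(fleet, t, box):
    """Names of moving points inside box = (xmin, ymin, xmax, ymax) at time t."""
    xmin, ymin, xmax, ymax = box
    hits = []
    for name, mp in fleet.items():
        pos = mp.at_instant(t)
        if pos is not None and xmin <= pos[0] <= xmax and ymin <= pos[1] <= ymax:
            hits.append(name)
    return hits


truck = MovingPoint([(0, 0, 0), (10, 10, 0), (20, 10, 10)])
van = MovingPoint([(0, 50, 50), (20, 30, 30)])
print(truck.at_instant(15))  # (10.0, 5.0)
print(inside_at({"truck": truck, "van": van}, 15, (0, 0, 20, 20)))  # ['truck']
```

Because the samples are ordinary records, a representation like this can sit in tables of an existing spatial database, which is the layering the abstract emphasizes.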

Developmental Trajectories for Peer Rejection in Preschool Children Based on Latent Growth Model (잠재성장모형을 적용한 유아기 또래거부의 발달궤적)

  • Shin, Yoo Lim
    • Human Ecology Research / v.54 no.6 / pp.565-574 / 2016
  • This research examined the trajectories of peer rejection in preschool children. It also investigated gender differences in the intercept and slope of these trajectories, along with the influence of aggression, withdrawal, and language ability on them. A latent growth curve model was used to examine peer rejection from ages 3 to 5. Three hundred and thirteen 3-year-old children were recruited from five preschools and 14 daycare centers. The children's language ability was measured with the verbal test of the Wechsler Preschool and Primary Scale of Intelligence, and teachers completed measures of aggression and withdrawal. Peer rejection was assessed with a peer nomination inventory in which children were asked to nominate three classmates they did not like to play with. The findings showed that peer rejection decreased during the preschool years. Compared with girls, boys showed higher levels of peer rejection and a slower rate of change. Aggressive girls showed high levels of peer rejection and a slow rate of change. Moreover, girls with high language ability showed low levels of peer rejection and a slow rate of change. These findings imply that language ability could be a protective factor against peer rejection for girls.
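A latent growth curve model estimates each child's initial level (intercept) and rate of change (slope) as latent variables. As a rough descriptive stand-in, and not a latent growth model, the sketch below fits an ordinary least-squares line to one child's scores across the three waves, with age centered at 3 so that the intercept is the level at age 3. The scores and the helper `linear_trajectory` are hypothetical.

```python
def linear_trajectory(ages, scores, center=3.0):
    """OLS intercept (at age == center) and slope for one child's repeated scores."""
    x = [a - center for a in ages]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(scores) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical peer-rejection scores at ages 3, 4, and 5.
print(linear_trajectory([3, 4, 5], [3.0, 2.0, 1.0]))  # (3.0, -1.0)
```

A negative slope like this one corresponds to the decline in peer rejection the study reports. The latent model goes further by estimating intercept and slope variances across all children and letting covariates such as gender, aggression, and language ability predict them.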

Adaptive Conversion of Web Content for Mobile Terminals (이동단말을 위한 적응적 웹 문서 변환)

  • Kang, Sueng-Chun;Chung, Kwang-Sue
    • Journal of KIISE:Computing Practices and Letters / v.6 no.6 / pp.635-642 / 2000
  • In this paper, we propose an efficient document conversion mechanism that provides adaptive web documents to mobile terminals. We also propose RHTML (Reduced HTML) to achieve adaptive tag reduction. In the proposed mechanism, the markup error correction process converts an HTML (HyperText Markup Language) document into an XML (Extensible Markup Language) application document. This process makes the web document easy to handle with the DOM (Document Object Model) as a tree model and removes the hardware overhead on mobile terminals. In addition, the tag reduction process generates adaptive web documents based on the three DTDs (Document Type Definitions) defined in RHTML.
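Once markup error correction has turned a page into well-formed XML, tag reduction can operate on the DOM tree. The sketch below uses Python's standard `xml.etree.ElementTree` to drop elements outside a reduced profile while keeping the text that follows them. The `DROP` tag set and the `reduce_tags` helper are hypothetical stand-ins; the actual RHTML DTDs are not given in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags a small-screen profile might drop; not the RHTML DTDs.
DROP = {"img", "script", "style", "iframe"}


def reduce_tags(root, drop=DROP):
    """Remove elements whose tag is in `drop`, preserving their trailing text."""
    for parent in list(root.iter()):  # snapshot so removals don't disturb iteration
        for child in list(parent):
            if child.tag not in drop:
                continue
            if child.tail:
                siblings = list(parent)
                idx = siblings.index(child)
                if idx > 0:
                    prev = siblings[idx - 1]
                    prev.tail = (prev.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            parent.remove(child)
    return root


doc = ET.fromstring(
    "<html><body><p>Hello<img src='a.png'/> world</p>"
    "<script>track()</script></body></html>"
)
print(ET.tostring(reduce_tags(doc), encoding="unicode"))
# <html><body><p>Hello world</p></body></html>
```

Operating on the parsed tree rather than on raw HTML text is what the conversion to XML buys: removal and text re-attachment become simple node operations.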