• Title/Summary/Keyword: Word learning system

Search Results: 202

Acoustic analysis of Korean trisyllabic words produced by English and Korean speakers

  • Lee, Jeong-Hwa;Rhee, Seok-Chae
    • Phonetics and Speech Sciences
    • /
    • v.10 no.2
    • /
    • pp.1-6
    • /
    • 2018
  • The current study aimed to investigate the transfer of English word stress rules to the production of Korean trisyllabic words by L1 English learners of Korean. It compared English and Korean speakers' productions of seven Korean words from the corpus L2KSC (Rhee et al., 2005). To this end, it analyzed the syllable duration, intensity, and pitch. The results showed that English and Korean speakers' pronunciations differed markedly in duration and intensity. English learners produced word-initial syllables of greater intensity than Korean speakers, while Korean speakers produced word-final syllables of longer duration than English learners. However, these differences between the two speaker groups were not related to the expected L1 transfer. The tonal patterns produced by English and Korean speakers were similar, reflecting L1 English speakers' learning of the L2 Korean prosodic system.
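The per-syllable measures compared above (duration and intensity; pitch tracking needs a dedicated estimator and is omitted) can be sketched in a few lines. This is a minimal illustration assuming hypothetical syllable boundaries supplied as sample indices, not the study's actual analysis pipeline:

```python
import math

def rms_intensity(samples):
    """Root-mean-square amplitude of one syllable's samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def syllable_measures(samples, boundaries, sr):
    """Per-syllable duration (seconds) and RMS intensity.

    boundaries: hypothetical sample indices delimiting the syllables,
    e.g. [0, 8000, 16000, 24000] for a trisyllabic word at rate sr.
    """
    result = []
    for start, end in zip(boundaries, boundaries[1:]):
        segment = samples[start:end]
        result.append({"duration": (end - start) / sr,
                       "intensity": rms_intensity(segment)})
    return result
```

Comparing these per-syllable values across speaker groups is what reveals, for example, the greater word-initial intensity of the English learners.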

Key-word Recognition System using Signification Analysis and Morphological Analysis (의미 분석과 형태소 분석을 이용한 핵심어 인식 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of Korea Multimedia Society
    • /
    • v.13 no.11
    • /
    • pp.1586-1593
    • /
    • 2010
  • Vocabulary recognition error correction has typically used probabilistic pattern matching and dynamic pattern matching, which correct sentences based on key-words identified by semantic analysis. Such methods fail, however, when a key-word's morphological shape changes and escapes the semantic analysis. This paper proposes a method that reduces unrecognized vocabulary and improves the recognition rate. A syllable restoration algorithm determines the meaning of recognized phonemes through a phoneme semantic analysis process, and sentences are restored using morphological analysis. The error correction rate is derived from phoneme likelihood and confidence during system parsing, and error correction is performed on vocabulary proven to be erroneous. In a system performance comparison, the proposed method improved the recognition rate by 2.0% over methods based on error pattern learning, error pattern matching, and vocabulary mean patterns.

A Movie Recommendation System based on Fuzzy-AHP and Word2vec (Fuzzy-AHP와 Word2Vec 학습 기법을 이용한 영화 추천 시스템)

  • Oh, Jae-Taek;Lee, Sang-Yong
    • Journal of Digital Convergence
    • /
    • v.18 no.1
    • /
    • pp.301-307
    • /
    • 2020
  • In recent years, with the beginning of the 5G era, recommendation systems have been introduced in many different fields, most prominently in books, movies, and music. In such systems, however, user preferences are subjective and uncertain, which makes it difficult to provide accurate recommendation service. Improving the performance of a recommendation system requires large amounts of training data and more accurate estimation techniques. To address this problem, this study proposed a movie recommendation system based on Fuzzy-AHP and Word2vec. The proposed system used Fuzzy-AHP to make objective predictions about user preference and Word2vec to classify scraped data. The performance of the system was assessed by measuring the accuracy of the Word2vec outcomes with a grid search and comparing the movie ratings predicted by the system with those given by the audience. The optimal cross-validation accuracy was 91.4%, indicating excellent performance, and in the comparison of predicted and audience ratings the proposed system was superior to a system using Fuzzy-AHP alone by approximately 10%.
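The Fuzzy-AHP weighting step can be illustrated with Buckley's geometric-mean method over triangular fuzzy numbers. This is a generic sketch of that technique with a hypothetical pairwise comparison matrix, not the paper's exact formulation:

```python
def fuzzy_ahp_weights(matrix):
    """Crisp criterion weights from a fuzzy pairwise comparison matrix.

    matrix[i][j] is a triangular fuzzy number (l, m, u) expressing how
    much more important criterion i is than criterion j.
    """
    n = len(matrix)
    # Fuzzy geometric mean of each row (Buckley's method).
    geo = []
    for row in matrix:
        l = m = u = 1.0
        for (a, b, c) in row:
            l *= a
            m *= b
            u *= c
        geo.append((l ** (1 / n), m ** (1 / n), u ** (1 / n)))
    # Defuzzify by centroid, then normalise so the weights sum to 1.
    crisp = [(l + m + u) / 3 for (l, m, u) in geo]
    total = sum(crisp)
    return [w / total for w in crisp]
```

With an all-neutral matrix the criteria come out equally weighted; an asymmetric comparison shifts weight toward the preferred criterion.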

Research Trends Analysis of Machine Learning and Deep Learning: Focused on the Topic Modeling (머신러닝 및 딥러닝 연구동향 분석: 토픽모델링을 중심으로)

  • Kim, Chang-Sik;Kim, Namgyu;Kwahk, Kee-Young
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.15 no.2
    • /
    • pp.19-28
    • /
    • 2019
  • The purpose of this study is to examine trends in machine learning and deep learning research in journals indexed in the Web of Science database. To this end, we used the abstracts of 20,664 articles published between 1990 and 2017 that include the words 'machine learning', 'deep learning', or 'artificial neural network' in their titles. Twenty major research topics were identified through topic modeling analysis: classification accuracy, machine learning, optimization problem, time series model, temperature flow, engine variable, neuron layer, spectrum sample, image feature, strength property, extreme machine learning, control system, energy power, cancer patient, descriptor compound, fault diagnosis, soil map, concentration removal, protein gene, and job problem. Time-series linear regression analysis showed that all identified topics in machine learning research were 'hot' ones.
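Labelling a topic "hot" or "cold" by time-series linear regression amounts to fitting a least-squares line to the topic's yearly proportion and reading off the sign of the slope. A minimal sketch of that step, with made-up yearly proportions:

```python
def slope(years, proportions):
    """Least-squares slope of a topic's yearly share of the corpus."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(proportions) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(years, proportions))
    den = sum((x - mean_x) ** 2 for x in years)
    return num / den

def label(years, proportions):
    """A rising share marks the topic as 'hot', a falling one as 'cold'."""
    return "hot" if slope(years, proportions) > 0 else "cold"
```

A topic whose proportion grows over the study period gets a positive slope and is therefore labelled hot, which is what the study reports for all twenty topics.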

Zero-anaphora resolution in Korean based on deep language representation model: BERT

  • Kim, Youngtae;Ra, Dongyul;Lim, Soojong
    • ETRI Journal
    • /
    • v.43 no.2
    • /
    • pp.299-312
    • /
    • 2021
  • It is necessary to achieve high performance in the task of zero-anaphora resolution (ZAR) to completely understand texts in Korean, Japanese, Chinese, and various other languages. Deep-learning-based models are being employed for building ZAR systems, owing to the success of deep learning in recent years. However, the objective of building a high-quality ZAR system is far from being achieved even with these models. To enhance current ZAR techniques, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT) model. Notably, BERT is a general language representation model that enables systems to utilize deep bidirectional contextual information in natural language text. It extensively exploits the attention mechanism based upon the sequence-transduction model Transformer. In our model, classification is performed simultaneously for all words in the input word sequence to decide whether each word can be an antecedent. We seek end-to-end learning by disallowing any use of hand-crafted or dependency-parsing features. Experimental results show that, compared with other models, our approach can significantly improve the performance of ZAR.
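The per-word antecedent decision described above is a binary classification over every token's contextual vector. The sketch below illustrates just that classification head on toy vectors standing in for BERT embeddings; the weight vector, bias, and threshold are all hypothetical values, not the paper's trained parameters:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def antecedent_scores(token_vecs, w, b=0.0):
    """Probability, per token, that the token is the antecedent.

    token_vecs stand in for BERT's contextual embeddings; (w, b) are
    the classification head's parameters (toy values here).
    """
    return [sigmoid(sum(t * wi for t, wi in zip(vec, w)) + b)
            for vec in token_vecs]

def predict_antecedent(token_vecs, w, b=0.0, threshold=0.5):
    """Indices of tokens classified as antecedent candidates."""
    scores = antecedent_scores(token_vecs, w, b)
    return [i for i, s in enumerate(scores) if s > threshold]
```

Because every token is scored in one pass, the decision is made simultaneously for the whole input sequence, matching the end-to-end setup the abstract describes.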

Brainstorming using TextRank algorithms and Artificial Intelligence (TextRank 알고리즘 및 인공지능을 활용한 브레인스토밍)

  • Sang-Yeong Lee;Chang-Min Yoo;Gi-Beom Hong;Jun-Hyuk Oh;Il-young Moon
    • Journal of Practical Engineering Education
    • /
    • v.15 no.2
    • /
    • pp.509-517
    • /
    • 2023
  • This reactive web service provides a related-word recommendation system using the TextRank algorithm and an idea generation service based on words selected by the user. For the related-word recommendation system, we discuss how each word is weighted with the TextRank algorithm and how probabilities are produced with SoftMax. For the idea generation service, we discuss the idea generation method and reinforcement learning of the artificial intelligence using mini-GPT. For the reactive web, we discuss the linkage between React, Spring Boot, and Flask and describe the overall operation. When the user enters a desired topic, the service provides associated words. The user constructs a mind map by selecting related words or adding desired words. When the user selects words to combine from the constructed mind map, the service provides newly generated ideas and related patents. The web service lets users share generated ideas with other users, and the artificial intelligence is improved through user feedback given as star ratings.
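The two scoring steps named in the abstract, TextRank weighting and SoftMax normalisation, can be sketched as follows. This is a generic word-level TextRank over a co-occurrence window, not the service's actual implementation; the window size and damping factor are the usual defaults, assumed here:

```python
import math
from collections import defaultdict

def textrank(words, window=2, damping=0.85, iters=30):
    """Score words by PageRank over an undirected co-occurrence graph."""
    graph = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if w != words[j]:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    nodes = list(graph)
    score = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Standard TextRank update: share each neighbour's score evenly.
        score = {v: (1 - damping) + damping * sum(
                     score[u] / len(graph[u]) for u in graph[v])
                 for v in nodes}
    return score

def softmax(scores):
    """Turn raw TextRank scores into recommendation probabilities."""
    exps = {k: math.exp(v) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}
```

Words that co-occur with many others accumulate higher scores, and SoftMax converts those scores into the probabilities used to rank the recommended related words.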

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Data of higher dimensions require many computations, which can lead to high computational cost and overfitting of the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, from merely lessening noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. Moreover, the expression and selection of text features affect the performance of the classifier for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information, are also utilized. For improving performance, recent studies have suggested methods in which the word dictionary is modified according to positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm marks certain words as unimportant, we expect that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification, which conduct selective word elimination under specific rules and construct word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and form word embeddings. Second, we additionally select words similar to those with low information gain values and form word embeddings without them. Finally, the filtered text and word embeddings are applied to the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews on Kindle from Amazon.com, IMDB, and Yelp as datasets and classifies each dataset with the deep learning models. Reviews with more than five helpful votes and a helpful-vote ratio over 70% were classified as helpful reviews. Since Yelp shows only the number of helpful votes, we extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe embeddings that used all the words, and showed that one of the proposed methods outperforms the embeddings with all the words. By removing unimportant words, we can obtain better performance; however, removing too many words lowered the performance. Future research should consider diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity between words. Also, we only applied the proposed method with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed elimination methods, making it possible to identify the possible combinations between word embedding methods and elimination methods.
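The two selection steps, information gain for importance and cosine similarity for finding neighbours of unimportant words, can be sketched directly. This is a minimal stdlib illustration with toy documents and toy embedding vectors, not the paper's pipeline; the similarity threshold is an assumed value:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """IG of a word: entropy drop from splitting docs on word presence."""
    with_w = [l for d, l in zip(docs, labels) if word in d]
    without = [l for d, l in zip(docs, labels) if word not in d]
    n = len(labels)
    cond = sum(len(part) / n * entropy(part)
               for part in (with_w, without) if part)
    return entropy(labels) - cond

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def words_to_drop(low_ig_words, embeddings, sim_threshold=0.9):
    """Extend the low-IG word list with embedding neighbours (step two)."""
    drop = set(low_ig_words)
    for w, vec in embeddings.items():
        if w in drop:
            continue
        if any(cosine(vec, embeddings[x]) >= sim_threshold
               for x in low_ig_words if x in embeddings):
            drop.add(w)
    return drop
```

Words surviving both filters are the ones kept in the text and in the Word2Vec vocabulary before training the classifiers.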

A Machine-Learning Based Approach for Extracting Logical Structure of a Styled Document

  • Kim, Tae-young;Kim, Suntae;Choi, Sangchul;Kim, Jeong-Ah;Choi, Jae-Young;Ko, Jong-Won;Lee, Jee-Huong;Cho, Youngwha
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.11 no.2
    • /
    • pp.1043-1056
    • /
    • 2017
  • A styled document is a document that contains diverse decorating features such as different fonts, colors, tables, and images, generally authored in a word processor (e.g., MS-WORD, Open Office). Compared to a plain-text document, a styled document enables a human to easily recognize a logical structure such as sections, subsections, and the contents of a document. However, it is difficult for a computer to recognize this structure if the writer does not explicitly specify the type of an element using the styling functions of the word processor. This is one of the obstacles to enhancing document version management systems, because they currently manage a document with a file as the unit, rather than with document elements as the management unit. This paper proposes a machine-learning-based approach to analyzing the logical structure of a styled document, composed of sections, subsections, and contents. We first suggest a feature vector for characterizing document elements in a styled document, comprising eight features such as font size, indentation, and period, each of which is a frequently observed item in styled documents. Then, we trained machine learning classifiers such as Random Forest and Support Vector Machine using the suggested feature vector. The trained classifiers are used to automatically identify the logical structure of a styled document. Our experiment obtained 92.78% precision and 94.02% recall in analyzing the logical structure of 50 styled documents.
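The feature-vector idea can be illustrated with three of the named features (font size, indentation, trailing period) and a nearest-centroid classifier standing in for the paper's Random Forest and SVM. All element fields and label names here are hypothetical:

```python
import math

def features(elem):
    """Numeric feature vector for one document element (a subset of
    the paper's eight features)."""
    return (elem["font_size"],
            elem["indent"],
            1.0 if elem["text"].rstrip().endswith(".") else 0.0)

def train_centroids(elements, labels):
    """Mean feature vector per class; a toy stand-in for RF/SVM training."""
    sums, counts = {}, {}
    for e, y in zip(elements, labels):
        f = features(e)
        s = sums.setdefault(y, [0.0] * len(f))
        for i, v in enumerate(f):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}

def classify(elem, centroids):
    """Assign the element to the class with the nearest centroid."""
    f = features(elem)
    return min(centroids, key=lambda y: math.dist(f, centroids[y]))
```

Section headings tend to have larger fonts, no indentation, and no trailing period, so even this crude classifier separates headings from body content; the paper's ensemble classifiers exploit the same signal with more features.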

Fake News Detection Using Deep Learning

  • Lee, Dong-Ho;Kim, Yu-Ri;Kim, Hyeong-Jun;Park, Seung-Myun;Yang, Yu-Jun
    • Journal of Information Processing Systems
    • /
    • v.15 no.5
    • /
    • pp.1119-1130
    • /
    • 2019
  • With the wide spread of Social Network Services (SNS), fake news, a way of disguising false information as legitimate media, has become a big social issue. This paper proposes a deep learning architecture for detecting fake news written in Korean. Previous works proposed suitable fake news detection models for English, but Korean poses two issues that prevent applying the existing models: first, Korean can express the same meaning in shorter sentences than English, so the resulting feature scarcity makes it difficult to operate a deep neural network; second, morpheme ambiguity makes semantic analysis difficult. We worked to resolve these issues by implementing a system using various convolutional neural network-based deep learning architectures and "Fasttext", a word-embedding model learned at the syllable unit. After training and testing the implementation, we achieved meaningful accuracy for classifying discrepancies between body and context, but the accuracy was low for classifying discrepancies between headline and body.
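Learning embeddings at the syllable unit works naturally for Korean because each Hangul character is one syllable block. The sketch below shows the subword units a Fasttext-style model could be trained on under that assumption; the boundary markers and n-gram size follow Fasttext's usual convention and are assumed here, not taken from the paper:

```python
def syllable_ngrams(word, n=2):
    """Syllable n-grams of a word, with < and > as boundary markers.

    For Korean input, each character is already one Hangul syllable
    block, so character n-grams are syllable n-grams.
    """
    padded = "<" + word + ">"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

Because short Korean sentences still decompose into many syllable n-grams, this granularity mitigates the feature scarcity issue the abstract describes.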

Design and Implementation of Mathematics Learning Evaluation System based on the Web (웹 기반 수학 학습 평가 시스템의 설계 및 구현)

  • Kim, Nam-Hee;Seo, Hae-Young;Park, Ki-Hong
    • The Journal of the Korea Contents Association
    • /
    • v.7 no.6
    • /
    • pp.161-168
    • /
    • 2007
  • In this paper, we proposed a web-based mathematics learning evaluation system for teachers and students. The proposed system lets learners organize their lessons in a self-directed and effective way by letting instructors diagnose learners' levels of understanding of the learned contents and letting learners take part in the evaluation as well. The system also lets instructors easily author evaluation items using the Hangul word processor and present them on the web. With the help of this web-based mathematics learning site and evaluation system, learners can perform self-directed learning and approach various kinds of problems. In addition, students can check their answers and receive feedback on the spot, resolving gaps in their understanding of the learned contents.