• Title/Summary/Keyword: Text Model learning

427 search results

A Classification Model for Illegal Debt Collection Using Rule and Machine Learning Based Methods

  • Kim, Tae-Ho;Lim, Jong-In
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.93-103 / 2021
  • Despite the efforts of financial authorities in directly managing and supervising collection agents and issuing debt-collection guidelines, illegal and unfair debt collection still occurs. To effectively prevent such activities, a method is needed for strengthening the monitoring of illegal collection practices with little manpower, using technologies such as machine learning on unstructured data. In this study, we propose a classification model for illegal debt collection that combines machine learning, such as the Support Vector Machine (SVM), with a rule-based technique; the model obtains collection transcripts from loan companies and converts them into text data to identify illegal activities. The study also compares identification accuracy across machine learning algorithms. The results show that combining rule-based illegality rules with machine learning yields higher classification accuracy than the model of the previous study, which applied machine learning alone. This study is the first attempt to classify illegal activities by combining rule-based detection rules with machine learning. If further research improves the model's completeness, it will contribute greatly to preventing consumer damage from illegal debt collection.
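
To make the combination concrete, a minimal sketch of a rule-plus-SVM classifier over call transcripts is given below; the keyword lexicon, training data, and the rule-overrides-model decision are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: rule-based flags combined with an SVM text classifier.
# Keywords, data, and the override policy are illustrative placeholders only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

ILLEGAL_KEYWORDS = ["threat", "night call", "third party"]  # hypothetical rule lexicon

def rule_flag(transcript: str) -> int:
    """Return 1 if any rule keyword appears in the transcript."""
    return int(any(kw in transcript for kw in ILLEGAL_KEYWORDS))

# Toy training data: (transcript text, label) where 1 = illegal collection
train_texts = ["we will call your employer", "please pay by friday"]
train_labels = [1, 0]

svm = make_pipeline(TfidfVectorizer(), LinearSVC())
svm.fit(train_texts, train_labels)

def classify(transcript: str) -> int:
    """A rule hit overrides the model; otherwise fall back to the SVM."""
    if rule_flag(transcript):
        return 1
    return int(svm.predict([transcript])[0])

print(classify("we will call your employer tonight"))
```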

Development of a Discussion-Centered Teaching and Learning Model (토의 중심 교수학습 모형 개발)

  • Yoon Ok Han
    • The Journal of the Convergence on Culture Technology / v.9 no.4 / pp.1-11 / 2023
  • The purpose of this study is to develop a discussion-centered teaching and learning model for nurturing creative and convergence talents. As for the research method, a draft discussion-centered teaching and learning model was devised and completed through expert validation; the final model was then revised and supplemented by verifying its validity when applied in class. Compared with the draft, the final model emphasizes text reading, questioning methods, and question-generation strategies, and excludes jigsaw discussions. The discussion-centered teaching and learning model developed in this study is expected to help instructors foster creative and convergence talents. Three suggestions are provided for applying the model effectively in the field. First, an attitude of listening and respect is required during discussion. Second, a plan should be devised to induce the active participation of the learners taking part in the discussion. Third, the importance of managing discussion time is emphasized.

Comparison of Korean Real-time Text-to-Speech Technology Based on Deep Learning (딥러닝 기반 한국어 실시간 TTS 기술 비교)

  • Kwon, Chul Hong
    • The Journal of the Convergence on Culture Technology / v.7 no.1 / pp.640-645 / 2021
  • A deep learning based end-to-end TTS system consists of a Text2Mel module that generates a spectrogram from text and a vocoder module that synthesizes speech signals from the spectrogram. By applying deep learning to TTS, the intelligibility and naturalness of synthesized speech have recently improved to approach human speech. However, the inference speed for synthesizing speech is very slow compared to conventional methods. Inference speed can be improved by applying non-autoregressive methods, which generate speech samples in parallel, independent of previously generated samples. In this paper, we introduce FastSpeech, FastSpeech 2, and FastPitch as Text2Mel technologies and Parallel WaveGAN, Multi-band MelGAN, and WaveGlow as vocoder technologies that apply the non-autoregressive method, and we implement them to verify whether they can run in real time. The obtained real-time factors (RTF) show that all the presented methods are fully capable of real-time processing. The trained models, except WaveGlow, are only tens to hundreds of megabytes in size and can therefore be deployed in memory-constrained embedded environments.
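
The comparison rests on the real-time factor (RTF), the ratio of synthesis time to the duration of the generated audio; the sketch below shows how such a measurement can be taken. The `synthesize` function and the sampling rate are hypothetical stand-ins for any of the Text2Mel-plus-vocoder pipelines named above.

```python
# Sketch of measuring the real-time factor (RTF) of a TTS pipeline.
# `synthesize` is a hypothetical stand-in for FastSpeech/FastPitch + a vocoder.
import time
import numpy as np

SAMPLE_RATE = 22050  # assumed output sampling rate

def synthesize(text: str) -> np.ndarray:
    """Placeholder: return a waveform for the given text."""
    return np.zeros(SAMPLE_RATE * 2, dtype=np.float32)  # pretend 2 s of audio

def real_time_factor(text: str) -> float:
    start = time.perf_counter()
    wav = synthesize(text)
    elapsed = time.perf_counter() - start
    audio_seconds = len(wav) / SAMPLE_RATE
    return elapsed / audio_seconds  # RTF < 1 means faster than real time

print(f"RTF = {real_time_factor('딥러닝 기반 한국어 TTS'):.4f}")
```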

A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning (토픽모델링과 딥 러닝을 활용한 생의학 문헌 자동 분류 기법 연구)

  • Yuk, JeeHee;Song, Min
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.63-88 / 2018
  • This research evaluated differences in classification performance across feature selection methods (the LDA topic model and Doc2Vec, a deep-learning-based document embedding method), feature corpus sizes, and classification algorithms. To find the feature corpus that yields the highest classification performance, an experiment was conducted in which the feature corpus was composed differently according to the location of features within the documents and its size was adjusted. In the deep learning experiments, the training frequency and the information considered for context inference were also evaluated. The study constructed a biomedical document dataset, Disease-35083, consisting of biomedical scholarly documents provided by PMC and categorized by disease. The research verifies which type and size of feature corpus produce the highest performance and also suggests feature corpora that remain efficient during training and can be extended to specific features. Additionally, it compares deep learning with the existing method and suggests an appropriate method for each classification environment.
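
A minimal sketch of the two feature extraction routes being compared, LDA topic distributions versus Doc2Vec document embeddings, using gensim on a toy corpus; the corpus, dimensions, and training parameters are assumptions for illustration, not the paper's settings.

```python
# Sketch: LDA topic features vs. Doc2Vec embeddings as document representations.
# Corpus and hyperparameters are toy values, not the paper's settings.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Doc2Vec
from gensim.models.doc2vec import TaggedDocument

docs = [["tumor", "growth", "therapy"],
        ["viral", "infection", "therapy"],
        ["cardiac", "arrest", "treatment"]]

# Route 1: LDA topic distributions as features
dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(bow, num_topics=2, id2word=dictionary, passes=10)
lda_features = [lda.get_document_topics(b, minimum_probability=0.0) for b in bow]

# Route 2: Doc2Vec document embeddings as features
tagged = [TaggedDocument(words=d, tags=[i]) for i, d in enumerate(docs)]
d2v = Doc2Vec(vector_size=16, min_count=1, epochs=40)
d2v.build_vocab(tagged)
d2v.train(tagged, total_examples=d2v.corpus_count, epochs=d2v.epochs)
d2v_features = [d2v.infer_vector(d) for d in docs]

print(lda_features[0], d2v_features[0][:4])
```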

A Study on the Improvement of Tesseract-based OCR Model Recognition Rate using Ontology (온톨로지를 이용한 tesseract 기반의 OCR 모델 인식률 향상에 관한 연구)

  • Hwang, Chi-gon;Yun, Dai Yeol;Yoon, Chang-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.438-440 / 2021
  • With the development of machine learning, artificial intelligence techniques are being applied in various fields. Among them is OCR, which converts characters in images into text; Tesseract, originally developed by HP, is one such engine. However, its recognition rate for characters in images is still low. To address this, we try to improve the image-to-text conversion rate through a post-processing step that recognizes context using an ontology.

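A rough sketch of the idea follows: Tesseract OCR output passed through a post-correction step. A small fixed vocabulary with fuzzy matching stands in for the ontology-based context recognition, which in the paper is considerably more involved; the vocabulary and the image path below are illustrative only.

```python
# Sketch: Tesseract OCR followed by a simple post-correction step.
# difflib matching against a small vocabulary stands in for the ontology lookup.
import difflib
import pytesseract
from PIL import Image

# Hypothetical domain vocabulary that an ontology would supply
DOMAIN_TERMS = ["machine", "learning", "ontology", "recognition"]

def correct_token(token: str) -> str:
    """Replace an OCR token with its closest known term, if any."""
    match = difflib.get_close_matches(token, DOMAIN_TERMS, n=1, cutoff=0.8)
    return match[0] if match else token

def ocr_with_postprocess(image_path: str) -> str:
    raw = pytesseract.image_to_string(Image.open(image_path), lang="eng")
    return " ".join(correct_token(tok) for tok in raw.split())

print(ocr_with_postprocess("sample_page.png"))  # path is illustrative
```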

Korean Spatial Information Extraction using Bi-LSTM-CRF Ensemble Model (Bi-LSTM-CRF 앙상블 모델을 이용한 한국어 공간 정보 추출)

  • Min, Tae Hong;Shin, Hyeong Jin;Lee, Jae Sung
    • The Journal of the Korea Contents Association / v.19 no.11 / pp.278-287 / 2019
  • Spatial information extraction retrieves static and dynamic spatial aspects of natural language text by explicitly marking spatial elements and their relational words. This paper proposes a deep learning approach to spatial information extraction for Korean using a two-step bidirectional LSTM-CRF ensemble model, together with an integrated model for spatial element extraction and spatial relation attribute extraction. An experiment with the Korean SpaceBank demonstrates that the proposed deep learning model is more effective than the previous CRF model and that the ensemble model performs better than the single model.
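
For orientation, a compact PyTorch sketch of the bidirectional LSTM tagger underlying the approach is shown below; the vocabulary size, dimensions, and tag set are placeholders, and the paper's CRF layer and two-model ensemble are omitted.

```python
# Sketch: BiLSTM tagger for spatial element labeling (PyTorch).
# The paper additionally stacks a CRF layer and ensembles two models; omitted here.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, hidden_dim=128, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.emit = nn.Linear(hidden_dim, num_tags)  # per-token tag scores

    def forward(self, token_ids):
        x = self.embed(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.lstm(x)                # (batch, seq, hidden_dim)
        return self.emit(h)                # (batch, seq, num_tags)

model = BiLSTMTagger()
tokens = torch.randint(0, 1000, (1, 12))   # one toy sentence of 12 token ids
scores = model(tokens)
print(scores.argmax(dim=-1))               # greedy tags; a CRF would decode jointly
```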

CNN-based Skip-Gram Method for Improving Classification Accuracy of Chinese Text

  • Xu, Wenhua;Huang, Hao;Zhang, Jie;Gu, Hao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.12 / pp.6080-6096 / 2019
  • Text classification is one of the fundamental techniques in natural language processing. Numerous tasks build on it, such as news subject classification, question answering system classification, and movie review classification. Traditional text classification methods extract features and then classify them, but they are complex to operate and their accuracy is not sufficiently high. Recently, a convolutional neural network (CNN) based one-hot method has been proposed for text classification to address this problem. In this paper, we propose an improved CNN-based skip-gram method for Chinese text classification and evaluate it on the Sogou news corpus. Experimental results indicate that the CNN with the skip-gram model performs more effectively than the CNN-based one-hot method.
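
A minimal sketch of the pipeline described here, skip-gram word vectors feeding a small CNN classifier, using gensim and Keras; the corpus, labels, and network sizes are toy assumptions rather than the paper's configuration.

```python
# Sketch: skip-gram (word2vec, sg=1) embeddings feeding a small CNN text classifier.
# Corpus, labels, and network sizes are toy values for illustration.
import numpy as np
import tensorflow as tf
from gensim.models import Word2Vec

sentences = [["sports", "team", "wins"], ["stock", "market", "falls"]]
labels = np.array([0, 1])
EMB_DIM, MAX_LEN = 32, 3

# 1) Train skip-gram embeddings (sg=1 selects skip-gram in gensim)
w2v = Word2Vec(sentences, vector_size=EMB_DIM, sg=1, min_count=1, window=2)
vocab = {w: i + 1 for i, w in enumerate(w2v.wv.index_to_key)}  # index 0 = padding
emb = np.zeros((len(vocab) + 1, EMB_DIM), dtype=np.float32)
for w, i in vocab.items():
    emb[i] = w2v.wv[w]

x = np.array([[vocab[w] for w in s] for s in sentences])

# 2) CNN classifier initialized with the skip-gram embedding matrix
emb_layer = tf.keras.layers.Embedding(emb.shape[0], EMB_DIM)
model = tf.keras.Sequential([
    emb_layer,
    tf.keras.layers.Conv1D(16, kernel_size=2, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.build(input_shape=(None, MAX_LEN))
emb_layer.set_weights([emb])  # load the skip-gram vectors into the CNN
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, labels, epochs=3, verbose=0)
```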

Topic Analysis of the National Petition Site and Prediction of Answerable Petitions Based on Deep Learning (국민청원 주제 분석 및 딥러닝 기반 답변 가능 청원 예측)

  • Woo, Yun Hui;Kim, Hyon Hee
    • KIPS Transactions on Software and Data Engineering / v.9 no.2 / pp.45-52 / 2020
  • Since its opening, the national petition site has attracted much attention. In this paper, we perform topic analysis of the national petition site and propose a prediction model for answerable petitions based on deep learning. First, 1,500 petitions are collected and topics are extracted from their contents. Main subjects are defined using the K-means clustering algorithm, and detailed subjects are defined by topic modeling of the petitions belonging to each main subject. A long short-term memory (LSTM) network is then used to predict answerable petitions. In addition to the title and contents, the proposed model uses the category, text length, and the ratios of parts of speech such as nouns, adjectives, adverbs, and verbs. Our experimental results show that the type 2 model, which uses these additional features, outperforms the type 1 model without them.
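
A rough Keras sketch of such a two-input model, a token sequence passed through an LSTM and concatenated with the auxiliary numeric features (part-of-speech ratios, text length, category); all sizes and the feature count are illustrative assumptions, not the paper's configuration.

```python
# Sketch: LSTM over petition text combined with auxiliary numeric features
# (part-of-speech ratios, text length, category). All sizes are illustrative.
import tensorflow as tf

VOCAB_SIZE, MAX_LEN, N_AUX = 5000, 100, 6   # toy dimensions

text_in = tf.keras.Input(shape=(MAX_LEN,), dtype="int32", name="tokens")
aux_in = tf.keras.Input(shape=(N_AUX,), name="aux_features")

x = tf.keras.layers.Embedding(VOCAB_SIZE, 64)(text_in)
x = tf.keras.layers.LSTM(64)(x)
merged = tf.keras.layers.Concatenate()([x, aux_in])
merged = tf.keras.layers.Dense(32, activation="relu")(merged)
out = tf.keras.layers.Dense(1, activation="sigmoid")(merged)  # answerable or not

model = tf.keras.Model([text_in, aux_in], out)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```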

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos (멀티모달 개념계층모델을 이용한 만화비디오 컨텐츠 학습을 통한 등장인물 기반 비디오 자막 생성)

  • Kim, Kyung-Min;Ha, Jung-Woo;Lee, Beom-Jin;Zhang, Byoung-Tak
    • Journal of KIISE / v.42 no.4 / pp.451-458 / 2015
  • Previous multimodal learning methods focus on problem-solving aspects such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), a content modeling method trained on a cartoon video dataset, together with a character-based subtitle generation method based on the learned model. The MuCH model has a multimodal hypernetwork layer, in which the patterns of words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution over words and image patches. The model learns the characteristics of the characters as concepts from video subtitles and scene images using a Bayesian learning method, and it can generate character-based subtitles from the learned model when text queries are provided. As an experiment, the MuCH model learned concepts from 268 minutes of 'Pororo' cartoon videos and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The experimental results indicate that, given the same text query, our model generates more accurate and more character-specific subtitles than the other models.
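
As a toy illustration of character-conditioned subtitle generation only, far simpler than the multimodal hypernetwork and Bayesian learning described above; the data and the counting scheme are invented for illustration.

```python
# Toy illustration: character-conditioned word distributions estimated by counting.
# The actual MuCH model uses a multimodal hypernetwork and Bayesian learning.
from collections import Counter, defaultdict

# (character, subtitle) pairs; the contents are invented placeholders
training_subtitles = [
    ("Pororo", "let's go play outside"),
    ("Pororo", "let's fly today"),
    ("Crong", "crong crong hungry"),
]

word_counts = defaultdict(Counter)
for character, subtitle in training_subtitles:
    word_counts[character].update(subtitle.split())

def generate_subtitle(character: str, length: int = 4) -> str:
    """Pick the most probable words for the character (greedy toy generator)."""
    counts = word_counts[character]
    return " ".join(w for w, _ in counts.most_common(length))

print(generate_subtitle("Pororo"))
```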

Exploring the Core Keywords of the Secondary School Home Economics Teacher Selection Test: A Mixed Method of Content and Text Network Analyses (중등학교 가정과교사 임용시험의 핵심 키워드 탐색: 내용 분석과 텍스트 네트워크 분석을 중심으로)

  • Park, Mi Jeong;Han, Ju
    • Human Ecology Research / v.60 no.4 / pp.625-643 / 2022
  • The purpose of this study was to explore the trends and core keywords of the secondary school home economics teacher selection test using content analysis and text network analysis. The sample comprised the texts of the first-stage secondary school home economics teacher selection test for the 2017-2022 school years. Frequency counts, word clouds, centrality analysis, and topic modeling were performed using NetMiner 4.4. The key results were as follows. First, content analysis revealed that the number of questions and the scores for each subject (field) have remained constant since 2020, unlike in earlier years. By subject, most questions focused on 'theory of home economics education', and among the evaluation content elements, the highest percentage of questions concerned 'home economics teaching·learning methods and practice'. Second, the keyword network of the 2017-2022 selection tests has an extremely low density. For the 2017-2019 school years, 'learning', 'evaluation', 'instruction', and 'method' appeared as important keywords, and seven topics were extracted. For the 2020-2022 school years, 'evaluation', 'class', 'learning', 'cycle', and 'model' were influential keywords, and five topics were extracted. This study is meaningful in that it attempted a new research method combining content analysis and text network analysis, and it provides basic data for revising the evaluation areas and evaluation content elements of the secondary school home economics teacher selection test.
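
The study's analysis was carried out in NetMiner 4.4; as an open-source analogue, a keyword co-occurrence network with a centrality measure can be sketched with networkx. The keyword lists below are invented placeholders, not the study's data.

```python
# Sketch: keyword co-occurrence network and degree centrality (networkx),
# as an open-source analogue to the NetMiner 4.4 analysis. Data is invented.
from itertools import combinations
import networkx as nx

# Each list holds the keywords extracted from one exam question (placeholders)
question_keywords = [
    ["learning", "evaluation", "method"],
    ["instruction", "learning", "model"],
    ["evaluation", "class", "cycle"],
]

G = nx.Graph()
for keywords in question_keywords:
    for a, b in combinations(sorted(set(keywords)), 2):
        w = G.get_edge_data(a, b, default={"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)   # accumulate co-occurrence counts

centrality = nx.degree_centrality(G)
for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```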