• Title/Summary/Keyword: keyword learning (키워드 학습)


A Study on Spam Document Classification Method using Characteristics of Keyword Repetition (단어 반복 특징을 이용한 스팸 문서 분류 방법에 관한 연구)

  • Lee, Seong-Jin;Baik, Jong-Bum;Han, Chung-Seok;Lee, Soo-Won
    • The KIPS Transactions:PartB / v.18B no.5 / pp.315-324 / 2011
  • In the Web environment, the flood of spam causes serious social problems such as personal information leaks, monetary loss from phishing, and the distribution of harmful content. Moreover, the types and techniques of spam distribution that must be controlled change from day to day. Learning-based spam classification using the Bag-of-Words model has been the most widely used method so far. However, this method is vulnerable to the filter-evasion techniques that recent spam commonly employs, because it classifies spam documents using only the keyword occurrence information obtained while training the classification model. In this paper, we propose a spam document detection method that counters these evasion techniques by exploiting the word repetition that characterizes spam documents. Most recent spam documents tend to repeat the key phrases they are designed to spread, and this tendency can be used as a measure for classifying spam documents. We define six variables that represent characteristics of word repetition and use them as the feature set for constructing a classification model. The effectiveness of the proposed method is evaluated in an experiment with blog posts and e-mail data. The experimental results show that the proposed method outperforms other approaches.
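The abstract above does not list its six repetition variables, so the following Python sketch only illustrates the general idea of turning word repetition into classifier features. The three statistics and their names (`max_repeat_ratio`, `repeated_type_ratio`, `type_token_ratio`) are assumptions chosen for illustration, not the variables defined in the paper.

```python
import re
from collections import Counter


def repetition_features(text):
    """Return a few illustrative word-repetition statistics for one document.

    These are hypothetical features, not the six variables of the paper.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return {
            "max_repeat_ratio": 0.0,
            "repeated_type_ratio": 0.0,
            "type_token_ratio": 0.0,
        }
    counts = Counter(tokens)
    total = len(tokens)
    return {
        # Share of all tokens taken up by the single most frequent word.
        "max_repeat_ratio": max(counts.values()) / total,
        # Share of distinct words that occur more than once.
        "repeated_type_ratio": sum(1 for c in counts.values() if c > 1) / len(counts),
        # Distinct words divided by total words; low values mean heavy repetition.
        "type_token_ratio": len(counts) / total,
    }


features = repetition_features("buy now buy now buy cheap")
```

A feature dictionary like this could be fed to any standard classifier alongside, or instead of, Bag-of-Words counts; which combination works best is an empirical question the sketch does not address.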

Designing a Conceptual Model of Knowledge Creation Type e-PBL Support System - Focused on Naval e-PBL Support System - (지식창출형 e-PBL 지원시스템의 개념적 모형 구안 - 해군 e-PBL지원시스템을 중심으로 -)

  • Park, Soo-Hong;Hong, Jin-Yong;Woo, Cha-Seop;Kim, Du-Gyu
    • Journal of The Korean Association of Information Education / v.12 no.4 / pp.437-448 / 2008
  • As the importance of knowledge is emphasized and the battlefield environment changes, the military also demands competent people equipped with creativity, cooperativeness, and communication ability, which calls for applying PBL to naval education. The present study went through three stages to develop a prototype of a naval e-PBL support system for knowledge creation. First, databases of the Korea Education and Research Information Service, the National Assembly Library, and others were searched using keywords such as PBL, e-PBL, knowledge creation, and knowledge ecosystem. In addition, from the domestic and foreign theses, books, and research papers recommended by content specialists, we selected and analyzed frequently cited literature and recent research reports related to this study, and derived the key values and design strategies of a knowledge creation type e-PBL support system. Second, we developed a primary prototype based on this analysis and, after revising it according to the opinions of instructional design specialists, proposed the final prototype of the knowledge creation type naval e-PBL support system. Its values are as follows. First, the system gives learners opportunities to apply e-PBL and helps them improve their creativity, cooperativeness, and communication ability and accumulate service know-how. Second, it improves work efficiency by circulating knowledge through sharing among individuals or groups, and produces synergy that promotes an organizational culture of learning. Third, it enables teachers who apply PBL in school education to find new applications of PBL in constructing knowledge bases.


Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

  • Ahn, Hee-Kook;Han, Uk-Pyo;Shin, Seung-Ho;Yang, Dong-Il;Roh, Hee-Young
    • Journal of Advanced Navigation Technology / v.15 no.2 / pp.287-297 / 2011
  • As electronic mail is widely used for its convenience and speed of communication, the amount of malicious and advertising spam mail is increasing and causing many social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, which are based on words or sentences. However, spammers alter those characteristics in the mail they send to evade filtering systems. In Korean, abnormal usages can be far more varied than in other languages because a syllable is composed of chosung, jungsung, and jongsung (initial, medial, and final sounds). Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can keep up with those changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method operates on syllables rather than words and is based on the Smith-Waterman algorithm. Through an experiment on filter keywords and e-mail extracted from a mail server, we confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within acceptable limits.
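As a rough illustration of syllable-level matching with the Smith-Waterman algorithm, the Python sketch below computes a local alignment score between a filter keyword and a piece of text, treating each character (one Hangul syllable block) as a unit. The scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the normalization in `similarity` are assumptions for illustration; the abstract does not state the parameters or the similarity definition used in the paper.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (linear gap cost)."""
    cols = len(b) + 1
    prev = [0] * cols
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def similarity(keyword, text, match=2):
    """Score normalized by the best possible score for the keyword (assumed measure)."""
    if not keyword:
        return 0.0
    return smith_waterman(keyword, text, match=match) / (match * len(keyword))


# A keyword disguised with an inserted symbol still aligns locally.
score = similarity("대출", "무료 대.출 상담")
```

A filter built this way would flag text whose similarity to a blocked keyword exceeds some threshold; the threshold would have to be tuned on real mail data.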

Design and Implementation of Lesson Plan System for teacher-student based on XML (XML 기반 교수-학생 학습지도 시스템의 설계 및 구현)

  • Choi, Mun-Kyoung;Kim, Haeng-Kon
    • The KIPS Transactions:PartD / v.9D no.6 / pp.1055-1062 / 2002
  • Recently, lesson plan documents adopted in education are not provided systematically as educational information, and it is not easy for teachers to compose them. As a result, developing lesson plan documents takes additional time and effort. With the growth of distributed networks, a web-based lesson plan system is required across all areas of education. Therefore, lesson plans need to be composed in a way that meets teachers' varied requirements by supporting the creation, retrieval, and reuse of documents using standard XML on the web. In this paper, we analyze lesson plans, create a common DTD (Document Type Definition) from that analysis, and develop a system that provides standard XML documents through the common DTD. The system provides an editor for composing lesson plans and supports a search function to improve the reusability of existing lesson plans. We design structure-based, facet, and keyword search functions. The composed lesson plans interoperate with a database. Consequently, information can be shared on the web by composing lesson plans in XML, and time and cost can be saved by writing lesson plans directly on the web. The system can also provide an improved learning environment.
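To make the keyword search over XML lesson plans concrete, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`lessonPlans`, `lessonPlan`, `id`, `keyword`) and the sample data are hypothetical; the paper's actual DTD is not given in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical lesson plan documents; the real DTD from the paper is not known.
LESSON_PLANS = """
<lessonPlans>
  <lessonPlan id="lp1">
    <title>Fractions with pizza slices</title>
    <subject>Mathematics</subject>
    <keywords><keyword>fractions</keyword><keyword>division</keyword></keywords>
  </lessonPlan>
  <lessonPlan id="lp2">
    <title>The water cycle</title>
    <subject>Science</subject>
    <keywords><keyword>evaporation</keyword><keyword>rain</keyword></keywords>
  </lessonPlan>
</lessonPlans>
"""


def search_by_keyword(xml_text, term):
    """Return the ids of lesson plans tagged with the given keyword (case-insensitive)."""
    root = ET.fromstring(xml_text)
    term = term.lower()
    hits = []
    for plan in root.findall("lessonPlan"):
        keywords = [k.text.strip().lower() for k in plan.iter("keyword") if k.text]
        if term in keywords:
            hits.append(plan.get("id"))
    return hits


matches = search_by_keyword(LESSON_PLANS, "Rain")
```

A structure-based or facet search could follow the same pattern by filtering on other elements, such as the assumed `subject` element, instead of `keyword`.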

Research on Text Classification of Research Reports using Korea National Science and Technology Standards Classification Codes (국가 과학기술 표준분류 체계 기반 연구보고서 문서의 자동 분류 연구)

  • Choi, Jong-Yun;Hahn, Hyuk;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.1 / pp.169-177 / 2020
  • In South Korea, the results of R&D in science and technology are submitted to the National Science and Technology Information Service (NTIS) in reports that carry Korea national science and technology standard classification codes (K-NSCC). However, because there are more than 2000 sub-categories, it is non-trivial to choose correct classification codes without a clear understanding of the K-NSCC. In addition, there are few cases of automatic document classification research based on the K-NSCC, and there are no training data in the public domain. To the best of our knowledge, this study is the first attempt to build a highly performing K-NSCC classification system based on NTIS report meta-information from the last five years (2013-2017). To this end, about 210 mid-level categories were selected, and we conducted preprocessing that considers the characteristics of research report metadata. More specifically, we propose a convolutional neural network (CNN) technique using only task names and keywords, which are the most influential fields. The proposed model is compared with several machine learning methods that show good performance in text classification (e.g., the linear support vector classifier, CNN, and gated recurrent unit), and it has a performance advantage of 1% to 7% based on a top-three F1 score.

Analysis of Research Trends in Korean English Education Journals Using Topic Modeling (토픽 모델링을 활용한 한국 영어교육 학술지에 나타난 연구동향 분석)

  • Won, Yongkook;Kim, Youngwoo
    • The Journal of the Korea Contents Association / v.21 no.4 / pp.50-59 / 2021
  • To understand the research trends of English education in Korea over the 20 years from 2000 to 2019, 12 major Korean academic journals in the field of English education were selected, and bibliographic information on 7,329 articles published in these journals was collected and analyzed. The total number of articles increased from the 2000s to the first half of the 2010s, but decreased somewhat in the late 2010s, and the number of publications per journal has become similar. These results show that the overall influence of English education journals has decreased and then leveled off in terms of quantity. Next, 34 topics were extracted by applying latent Dirichlet allocation (LDA) topic modeling to the English abstracts of the articles. Teacher, word, culture/media, and grammar appeared as highly studied topics. Topics such as word, vocabulary, and testing and evaluation appeared through unique keywords, and various topics related to learner factors emerged, becoming topics of interest in English education research. Then, topics were analyzed to determine which ones were rising or falling in frequency. As a result of this analysis, qualitative research, vocabulary, learner factor, and testing were found to be rising topics, while falling topics included CALL, language, teaching, and grammar. This change in research topics shows that research interests in the field of English education are shifting from static research topics to data-driven and dynamic research topics.

An Educational Plan for Chinese Culture through 「Analysis of the Legend of the Gaotang(高唐)shennu(神女)」 (<고당신녀전설 분석>을 통한 중국문화 교육 방안)

  • Kim, Sung-Hee;Choi, Eunsun;Park, Namje
    • The Journal of the Convergence on Culture Technology / v.8 no.1 / pp.313-320 / 2022
  • Recently, the keyword 'convergence' has emerged in the education field, and demand for the humanities is also increasing. The range of convergence with the humanities is gradually spreading to various fields such as science, technology, engineering, and the arts. The trend is also to nurture future creative convergence talent with logical, comprehensive, and creative thinking through the fusion of humanities, scientific, and empirical theories. Myths and legends contain the content of humanity's culture creation and deal with matters such as religion, philosophy, art, and science. Therefore, through the consciousness of the ancients who lived in the so-called convergence era, when academic differentiation had not yet occurred, it will be possible to reflect on the appearance of sages. In this paper, we propose a method for educating Chinese culture through the "Analysis of the Legend of the Gaotang Shennu" by Wen Yi-Duo, a famous Chinese scholar. He sought to find the origin of Chinese culture through myths and legends and to find national identity by restoring the concept of national culture in the period of origin. The myths and legends of China are closely related to the cultural phenomena of modern China, and studying them will further enhance our understanding of China.

Implementation of CNN-based Classification Training Model for Unstructured Fashion Image Retrieval using Preprocessing with MASK R-CNN (비정형 패션 이미지 검색을 위한 MASK R-CNN 선형처리 기반 CNN 분류 학습모델 구현)

  • Seunga, Cho;Hayoung, Lee;Hyelim, Jang;Kyuri, Kim;Hyeon-Ji, Lee;Bong-Ki, Son;Jaeho, Lee
    • Journal of Korea Society of Industrial Information Systems / v.27 no.6 / pp.13-23 / 2022
  • In this paper, we propose an algorithm that classifies images of the detailed components of each fashion item, for unstructured data retrieval in the fashion field. Due to the COVID-19 environment, AI-based online shopping malls have been increasing recently. However, existing keyword search and personalized style recommendations based on user browsing behavior are limited in how accurately they can search unstructured data. In this study, images crawled from online shopping sites were pre-processed using Mask R-CNN, and the components of each fashion item were then classified with a CNN. We obtained accuracies of 93.28% for the shirt collar, 98.10% for the shirt pattern, and 91.73% for the 3-class jeans fit. We further obtained 81.59% for the 4-class jeans fit and 93.91% for the jeans color. For decorative details, we obtained accuracies of 91.20% for jeans washing and 92.96% for jeans damage.

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, in which Amazon and eBay stand out, is growing explosively. As e-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, this growth has also created a problem. Because so many products are registered, it has become difficult for customers to find what they really need in the flood of products. When customers search with a general keyword, too many products are returned. Conversely, few products are returned when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found with text input in current text-based search systems. If the information in images can be converted to text, customers can search for products by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they have trouble recognizing text under certain conditions, such as when the text is small or the fonts are inconsistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s.
The Single Shot Multibox Detector (SSD), a model well regarded for its object detection performance, can be used with its structure redesigned to account for the differences between text and objects. However, the SSD model needs a large amount of labeled training data, because deep learning models of this kind are trained by supervised learning. To collect data, one could manually label the location and class of the text in catalogs, but manual collection raises many problems. Some keywords would be missed because humans make mistakes while labeling training data. Collecting the data also becomes too time-consuming given the scale of data needed, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words would also be difficult. To solve this data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, data can be collected efficiently and the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the performance of the SSD model under different data conditions to analyze which features of the data influence the recognition of text in images. The results show that the number of labeled keywords, the addition of overlapped keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences in background images are related to the performance of the SSD model. This test can guide performance improvements, through high-quality data, for the SSD model or other deep learning based text recognizers.
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to the improvement of search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.

Development of evaluation factors for SW education in elementary and secondary schools (초·중등 SW교육의 평가요소 개발)

  • Park, Juyeon;Kim, Jonghye;Kim, Sughee;Lee, HyunSook;Kim, Soohwan
    • The Journal of Korean Association of Computer Education / v.20 no.6 / pp.47-59 / 2017
  • The goal of SW education is to cultivate creative and convergent human resources with computational thinking ability. The content and methods of SW education are diverse, and it is difficult for students to properly evaluate what they have learned. To evaluate the learning content appropriately, SW education evaluations should make it easy to assess the core content of SW education. The purpose of this study is to provide a systematic framework that can be used to develop evaluation factors for assessing the effectiveness of SW education in elementary and secondary schools. We conducted a literature review, a field suitability review through FGI, an expert consultation, and a Delphi survey. As a result, the metrics of the cognitive domain were developed with 17 keywords in three categories: Computational Materials & Outputs (CMO), Computational Concepts (CC), and Computational Practices (CP). The metrics of the affective domain were developed with 13 sub-areas in four categories: value, attitude, computational thinking efficacy, and interest. The SW education evaluation factors developed in this study can be used as a framework for developing evaluation content that matches the content of education.