• Title/Summary/Keyword: pre-trained model (사전학습 모델)


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one way to handle big data in text mining. It must take data density into account, which strongly influences sentence classification performance. High-dimensional data requires heavy computation, which raises computational cost and can cause the model to overfit, so a dimension reduction step is needed to improve model performance. Proposed methods range from merely reducing noise in the data, such as misspellings and informal text, to incorporating semantic and syntactic information. Moreover, how text features are represented and selected affects classifier performance in sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have proposed modifying the word dictionary according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies words as unimportant, we assume that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: eliminating words selectively under specific rules, and constructing word embeddings based on Word2Vec.
To select words of low importance from the text, we use information gain to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain from the raw text and build the word embedding. Second, we additionally select, for elimination, words that are similar to the words with low information gain and build the word embedding. Finally, the filtered text and word embeddings are fed to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews from Kindle on Amazon.com, IMDB, and Yelp as datasets and classifies each with the deep learning models. Reviews that received more than five helpful votes and had a helpful-vote ratio over 70% were classified as helpful reviews. Yelp shows only the number of helpful votes. We extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods outperformed the embeddings that used all the words. Removing unimportant words improved performance, but removing too many words lowered it. Future research should consider diverse preprocessing methods and an in-depth analysis of word co-occurrence for measuring similarity among words. Also, we applied the proposed method only with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be used with the proposed methods, making it possible to identify the possible combinations of word embedding methods and elimination methods.
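
The two elimination steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: documents are assumed to be sets of tokens, the `vectors` dictionary stands in for a trained Word2Vec model, and the function names and both thresholds (`ig_threshold`, `sim_threshold`) are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, word):
    """Entropy reduction obtained by splitting documents on presence of `word`."""
    with_word = [y for d, y in zip(docs, labels) if word in d]
    without_word = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = (len(with_word) / n) * entropy(with_word) \
        + (len(without_word) / n) * entropy(without_word)
    return entropy(labels) - conditional


def cosine(u, v):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Step 1: words with low information gain.
    Step 2: remaining words whose vector is close to any step-1 word."""
    vocab = set().union(*docs)
    low_ig = {w for w in vocab
              if information_gain(docs, labels, w) < ig_threshold}
    similar = {
        w for w in vocab - low_ig
        if w in vectors and any(
            s in vectors and cosine(vectors[w], vectors[s]) >= sim_threshold
            for s in low_ig)
    }
    return low_ig, similar


# Toy data (assumed for illustration only).
docs = [{"great", "the", "this"}, {"bad", "the"}, {"great", "a"}, {"bad", "a"}]
labels = [1, 0, 1, 0]
vectors = {"the": [0.0, 1.0], "a": [0.1, 0.9], "this": [0.05, 1.0],
           "great": [1.0, 0.0], "bad": [-1.0, 0.1]}
low_ig, similar = words_to_remove(docs, labels, vectors, 0.2, 0.9)
print(sorted(low_ig), sorted(similar))
```

In this toy data the class-bearing words keep a high information gain, while a word that is only weakly informative survives step 1 and is caught in step 2 because its vector lies close to a low-gain word.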

Natural Language Processing Model for Data Visualization Interaction in Chatbot Environment (챗봇 환경에서 데이터 시각화 인터랙션을 위한 자연어처리 모델)

  • Oh, Sang Heon;Hur, Su Jin;Kim, Sung-Hee
    • KIPS Transactions on Computer and Communication Systems / v.9 no.11 / pp.281-290 / 2020
  • With the spread of smartphones, services that use personalized data are increasing. Healthcare-related services in particular handle a variety of data and use data visualization techniques to present it effectively. As visualization becomes common, interaction with visualizations is naturally emphasized as well. In the PC environment, interaction with data visualizations is performed with a mouse, so various data filtering options are offered. In a mobile environment, by contrast, the screen is small and it is hard to tell whether interaction is possible, so only the limited visualizations built into the app are available through button touches. To overcome this limitation, we aim to enable data visualization interaction through conversation with a chatbot so that users can examine their individual data through various visualizations. This requires converting the user's natural language request into a database query and retrieving the result data from a database that stores data periodically. Many studies are under way on converting natural language into queries, but converting user requests into queries based on a visualization has not yet been studied. This paper therefore focuses on query generation when the data visualization technique has been determined in advance. The supported interactions are filtering on x-axis values and comparison between two groups. The test scenario used step-count data; filtering on the x-axis period was shown as a bar graph, and the comparison between two groups was shown as a line graph. To develop a natural language processing model that can receive requested information through visualization, about 15,800 training samples were collected through a survey of 1,000 people.
Algorithm development and performance evaluation yielded about 89% accuracy for the classification model and 99% accuracy for the query generation model.

Prediction of Key Variables Affecting NBA Playoffs Advancement: Focusing on 3 Points and Turnover Features (미국 프로농구(NBA)의 플레이오프 진출에 영향을 미치는 주요 변수 예측: 3점과 턴오버 속성을 중심으로)

  • An, Sehwan;Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.263-286 / 2022
  • This study collects 32 years of NBA statistics, from 1990 to 2022, by web crawling, examines the variables of interest through exploratory data analysis, and generates related derived variables. Unused variables were removed by cleaning the input data, and correlation analysis, t-tests, and ANOVA were performed on the remaining variables. For each variable of interest, the difference in means between teams that advanced to the playoffs and teams that did not was tested; to complement this, the mean difference among three groups (upper/middle/lower) based on ranking was then confirmed. Only the current season's data was used as the test set, and 5-fold cross-validation was performed by splitting the remaining data into training and validation sets for model training. Overfitting was addressed by comparing the cross-validation results with the final results on the test set and confirming that the performance metrics did not differ. Because the raw data is of high quality and the statistical assumptions are satisfied, most models performed well despite the small data set. This study not only predicts NBA game results or classifies playoff advancement using machine learning but also examines, through the importance of the input attributes, whether the variables of interest are among the most important variables. Visualizing SHAP values made it possible to overcome the limits of interpreting feature importance alone and to compensate for the inconsistency of importance calculations when variables are added or removed. A number of variables related to three-pointers and turnovers, the subjects of interest in this study, were found to be among the major variables affecting playoff advancement in the NBA.
Like existing work in sports data analysis, this study covers topics such as match results, playoffs, and championship prediction and compares several machine learning models. It differs in that the features of interest are set in advance and statistically verified, and the verification is then compared with the machine learning results. It also differs from existing studies by presenting explanatory visualizations using SHAP, one of the XAI methods.
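
The evaluation protocol described in this abstract (hold out the latest season as a test set, then run 5-fold cross-validation on the rest) can be sketched as below. This is only an illustration under assumptions: the row layout with a `season` key and both function names are hypothetical, and the folds are contiguous and unshuffled, which the abstract does not specify.

```python
def split_by_season(rows, test_season):
    """Hold out one season as the test set; everything else is for training."""
    train = [r for r in rows if r["season"] != test_season]
    test = [r for r in rows if r["season"] == test_season]
    return train, test


def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, validation
        start += size


# Toy rows (assumed for illustration only).
rows = [{"season": 2020 + i // 4, "team": i} for i in range(12)]
train_rows, test_rows = split_by_season(rows, test_season=2022)
for train_idx, val_idx in k_fold_indices(len(train_rows), k=5):
    print(len(train_idx), len(val_idx))
```

Comparing the average validation score across the folds with the score on the held-out season is one way to check that the two do not diverge, which is the overfitting check the abstract describes.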

A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History (개발자 별 버그 해결 유형을 고려한 자동적 개발자 추천 접근법)

  • Park, Seong Hun;Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.511-522 / 2014
  • During software development, a variety of bugs are reported. Bug tracking systems such as Bugzilla, MantisBT, Trac, and JIRA are used to handle reported bug information in many open source development projects. Bug reports in a bug tracking system are triaged to manage bugs and determine the developer responsible for resolving each report. As software grows in size and bug reports tend to be duplicated, bug triage becomes increasingly complex and difficult. In this paper, we present an approach for assigning bug reports to appropriate developers, which is a main part of the bug triage task. First, words that appear in resolved bug reports are classified by developer. Second, words in new bug reports are selected. After these two steps, vectors whose items are the selected words are generated. Third, the TF-IDF (term frequency-inverse document frequency) of each selected word is computed and used as the weight of the corresponding vector item. Finally, developers are recommended based on the similarity between each developer's word vector and the vector of the new bug report. We conducted an experiment on the Eclipse JDT and CDT projects to show the applicability of the proposed approach. We also compared the proposed approach with an existing machine learning-based study. The experimental results show that the proposed approach is superior to the existing method.
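
The steps in this abstract (per-developer word lists, TF-IDF weighting, similarity ranking) can be sketched as follows. This is a minimal sketch, not the paper's implementation: cosine similarity is assumed as the similarity measure, the new report is counted as one document when computing IDF, and the developer names, word lists, and function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Return one sparse {term: tf-idf weight} dict per token list."""
    n_docs = len(token_lists)
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) \
        * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def recommend(developer_words, new_report, top_k=1):
    """Rank developers by similarity of their word vector to the new report."""
    names = list(developer_words)
    vectors = tfidf_vectors([developer_words[n] for n in names] + [new_report])
    report_vec = vectors[-1]
    scores = {n: cosine(vectors[i], report_vec) for i, n in enumerate(names)}
    return sorted(names, key=lambda n: scores[n], reverse=True)[:top_k]


# Toy history of words from each developer's resolved reports (assumed).
developer_words = {
    "alice": ["parser", "syntax", "ast", "parser"],
    "bob": ["debugger", "breakpoint", "thread"],
}
print(recommend(developer_words, ["parser", "crash", "syntax"]))
```

Terms that occur in every document get an IDF of zero here, so words common to all developers do not influence the ranking.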

A Theoretical Study on Abduction as an Inquiry Method in Earth Science (지구과학의 한 탐구 방법으로서 귀추법에 대한 이론적 고찰)

  • Oh, Phil-Seok;Kim, Chan-Jong
    • Journal of The Korean Association For Science Education / v.25 no.5 / pp.610-623 / 2005
  • The goal of this theoretical study was to provide a foundation for developing and implementing earth science inquiry activities based on abduction as a scientific inquiry method. Through a review of relevant literature, the study examined the nature of earth science in terms of the goals of earth science inquiry and the characteristics of what is investigated in earth science. It also explored the forms and meanings of abduction, the thinking strategies used in abductive inference, and the abductive inquiry model. Abduction is the process of inferring certain rules (e.g., scientific facts, principles, laws) and providing explanatory statements or hypotheses in order to explain some phenomena. This method was found to be well suited to earth science inquiry, which studies the causes and processes of natural phenomena in the earth and space environment. Abduction is ampliative, selective, evaluative, and creative inference, and several thinking strategies, including reconstruction of data, heuristic generalization, analogy, existential, conceptual combination, and elimination strategies, are employed for inferring rules and suggesting hypotheses. The study found the abductive inquiry model to be adaptable to earth science classrooms. It therefore suggests that earth science instruction should be based on the abductive method and that research on abductive inquiry in the classroom should follow.

Developing the mathematics model textbook based on storytelling with real-life context - Focusing on the coordinate geometry contents - (실생활 연계형 스토리텔링 수학 교과서 개발 -도형의 방정식 단원을 중심으로-)

  • Kim, Yujung;Kim, Ji Sun;Park, Sang Eui;Park, Kyoo-Hong;Lee, Jaesung
    • Communications of Mathematical Education / v.27 no.3 / pp.179-203 / 2013
  • The purpose of this study was to present an example of developing a model geometry textbook based on storytelling with real-life context. To this end, we first elaborated the meaning of a storytelling textbook with real-life context and then discussed the outline of the story and the summary of each lesson. This study defined the storytelling textbook with real-life context as a textbook consisting of activities that explore and organize mathematical concepts by using real-life situations as story material. The geometry textbook we developed employed two real-life materials: a map for coordinate geometry and a set square for the equation of a line. To attract students' interest, we introduced a confrontation between a teacher and two students and a villain. We tried out the storytelling textbook in order to verify its validity. The participants were 25 students enrolled in a high school in Seoul, 17 of whom were surveyed. Their survey answers suggested that the storytelling-based geometry textbook helped them learn mathematics and that instruments such as the map and the set square helped them understand mathematical concepts. However, their opinions implied that the story needed to be improved to reflect more realistic contexts familiar to students.

Development and Application of Systems Thinking-based STEAM Education Program to Improve Secondary Science Gifted and Talented Students' Systems Thinking Skill (중등 과학 영재학생들의 시스템 사고력 향상을 위한 융합인재교육 프로그램의 개발 및 적용)

  • Park, Byung-Yeol;Lee, Hyonyong
    • Journal of Gifted/Talented Education / v.24 no.3 / pp.421-444 / 2014
  • STEAM education draws content from a variety of areas so that they work together closely and systematically. It therefore requires systems thinking, which makes it possible to grasp these different disciplines effectively. The purposes of this study are to develop a STEAM program based on systems thinking and to apply it to secondary science gifted students in order to investigate its educational effect. A model of the program was developed from previous research and the theoretical content of systems thinking and STEAM. A draft of the STEAM program was developed on the theme of "rocket". A total of 113 students participated in this study: 100 seventh graders and 13 eighth graders enrolled at seigy. A single-group pre-post paired t-test was conducted on their systems thinking skills. The results of applying the program are as follows. Systems thinking ability improved after the program was applied. 'Mental Model', 'Personal Skill', 'Team Learning', 'System Analysis', and 'Shared Vision' all improved significantly. In conclusion, the STEAM program based on systems thinking improves students' systems thinking skills. Used in the public education curriculum, the program can help cultivate human resources with problem-solving ability based on systems thinking and STEAM literacy.
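
The single-group pre-post paired t-test mentioned in this abstract reduces to a one-sample test on the score differences. The sketch below computes only the t statistic and degrees of freedom with the standard library; the function name and the scores are made up for illustration and are not data from the study.

```python
import math
import statistics


def paired_t(pre, post):
    """Return (t statistic, degrees of freedom) for paired pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [after - before for before, after in zip(pre, post)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical scores for four students before and after a program.
t_stat, dof = paired_t(pre=[1, 2, 3, 4], post=[2, 4, 5, 7])
print(round(t_stat, 3), dof)
```

The p-value would then come from the t distribution with `dof` degrees of freedom, for example via a statistics package, which is left out here to keep the sketch dependency-free.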

Vehicle Headlight and Taillight Recognition in Nighttime using Low-Exposure Camera and Wavelet-based Random Forest (저노출 카메라와 웨이블릿 기반 랜덤 포레스트를 이용한 야간 자동차 전조등 및 후미등 인식)

  • Heo, Duyoung;Kim, Sang Jun;Kwak, Choong Sub;Nam, Jae-Yeal;Ko, Byoung Chul
    • Journal of Broadcast Engineering / v.22 no.3 / pp.282-294 / 2017
  • In this paper, we propose a novel intelligent headlight control (IHC) system that is robust to various road lights and to camera movement caused by vehicle driving. To detect candidate light blobs, the region of interest (ROI) is defined as a front ROI (FROI) and a back ROI (BROI) by considering the camera geometry based on a perspective range estimation model. Then, light blobs such as vehicle headlights and taillights, reflected light, and surrounding road lighting are segmented using two different adaptive thresholding methods. Among the segmented blobs, taillights are detected first using a redness check and a random forest classifier based on Haar-like features. For headlight and taillight classification, we use a random forest instead of the popular support vector machine or convolutional neural networks to support fast learning and testing in real-life applications. Pairing is performed using predefined geometric rules, such as vertical coordinate similarity and an association check between blobs. The proposed algorithm was successfully applied to various nighttime driving sequences, and the results show that it performs better than recent related works.

A Study on the Development of Experiential STEAM Program Based on Visual Impairment Using 3D Printer: Focusing on 'Sun' Concept (3D프린터 활용 체험형 STEAM 프로그램 개발 연구: '태양' 개념을 중심으로)

  • Kim, Sanggul;Kim, Hyoungbum;Kim, Yonggi
    • Journal of the Korean Society of Earth Science Education / v.15 no.1 / pp.62-75 / 2022
  • In this study, an experiential STEAM program using a 3D printer was developed around the 'Sun' content elements of the 2015 revised science curriculum. To assess its effectiveness, the program was applied to 77 students from two middle schools selected by simple random sampling, and their creative problem solving, STEAM attitude, and STEAM satisfaction were analyzed. The results are as follows. First, a tactile model of the Sun was produced with a 3D printer, and a program was developed that lets students learn actively through experience-oriented activities that simulate visual impairment. Second, a paired-sample t-test on pre- and post-test STEAM attitude scores showed statistically significant results for the 'interest', 'consideration', 'self-concept', 'self-efficacy', and 'science and engineering career choice' sub-factors, but not for the 'consideration' and 'usefulness / value recognition' sub-factors (p<.05). Third, the STEAM satisfaction test conducted after the program showed sub-factor means ranging from 3.66 to 3.97, indicating that the 3D printer-based STEAM program improved students' understanding of and interest in science subjects.

Developing and Implementing a Secondary Teacher Training Program to Build TPACK in Entrepreneurship Education (기업가정신 교육에서의 TPACK 강화를 위한 중등 교사 연수 프로그램 개발 및 적용)

  • Seonghye Yoon;Seyoung Kim
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.18 no.4 / pp.51-63 / 2023
  • The purpose of this study is to develop and implement a secondary teacher training program based on the TPACK model in order to strengthen the capacity of teachers of youth entrepreneurship education, given the increasing importance of entrepreneurship as a future competency, and to draw theoretical and practical implications from it. To this end, a teacher training program was developed through analysis, design, development, implementation, and evaluation following the ADDIE model; 22 secondary school teachers in Gangwon Province were trained, and the program's effectiveness and validity were analyzed. First, a paired-sample t-test of TPACK in entrepreneurship education conducted before and after the program showed statistically significant improvements in all sub-competencies. Second, the satisfaction survey showed high overall satisfaction (M=4.83). Third, three experts reviewed the program and found it highly valid, with validity of M=5.0, usefulness of M=4.7, and universality of M=5.0. Based on these results, we suggest that expanding entrepreneurship education requires expanding opportunities for teachers' holistic capacity building such as TPACK, promoting teachers' understanding and practice of backward design, and improving access to the various resources that can be used in entrepreneurship education.
