• Title/Summary/Keyword: Learning Documents (학습 문서)


The Classification System and Information Service for Establishing a National Collaborative R&D Strategy in Infectious Diseases: Focusing on the Classification Model for Overseas Coronavirus R&D Projects (국가 감염병 공동R&D전략 수립을 위한 분류체계 및 정보서비스에 대한 연구: 해외 코로나바이러스 R&D과제의 분류모델을 중심으로)

  • Lee, Doyeon;Lee, Jae-Seong;Jun, Seung-pyo;Kim, Keun-Hwan
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.127-147 / 2020
  • The world is suffering numerous human and economic losses due to the novel coronavirus infection (COVID-19). The Korean government established a strategy to overcome the national infectious disease crisis through research and development. It is difficult to identify distinctive features and changes in a specific R&D field when using the existing technical classification or the standard science and technology classification. Recently, a few studies have attempted to establish a classification system that provides information on infectious disease research investment in Korea through comparative analysis of government-funded research projects. However, these studies did not provide the information needed to establish cooperative research strategies among countries in the field of infectious diseases, which is required as an execution plan to achieve the goals of national health security and fostering new growth industries. It is therefore necessary to study information services based on a classification system and classification model for establishing a national collaborative R&D strategy. Seven classification categories (Diagnosis_biomarker, Drug_discovery, Epidemiology, Evaluation_validation, Mechanism_signaling pathway, Prediction, and Vaccine_therapeutic antibody) were derived by reviewing South Korea's government-funded research projects related to infectious diseases. A classification model was then trained on Scopus data with a bidirectional RNN, and the final model achieved robust performance with an accuracy of over 90%. For the empirical study, the infectious disease classification system was applied to coronavirus-related R&D projects of major countries, drawn from the STAR Metrics (National Institutes of Health) and NSF (National Science Foundation) databases of the United States (US), CORDIS (Community Research & Development Information Service) of the European Union (EU), and KAKEN (Database of Grants-in-Aid for Scientific Research) of Japan. The coronavirus R&D activities of these countries were found to be concentrated mostly in the Prediction category, which deals with predicting success in clinical trials at the new drug development stage or predicting toxicity that causes side effects. An intriguing result is that, across all of these countries, the share of national investment in Vaccine_therapeutic antibody, the category covering research aimed at developing vaccines and treatments, was very small (5.1%), which indirectly explains the slow development of vaccines and treatments. Comparative analysis of the investment status of coronavirus-related research projects by country showed that the US and Japan invest relatively evenly across all infectious disease research areas, while Europe invests relatively heavily in specific areas such as Diagnosis_biomarker. Moreover, the classification system provided information on major coronavirus-related research organizations in each country, thereby supporting the establishment of international collaborative R&D projects.
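As a rough illustration of the bidirectional RNN classifier described in this abstract, the sketch below (Python/Keras) shows how project abstracts could be mapped to the seven categories; the vocabulary size, layer sizes, and training setup are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch (not the authors' code) of a bidirectional RNN text classifier
# for the seven infectious-disease R&D categories; hyperparameters are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 7      # Diagnosis_biomarker, Drug_discovery, ..., Vaccine_therapeutic antibody
VOCAB_SIZE = 20000   # assumed vocabulary size for project abstracts

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),           # token embeddings
    layers.Bidirectional(layers.LSTM(64)),       # bidirectional recurrent encoder
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=5)  # x_train: padded token ids
```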

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to enable unstructured document representation in a variety of applications. Sentiment analysis is an important technology for distinguishing low-quality from high-quality content through the text data of products, and it has proliferated with the growth of text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research topics in natural language processing and is widely studied in text mining. Because real online reviews are openly available, they are easy to collect and they directly affect business. In marketing, real-world information from customers is gathered on websites rather than through surveys. Whether posts on a website are positive or negative is reflected in sales, so businesses try to identify this information. However, reviews on a website are not always well written and are often difficult to classify. Earlier studies in this area used review data from the Amazon.com shopping mall, while recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change according to the subject, paragraph, direction of the sentiment lexicon, and sentence strength. This study aims to classify the polarity of sentiment into positive and negative categories and to increase prediction accuracy using the IMDB review data set. First, for the text classification algorithms related to sentiment analysis, popular machine learning algorithms such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and gradient boosting are adopted as comparative models. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). A CNN can be used similarly to a bag-of-words model when processing a sentence in vector form, but it does not consider the sequential attributes of the data. An RNN handles ordered data well because it takes the temporal information of the data into account, but it suffers from the long-term dependency problem; LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination, and we tried to understand how well these models work for sentiment analysis and why. This study proposes an integrated CNN-LSTM algorithm to extract the positive and negative features in text. The reasons for combining the two algorithms are as follows. A CNN can extract features for classification automatically by applying convolution layers and massively parallel processing, whereas an LSTM is not capable of highly parallel processing.
Like faucets, the LSTM has input, output, and forget gates that can be opened and closed at the desired time, and these gates have the advantage of placing memory blocks on hidden nodes. The LSTM's memory block may not store all of the data, but it compensates for the CNN's inability to capture long-range sequential dependencies. Furthermore, when the LSTM is attached after the CNN's pooling layer, the model has an end-to-end structure in which spatial and temporal features can be learned simultaneously. The combined CNN-LSTM achieved 90.33% accuracy; it is slower than the CNN alone but faster than the LSTM alone, and it was more accurate than the other models. In addition, the word embedding layer can be improved as the kernels are trained step by step. CNN-LSTM can mitigate the weaknesses of each model, and its end-to-end structure offers the advantage of layer-by-layer learning. For these reasons, this study seeks to improve the classification accuracy of movie reviews using the integrated CNN-LSTM model.
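A minimal sketch of such an integrated CNN-LSTM sentiment classifier on the IMDB review data is shown below (Python/Keras); the filter sizes, pooling, and other hyperparameters are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch of a CNN-LSTM sentiment classifier on IMDB (not the paper's
# exact model); all hyperparameters are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.datasets import imdb
from tensorflow.keras.preprocessing.sequence import pad_sequences

VOCAB_SIZE, MAX_LEN = 10000, 400
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=VOCAB_SIZE)
x_train = pad_sequences(x_train, maxlen=MAX_LEN)
x_test = pad_sequences(x_test, maxlen=MAX_LEN)

model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, 128),
    layers.Conv1D(64, 5, activation="relu"),  # CNN extracts local n-gram features
    layers.MaxPooling1D(4),                   # pooled feature sequence feeds the LSTM
    layers.LSTM(64),                          # LSTM models the remaining order information
    layers.Dense(1, activation="sigmoid"),    # binary positive/negative output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, validation_split=0.1, epochs=3)
```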

An Efficient Estimation of Place Brand Image Power Based on Text Mining Technology (텍스트마이닝 기반의 효율적인 장소 브랜드 이미지 강도 측정 방법)

  • Choi, Sukjae;Jeon, Jongshik;Subrata, Biswas;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.113-129 / 2015
  • Place branding is an important income-generating activity: it gives special meaning to a specific location and produces identity and communal value grounded in an understanding of place branding concepts and methodology. Many fields, such as marketing, architecture, and city construction, exert influence in creating an impressive brand image. A place brand that is highly recognized by both Koreans and foreigners creates significant economic effects. There has been research on building strategic and detailed place brand images; the representative work was carried out by Anholt, who surveyed two million people in 50 countries. However, such investigations, including survey research, demand a great deal of labor and expense, so more affordable, objective, and effective research methods are needed. The purpose of this paper is to find a way to measure the intensity of a place brand image objectively and at low cost through text mining. The proposed method extracts keywords and the factors constructing the place brand image from related web documents, and from these it measures the brand image intensity of a specific location. The performance of the proposed methodology was verified through comparison with Anholt's city image consistency index ranking of 50 cities around the world. Four methods are applied in the test. First, the RANDOM method ranks the cities in the experiment arbitrarily. The HUMAN method prepares a questionnaire and selects nine volunteers who are well acquainted with both brand management and the cities to be evaluated; they are asked to rank the cities, and their rankings are compared with Anholt's evaluation results. The TM method applies the proposed approach to evaluate the cities with all evaluation criteria. TM-LEARN, an extended version of TM, selects significant evaluation items from the items in each criterion and then evaluates the cities with only the selected criteria. RMSE is used as the metric to compare the evaluation results. The experimental results are as follows. First, compared to an evaluation that targets ordinary people, this method appeared to be more accurate. Second, compared to the traditional survey method, it requires much less time and cost because it is automated. Third, the proposed methodology is timely because the evaluation can be repeated at any time. Fourth, unlike Anholt's method, which evaluates only pre-specified cities, the proposed methodology is applicable to any location. Finally, it has relatively high objectivity because the research was conducted on open source data. As a result, our text mining approach to city image evaluation shows validity in terms of accuracy, cost-effectiveness, timeliness, scalability, and reliability. The proposed method provides managers with clear guidelines for brand management in the public and private sectors. In the public sector, local officials could use the proposed method to formulate strategies and enhance the image of their places efficiently.
Rather than conducting heavy questionnaires, local officials could quickly monitor the current place image in advance and decide to proceed with a formal place image survey only when the results of the proposed method are out of the ordinary, whether those results indicate an opportunity or a threat to the place. Moreover, by combining the proposed method with morphological analysis, extraction of meaningful facets of the place brand from text, sentiment analysis, and other techniques, marketing strategy planners or civil engineering professionals may obtain deeper and richer insights for better place brand images. In the future, a prototype system will be implemented to show the feasibility of the idea proposed in this paper.
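The RMSE comparison between a text-mining-derived city ranking and a reference ranking (such as Anholt's index) can be sketched as follows in Python; the city names and rank values are hypothetical placeholders, not data from the study.

```python
# Minimal sketch of comparing two city rankings with RMSE; all values are hypothetical.
import math

def rmse(reference_ranks, estimated_ranks):
    """Root-mean-square error between two rankings keyed by city name."""
    diffs = [(reference_ranks[c] - estimated_ranks[c]) ** 2 for c in reference_ranks]
    return math.sqrt(sum(diffs) / len(diffs))

anholt = {"London": 1, "Paris": 2, "Sydney": 3, "Tokyo": 4}        # hypothetical reference ranking
text_mining = {"London": 2, "Paris": 1, "Sydney": 3, "Tokyo": 5}   # hypothetical TM-based ranking
print(rmse(anholt, text_mining))
```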

A Convergence Analysis of the Ethnographic Method for Doctoral Dissertations in Korea : Focused on Research Participants, Data Collection Methods, and Trustworthiness Criteria (국내 박사학위 논문의 문화 기술적 연구방법에 대한 융복합적 분석 -연구 참여자, 자료 수집방법, 신뢰성 준거를 중심으로-)

  • Oh, Ho-young;Cho, Hong-Joong
    • Journal of the Korea Convergence Society / v.8 no.10 / pp.333-338 / 2017
  • Ethnography is concerned with culturally shared behaviors, beliefs, and learned patterns of language, and it aims to describe and interpret them. It is therefore a classical form of qualitative research developed by anthropologists who spent long periods conducting fieldwork within a cultural group. The results of analyzing the ethnographic research methods of doctoral dissertations in Korea are as follows. First, the number of research participants was 1-10 (32 dissertations, 44.4%), 11-20 (18, 25%), 21-30 (13, 18.1%), 31-40 (2, 2.7%), and others (7, 9.8%). Second, the data collection methods were in-depth interviews (71, 98.6%), participant observation (70, 97.2%), document data (38, 52.7%), engineering devices (12, 16.6%), and others (8, 11.1%). Data collection periods were 3-5 months (7 dissertations, 9.8%), 6-8 months (15, 20.8%), 9-11 months (14, 19.6%), 12-14 months (13, 18.1%), more than 15 months (17, 23.6%), and not reported (4, 5.4%). Third, the trustworthiness criteria were triangulation (46 dissertations, 63.9%), research participants' review of the study results (44, 61.1%), peer researchers' advice and feedback (33, 45.8%), follow-up (25, 34.7%), use of references (20, 27.8%), reflexive subjectivity (17, 23.6%), intensive observation for a sufficient period (10, 13.9%), in-depth description (7, 9.8%), and others (7, 9.8%).

Analysis of Evaluator's Role and Capability for Institution Accreditation Evaluation of NCS-based Vocational Competency Development Training (NCS 기반 직업능력개발훈련 기관인증평가를 위한 평가자의 역할과 역량 분석)

  • Park, Ji-Young;Lee, Hee-Su
    • Journal of vocational education research / v.35 no.4 / pp.131-153 / 2016
  • The purpose of this study was to derive the roles and capabilities required of evaluators for the institution accreditation evaluation of NCS-based vocational competency development training. The study explored the evaluators' detailed roles using the Delphi method and derived the knowledge, skills, attitudes, and integrity needed, verifying their validity. To that end, a Delphi survey was conducted over three rounds with a panel of education and training professionals and evaluation review professionals. From the results, the roles of evaluators were defined as eight items in total: operator, moderator-mediator, cooperator, analyzer, verifier, institution evaluator, institution consultant, and learner; the capabilities derived for these roles numbered 25 in total. The knowledge area included four capabilities: HRD knowledge, NCS knowledge, knowledge of vocational competency development training, and knowledge of training institution accreditation evaluation. The skill area comprised fourteen capabilities: conflict management, interpersonal relations, word processing, problem solving, analysis, advance preparation, time management, decision making, information comprehension and utilization, comprehensive thinking, understanding of vocational competency development training institutions, communication, feedback, and core understanding. The attitude area was summarized in seven items: subjectivity and fairness, service mind, sense of calling, ethics, self-development, responsibility, and teamwork. The knowledge, skills, and attitudes derived in this study may be used to design and provide education programs that help evaluators with the essential prerequisites conduct qualitative and systematic accreditation and assessment. It is expected that this study will be helpful for designing modular education programs by capability and for managing evaluator quality, so that pre-service and in-service education can be provided according to evaluators' experience and roles.

Label Embedding for Improving Classification Accuracy Using AutoEncoder with Skip-Connections (다중 레이블 분류의 정확도 향상을 위한 스킵 연결 오토인코더 기반 레이블 임베딩 방법론)

  • Kim, Museong;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.175-197 / 2021
  • Recently, with the development of deep learning technology, research on unstructured data analysis has been actively conducted and is showing remarkable results in various fields such as classification, summarization, and generation. Among the various text analysis fields, text classification is the most widely used technology in academia and industry. Text classification includes binary classification, which assigns one label out of two classes; multi-class classification, which assigns one label out of several classes; and multi-label classification, which assigns multiple labels out of several classes. In particular, multi-label classification requires a different training method from binary and multi-class classification because each instance can carry multiple labels. In addition, since the number of labels to be predicted increases with the numbers of labels and classes, prediction becomes harder and performance improvement is difficult. To overcome these limitations, research on label embedding is being actively conducted, in which (i) the initially given high-dimensional label space is compressed into a low-dimensional latent label space, (ii) a model is trained to predict the compressed labels, and (iii) the predicted labels are restored to the high-dimensional original label space. Typical label embedding techniques include Principal Label Space Transformation (PLST), Multi-Label Classification via Boolean Matrix Decomposition (MLC-BMaD), and Bayesian Multi-Label Compressed Sensing (BML-CS). However, since these techniques consider only linear relationships between labels or compress the labels by random transformation, they cannot capture non-linear relationships between labels and thus cannot create a latent label space that sufficiently preserves the information of the original labels. Recently, there have been increasing attempts to improve performance by applying deep learning to label embedding. Label embedding using an autoencoder, a deep learning model that is effective for data compression and reconstruction, is representative. However, traditional autoencoder-based label embedding suffers a large amount of information loss when compressing a high-dimensional label space with a myriad of classes into a low-dimensional latent label space; this is related to the vanishing gradient problem that occurs during backpropagation. Skip connections were devised to solve this problem: by adding a layer's input to its output, gradients are preserved during backpropagation, and efficient learning is possible even in deep networks. Skip connections are mainly used for image feature extraction in convolutional neural networks, but studies using skip connections in autoencoders or in the label embedding process are still scarce. Therefore, in this study, we propose an autoencoder-based label embedding methodology in which skip connections are added to both the encoder and the decoder so that the low-dimensional latent label space reflects the information of the high-dimensional label space well. The proposed methodology was applied to actual paper keywords to derive a high-dimensional keyword label space and a low-dimensional latent label space.
Using these, we conducted an experiment that predicts the compressed keyword vector in the latent label space from the paper abstract and evaluates the multi-label classification after restoring the predicted keyword vector to the original label space. As a result, the accuracy, precision, recall, and F1 score used as performance indicators showed far superior performance for multi-label classification based on the proposed methodology compared with traditional multi-label classification methods. This shows that the low-dimensional latent label space derived through the proposed methodology reflects the information of the high-dimensional label space well, which ultimately improved the performance of the multi-label classification itself. In addition, the utility of the proposed methodology was confirmed by comparing its performance according to domain characteristics and the number of dimensions of the latent label space.
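A minimal sketch of an autoencoder with skip connections in both the encoder and the decoder, of the kind this methodology describes for compressing a high-dimensional label space, is given below (Python/Keras); all dimensions and layer sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a skip-connection autoencoder for label embedding
# (not the paper's code); dimensions are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, Model

LABEL_DIM, LATENT_DIM = 1000, 32   # assumed original and latent label dimensions

inputs = layers.Input(shape=(LABEL_DIM,))
h1 = layers.Dense(256, activation="relu")(inputs)
h2 = layers.Dense(128, activation="relu")(h1)
h2 = layers.Add()([h2, layers.Dense(128)(inputs)])      # encoder skip connection from the input
latent = layers.Dense(LATENT_DIM, activation="relu")(h2)

d1 = layers.Dense(128, activation="relu")(latent)
d2 = layers.Dense(256, activation="relu")(d1)
d2 = layers.Add()([d2, layers.Dense(256)(latent)])      # decoder skip connection from the latent code
outputs = layers.Dense(LABEL_DIM, activation="sigmoid")(d2)  # reconstructed multi-hot labels

autoencoder = Model(inputs, outputs)
encoder = Model(inputs, latent)    # used to obtain the compressed label targets
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(y_multi_hot, y_multi_hot, epochs=10)  # y_multi_hot: 0/1 label matrix
```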

Scientific Practices Manifested in Science Textbooks: Middle School Science and High School Integrated Science Textbooks for the 2015 Science Curriculum (과학 교과서에 제시된 과학실천의 빈도와 수준 -2015 개정 교육과정에 따른 중학교 과학 및 통합과학-)

  • Kang, Nam-Hwa;Lee, Hye Rim;Lee, Sangmin
    • Journal of The Korean Association For Science Education / v.42 no.4 / pp.417-428 / 2022
  • This study analyzed the frequency and level of scientific practices presented in secondary science textbooks. A total of 1,378 student activities presented in 14 middle school science textbooks and 5 high school integrated science textbooks were analyzed using the definitions and levels of scientific practice suggested in the NGSS. Findings show that most student activities focus on three practices. Compared to textbooks for the previous science curriculum, the practice of 'obtaining, evaluating, and communicating information' was more emphasized, reflecting societal changes due to ICT development. However, the practice of 'asking a question', which can be an important element of student-led science learning, was still rarely found in textbooks, and 'developing and using models', 'using math and computational thinking', and 'arguing based on evidence' were not addressed much. The practices were mostly at the elementary school level, except for the practice of 'constructing explanations'. Such repeated exposure to a few low-level practices means that many future citizens would be led to a naïve understanding of science. The findings imply that it is necessary to emphasize various practices tailored to students' levels. In the upcoming revision of the science curriculum, it is necessary to provide definitions of the practices that are not currently specified and the expected level of each practice, so that the curriculum can provide sufficient guidance for textbook writing. These efforts should be supported by benchmarking of overseas science curricula and by research that explores students' abilities and teachers' understanding of scientific practices.

Analyzing the Performance Expectations of the 2022 Revised Mathematics and Science Curriculum from a Data Visualization Competency Perspective (데이터 시각화 역량 관점에서 2022 개정 수학/과학 교육과정의 성취기준 분석)

  • Dong-Young Lee;Ae-Lyeong Park;Ju-Hee Jeong;Ju-Hyun Hwang;Youn-Kyeong Nam
    • Journal of the Korean Society of Earth Science Education / v.17 no.2 / pp.123-136 / 2024
  • This study examines the performance expectations (PEs) and the clarification statements of each PE in the 2022 revised national science and mathematics education standards from a data visualization competency perspective. First, the authors intensively reviewed the data visualization literature to define key competencies and developed a framework comprising four main categories: collection and pre-processing skills, technical skills, thinking skills, and interaction skills. Based on the framework, the authors extracted a total of 191 mathematics and 230 science PEs from the 2022 revised science and mathematics education standards (Ministry of Education Ordinance No. 2022-33, Volumes 8 and 9) as the main data set. The analysis consisted of three steps: first, the authors organized the data (421 PEs) by the four categories of the framework and four grade bands (3rd-4th, 5th-6th, 7th-9th, and 10th grade); second, the number of PEs in each grade band was standardized by the accomplishing period (1-3 years) of each PE; last, the data set was represented by heatmaps to visualize the relationship between the four categories of visualization competency and the four grade bands, and the differences between competency categories and grade bands were quantitatively analyzed using the Mann-Whitney U test and independent-samples Kruskal-Wallis tests. The analysis revealed that in mathematics there was no significant difference among the numbers of PEs by grade band. However, on average, the number of PEs categorized under thinking skills was significantly lower than the numbers in the technical skills (p = .002) and interaction skills categories (p = .001). In science, the number of PEs increased with grade band (pairwise comparison: Grades 5-6 vs. 7-9, p = .001; Grades 5-6 vs. Grade 10, p = .029; Grades 3-4 vs. 7-9, p = .022). In particular, the frequency of PEs in thinking skills was significantly lower than in the other skills (pairwise comparison: technical skills p = .024; collection and pre-processing skills p = .012; interaction skills p = .010). Based on these results, two implications for revising the national science and mathematics standards and for teacher education were suggested.
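For reference, the nonparametric comparisons mentioned above (Mann-Whitney U and Kruskal-Wallis tests) can be run with scipy as in the sketch below; the count data are made-up placeholders, not the study's data.

```python
# Minimal sketch of the nonparametric tests described above; all counts are hypothetical.
from scipy.stats import mannwhitneyu, kruskal

# hypothetical standardized PE counts per grade band for two competency categories
thinking_skills  = [2, 1, 3, 2]
technical_skills = [5, 6, 4, 7]

u_stat, p_value = mannwhitneyu(thinking_skills, technical_skills, alternative="two-sided")
print(f"Mann-Whitney U = {u_stat}, p = {p_value:.3f}")

# hypothetical PE counts for four grade bands, compared with Kruskal-Wallis
grades_3_4, grades_5_6, grades_7_9, grade_10 = [3, 2, 4], [2, 3, 2], [6, 7, 5], [5, 6, 6]
h_stat, p_value = kruskal(grades_3_4, grades_5_6, grades_7_9, grade_10)
print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_value:.3f}")
```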