• Title/Summary/Keyword: Education Data Mining

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.9 / pp.1135-1143 / 2021
  • Web crawlers are now frequently used to collect text for big data analysis. However, collecting only the necessary text from a web page composed of numerous tags and text fragments normally requires the cumbersome step of specifying, in the crawler, the HTML tags and style attributes that contain that text. In this paper, we propose a method of extracting text using the frequency with which text appears across web pages, without specifying HTML tags or style attributes. In the proposed method, text is extracted from the DOM trees of all collected web pages, its frequency of appearance is analyzed, and the main text is extracted by excluding text that appears with high frequency. The study verified the superiority of the proposed method.
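
The frequency idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the use of the standard-library `html.parser` in place of a full DOM tree, the names `TextCollector` and `extract_main_text`, and the 50% page-ratio cutoff are all assumptions made for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects visible text fragments, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.fragments = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.fragments.append(text)


def page_fragments(html):
    """Return the text fragments of one page in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.fragments


def extract_main_text(pages, max_page_ratio=0.5):
    """Keep fragments that appear in at most max_page_ratio of the pages.

    The cutoff is a hypothetical parameter; the abstract does not state one.
    """
    per_page = [page_fragments(p) for p in pages]
    # Count each fragment once per page, so the count is a page frequency.
    page_freq = Counter(f for frags in per_page for f in set(frags))
    limit = max_page_ratio * len(pages)
    return [[f for f in frags if page_freq[f] <= limit] for frags in per_page]
```

Text that repeats on most pages, such as navigation labels and footers, exceeds the cutoff and is dropped, while text unique to a page is kept, with no tag or style selectors supplied.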

Exploration of relationship between confirmation measures and association thresholds (기준 확인 측도와 연관성 평가기준과의 관계 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.835-845 / 2013
  • Association rule mining is a data mining technique that quantifies the relevance between sets of items in a large database, and it has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Philosophers of science have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies to select appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, researchers have devoted a lot of attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures and compares them with basic association thresholds using simulated data. As a result, we could distinguish the direction of an association rule by the confirmation measures and interpret the degree of association operationally. Furthermore, the results showed that the measure by Rips and that by Kemeny and Oppenheim performed better than the other confirmation measures.
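
To make the comparison concrete, the sketch below computes the basic association thresholds (support, confidence, lift) alongside the Rips and Kemeny-Oppenheim confirmation measures for a rule A → B from a 2×2 contingency table. The mapping of the antecedent A to the evidence and the consequent B to the hypothesis is an assumption made here, as are the function and argument names; the abstract does not give the paper's exact formulation. With that mapping, the Rips measure is 1 − P(¬B|A)/P(¬B) and the Kemeny-Oppenheim measure is [P(A|B) − P(A|¬B)] / [P(A|B) + P(A|¬B)].

```python
def rule_measures(n_ab, n_a_only, n_b_only, n_neither):
    """Association and confirmation measures for the rule A -> B.

    Arguments are the four cell counts of the 2x2 table. The sketch assumes
    0 < P(A) < 1 and 0 < P(B) < 1, so that no denominator is zero.
    """
    n = n_ab + n_a_only + n_b_only + n_neither
    p_a = (n_ab + n_a_only) / n
    p_b = (n_ab + n_b_only) / n
    p_ab = n_ab / n

    confidence = p_ab / p_a              # P(B | A)
    lift = confidence / p_b              # P(B | A) / P(B)

    # Rips: relative reduction in the probability of "not B" once A is known.
    rips = 1 - (1 - confidence) / (1 - p_b)

    # Kemeny-Oppenheim: normalized difference of P(A | B) and P(A | not B).
    p_a_given_b = p_ab / p_b
    p_a_given_not_b = (n_a_only / n) / (1 - p_b)
    kemeny_oppenheim = (p_a_given_b - p_a_given_not_b) / (
        p_a_given_b + p_a_given_not_b
    )

    return {
        "support": p_ab,
        "confidence": confidence,
        "lift": lift,
        "rips": rips,
        "kemeny_oppenheim": kemeny_oppenheim,
    }
```

Unlike support and confidence, which are always non-negative, both confirmation measures are zero under independence and change sign with the direction of the association, which is the property the abstract refers to when it says the direction of a rule can be distinguished.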

Research on Outlier and Missing Value Correction Methods to Improve Smart Farm Data Quality (스마트팜 데이터 품질 향상을 위한 이상치 및 결측치 보정 방법에 관한 연구)

  • Sung-Jae Lee;Hyun Sim
    • The Journal of the Korea institute of electronic communication sciences / v.19 no.5 / pp.1027-1034 / 2024
  • This study aims to address the issues of outliers and missing values in AI-based smart farming to improve data quality and enhance the accuracy of agricultural predictive activities. By utilizing real data provided by the Rural Development Administration (RDA) and the Korea Agency of Education, Promotion, and Information Service in Food, Agriculture, Forestry, and Fisheries (EPIS), outlier detection and missing value imputation techniques were applied to collect and manage high-quality data. For successful smart farm operations, an IoT-based AI automatic growth measurement model is essential, and achieving a high data quality index through stable data preprocessing is crucial. In this study, various methods for correcting outliers and imputing missing values in growth data were applied, and the proposed preprocessing strategies were validated using machine learning performance evaluation indices. The results showed significant improvements in model performance, with high predictive accuracy observed in key evaluation metrics such as ROC and AUC.
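
The abstract does not name the specific correction methods used, so the following Python sketch only illustrates one common pairing: flagging outliers with the interquartile-range (IQR) rule and then filling all gaps by linear interpolation. The function names, the multiplier `k=1.5`, and the use of `None` to mark missing readings are assumptions for illustration, not the authors' pipeline.

```python
from statistics import quantiles


def mask_outliers_iqr(values, k=1.5):
    """Replace readings outside [Q1 - k*IQR, Q3 + k*IQR] with None."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v if v is not None and low <= v <= high else None for v in values]


def interpolate_missing(values):
    """Fill None entries by linear interpolation between known neighbors.

    Gaps at either end are filled with the nearest known value.
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        return result
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            result[i] = values[right]
        elif right is None:
            result[i] = values[left]
        else:
            weight = (i - left) / (right - left)
            result[i] = values[left] + weight * (values[right] - values[left])
    return result


def clean_series(values):
    """Mask outliers first, then impute both original and masked gaps."""
    return interpolate_missing(mask_outliers_iqr(values))
```

Masking outliers before imputing means an extreme reading never serves as an interpolation endpoint, which is why the two steps are applied in this order.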

Statistical Profiles of Users' Interactions with Videos in Large Repositories: Mining of Khan Academy Repository

  • Yassine, Sahar;Kadry, Seifedine;Sicilia, Miguel Angel
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.5 / pp.2101-2121 / 2020
  • The rapid growth of instructional video repositories and their widespread use as a tool to support education have raised the need for studies that assess the quality of those educational resources and their impact on the quality of the learning process that depends on them. The Khan Academy (KA) repository is one of the most prominent educational video repositories, widely used by different types of learners, students, and teachers. To better understand its characteristics and the impact of such repositories on education, we gathered a large amount of KA data using its API and various web scraping techniques, and then analyzed it. This paper reports the first quantitative and descriptive analysis of the KA repository of open video lessons. First, we describe the structure of the repository. Then, we present analyses highlighting content-based growth and evolution, which reveal the main findings about the KA repository. Finally, we focus on users' interactions with video lessons, which consist of questions and answers posted on videos. We developed interaction profiles for those videos based on the number of users' interactions, and conducted regression analysis and statistical tests to mine the relation between those profiles and a set of proposed quality-related metrics. The results showed that all interaction profiles are highly affected by video length and reuse rate in different subjects. We believe that this study provides valuable information for understanding the logic and learning mechanisms inside learning repositories, which can have major impacts on the education field in general, and particularly on the informal learning process and the instructional design process. This study can be considered one of the first quantitative studies to shed light on Khan Academy as an open educational resources (OER) repository. The results presented in this paper are crucial to understanding the KA video repository, its characteristics, and its impact on education.

Analyzing Students' Non-face-to-face Course Evaluation by Topic Modeling and Developing Deep Learning-based Classification Model (토픽 모델링 기반 비대면 강의평 분석 및 딥러닝 분류 모델 개발)

  • Han, Ji Yeong;Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science / v.55 no.4 / pp.267-291 / 2021
  • Due to the global COVID-19 pandemic in 2020, there have been major changes in educational settings. Universities have fully introduced remote learning, which had previously been considered auxiliary, non-face-to-face classes have become commonplace, and professors and students are making great efforts to adapt to the new educational environment. To improve the quality of non-face-to-face lectures amid these changes, it is necessary to study the factors affecting lecture satisfaction. Therefore, this paper presents a new methodology using big data to identify how the factors affecting university lecture satisfaction changed before and after COVID-19. We use topic modeling to analyze lecture reviews written before and after COVID-19 and identify the factors affecting lecture satisfaction. Through this, we suggest a direction for university education to move forward. In addition, by establishing a topic classification model with an F1-score of 0.84 based on KoBERT, a deep learning language model, we can identify the factors of lecture satisfaction and dissatisfaction from multiple angles and contribute to the continuous qualitative improvement of lecture satisfaction.

Decision-Tree Model of Long-term Abstention from Smoking: Focused on Coping Styles (장기적 금연 지속기간 예측 모형: 스트레스 대처를 중심으로)

  • Suh, Kyung-Hyun;You, Jae-Min
    • Korean Journal of Health Education and Promotion / v.22 no.4 / pp.73-90 / 2005
  • Objectives: Smokers who had failed to quit smoking frequently reported that life stress interrupted their abstention. A stress vulnerability model for smoking cessation has been considered, and most contemporary smoking cessation programs help smokers develop coping strategies for stressful situations. This study aims to investigate which coping styles for stress are appropriate for abstention from smoking. Investigating the relationship between abstention following a smoking cessation program and coping styles should yield useful information for those who want to stop smoking and for the health practitioners who help them. Methods: Participants were 69 smokers (62 males, 7 females) who participated in a hospitalized smoking cessation program, with a mean age of 44.89 (SD=9.61). Participants took a medical test and completed questionnaires and psychological tests, including the Fagerstrom Test for Nicotine Dependence and the Multidimensional Coping Scale. To track participants' abstention, researchers followed them for 2 years. To check whether they had abstained and to encourage them to continue, researchers telephoned them once a week for 3 months. After 3 months, they were contacted every other week until 6 months had passed since they left the smoking cessation program, and then once a month for another 18 months. Researchers also contacted their families to confirm their abstention. A data mining decision tree analysis was performed with 37 variables (13 variables for coping styles and 24 smoking-related variables) using Answer Tree 3.0. Results: Forty-four (63.8%) of the sixty-nine participants abstained for 2 weeks, 34 (49.3%) for 6 months, 25 (36.2%) for 1 year, and 22 (31.9%) for 2 years. Participants abstained from smoking for an average of 286.77 days. The variables included in the decision tree model were positive interpretation, emotional expression, self-criticism, restraint, and emotional social support seeking. The decision tree model showed that those (n=9) who did not interpret positively (<=7.5) and criticized themselves (>6.5) abstained for only 23 days, while those (n=9) who interpreted positively (>7.5), expressed their emotions freely (>6.5), and actively sought social support (>11.5) abstained for 730 days, until the last day of the investigation. Conclusion: The results showed that certain coping styles, such as positive interpretation, emotional expression, self-criticism, restraint, and emotional social support seeking, were important factors for long-term abstention from smoking. These findings reiterate the role of stress in abstention from smoking and suggest a model of coping styles for successful abstention. Despite its limitations, this study may help smokers who want to stop smoking and the health practitioners who help them.

Trend Analysis in Maker Movement Using Text Mining (텍스트 마이닝을 이용한 메이커 운동의 트렌드 분석)

  • Park, Chanhyuk;Kim, Ja-Hee
    • The Journal of the Korea Contents Association / v.18 no.12 / pp.468-488 / 2018
  • The maker movement is a social and cultural phenomenon in which people who make the things they need come together and share knowledge and experience through creativity. However, although the maker movement has grown rapidly over the past decade, there is still no consensus on what falls within its scope. To find a direction for its development, we need to look at how the maker movement has changed so far. This study analyzes media articles using a text-based big data analysis methodology to understand how the issue of the maker movement has changed in general media. In particular, we apply Keyword Network Analysis and DTM (Dynamic Topic Model) to analyze changes of interest over time. Keyword Network Analysis derives major keywords at the word level in order to analyze the evolution of the maker movement, and DTM helps to identify changes in interest in different areas of the maker movement at three levels: word, topic, and document. As a result, we identified major topics such as start-ups, makerspaces, and maker education, and found that the major keywords have shifted from 3D printer and enterprise to education.

A Prediction of Stock Price Through the Big-data Analysis (인터넷 뉴스 빅데이터를 활용한 기업 주가지수 예측)

  • Yu, Ji Don;Lee, Ik Sun
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.3 / pp.154-161 / 2018
  • This study was conducted to predict stock market prices, based on the assumption that internet news articles may affect the rise and fall of those prices. Accuracy was evaluated by comparing the values predicted by the companies' forecasting models with the actual stock index. This paper collected stock news from the internet, and analyzed and identified its relationship with the stock price index. Since internet news content consists mainly of unstructured text, this study used text mining and multiple regression analysis to analyze the news articles. Company H was selected as a representative automobile manufacturer, and two prediction models for forecasting the upturn and decline of its stock price index were derived and presented. Of the two, prediction model (1) has the lower error value, so its prediction performance is relatively better than that of prediction model (2). In future research, if this study is supplemented with a real artificial-intelligence investment decision system and applied to real investment, more practical research results could be obtained.

Minimally Supervised Relation Identification from Wikipedia Articles

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of Information Science Theory and Practice / v.6 no.4 / pp.28-38 / 2018
  • Wikipedia is composed of millions of articles, each of which explains a particular real-world entity in various languages. Since the articles are contributed and edited by a large population of diverse experts with no specific authority, Wikipedia can be seen as a naturally occurring body of human knowledge. In this paper, we propose a method to automatically identify key entities and relations in Wikipedia articles, which can be used for automatic ontology construction. Compared to previous approaches to entity and relation extraction and/or identification from text, our goal is to capture naturally occurring entities and relations from Wikipedia while minimizing the artificiality often introduced when constructing training and testing data. The titles of the articles and anchored phrases in their text are regarded as entities, and their types are automatically classified with minimal training. We attempt to automatically detect and identify possible relations among the entities based on clustering without training data, as opposed to the relation extraction approach, which focuses on improving accuracy in selecting one of several target relations for a given pair of entities. While the relation extraction approach with supervised learning requires a significant amount of annotation effort for a predefined set of relations, our approach attempts to discover relations as they occur naturally. Unlike other unsupervised relation identification work, where automatically identified relations are evaluated against correct relations determined a priori by human judges, we attempted to evaluate the appropriateness of the naturally occurring clusters of relations involving person-artifact and person-organization entities and their relation names.

Estimation of splitting tensile strength of modified recycled aggregate concrete using hybrid algorithms

  • Zhu, Yirong;Huang, Lihua;Zhang, Zhijun;Bayrami, Behzad
    • Steel and Composite Structures / v.44 no.3 / pp.389-406 / 2022
  • Recycling concrete construction waste is an encouraging step toward green and sustainable building. A lot of research has been done on recycled aggregate concretes (RACs), but not nearly as much has been done on concrete made with recycled aggregate. Recycled aggregate concrete, on the other hand, has been found to have lower mechanical performance than conventional concrete. Accurately estimating the mechanical behavior of concrete samples is an important scientific topic in civil, structural, and construction engineering. Because experimental studies are often time-consuming, costly, and troublesome, accurate estimation can save time, effort, and cost. This study presents a comprehensive data-mining-based model for predicting the splitting tensile strength of recycled aggregate concrete modified with glass fiber and silica fume. For this purpose, 168 splitting tensile strength tests were first performed in the laboratory under different conditions, and the variables describing the conditions of each experiment were taken as input parameters for predicting the splitting tensile strength. Then, three hybrid models, GWO-RF, GWO-MLP, and GWO-SVR, were used for the prediction. The results showed that all of the developed GWO-based hybrid prediction models agree well with the measured experimental results. Notably, the GWO-RF model has the best accuracy according to the model performance assessment criteria for both training and testing data.