• Title/Summary/Keyword: Gender Prediction


Predicting Students' Engagement in Online Courses Using Machine Learning

  • Alsirhani, Jawaher; Alsalem, Khalaf
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.159-168 / 2022
  • Online courses are widely recognized as an important alternative, especially for students whose jobs prevent them from attending traditional face-to-face classes. Engagement is one of the most fundamental variables indicating whether a course achieves its objectives. The current study therefore aims to build a machine learning model to predict student engagement in online courses. An online questionnaire was prepared and administered to students of Jouf University in the Kingdom of Saudi Arabia. The input variables obtained from the questionnaire were specialization, gender, academic year, skills, emotional aspects, participation, and performance, with engagement in the online course as the dependent variable. The data were analyzed with multiple regression in SPSS, and Kegel was used to build the model as a machine learning technique. The results indicated a positive correlation between four variables (skills, emotional aspects, participation, and performance) and engagement in online courses. The model accuracy was very high (99.99%), which shows the model's ability to predict engagement from the input variables.
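The multiple regression step described above can be illustrated with a minimal ordinary least squares sketch in plain Python. It is not the authors' SPSS analysis or their Kegel model: the four predictor columns (skills, emotional aspects, participation, performance) and all numbers are synthetic, hypothetical values chosen only so that the fit can be checked.

```python
from typing import List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Return [intercept, b1, ..., bk] from the normal equations (X^T X) beta = X^T y."""
    X = [[1.0] + list(r) for r in rows]  # prepend an intercept column
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical scores: skills, emotional aspects, participation, performance.
rows = [[1, 2, 3, 4], [2, 1, 4, 3], [3, 3, 1, 2], [4, 2, 2, 1], [5, 4, 3, 5], [2, 5, 5, 1]]
true_coef = [0.5, 0.2, 0.3, 0.1, 0.4]  # synthetic, not results from the study
engagement = [true_coef[0] + sum(b * v for b, v in zip(true_coef[1:], r)) for r in rows]
print([round(c, 6) for c in fit_ols(rows, engagement)])
```

Because the synthetic engagement scores are an exact linear function of the predictors, the fit recovers the generating coefficients. With real questionnaire data, the positive coefficients would be what indicates a positive association with engagement.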

Context Prediction Using Right and Wrong Patterns to Improve Sequential Matching Performance for More Accurate Dynamic Context-Aware Recommendation (보다 정확한 동적 상황인식 추천을 위해 정확 및 오류 패턴을 활용하여 순차적 매칭 성능이 개선된 상황 예측 방법)

  • Kwon, Oh-Byung
    • Asia pacific journal of information systems / v.19 no.3 / pp.51-67 / 2009
  • Developing an agile recommender system for nomadic users has been regarded as a promising application in mobile and ubiquitous settings. To increase the quality of personalized recommendation in terms of accuracy and elapsed time, correctly estimating the user's future context is crucial. Traditionally, time series analysis and Markovian processes have been adopted for such forecasting. However, these methods are not adequate for predicting context data, mainly because most context data are measured on a nominal scale. To resolve these limitations, the alignment-prediction algorithm has been suggested for context prediction, especially for predicting future context from low-level context. More recently, an ontological approach has been proposed for guided context prediction without context history. However, because context information is so varied, acquiring sufficient context prediction knowledge a priori is not easy in most service domains. Hence, the purpose of this paper is to propose a novel context prediction methodology that does not require a priori knowledge and that increases accuracy and decreases elapsed time for service response. To do so, we developed a new pattern-based context prediction approach. First of all, a set of individual rules is derived for each context attribute using the context history. Then a pattern consisting of the results of reasoning with the individual rules is developed for pattern learning. If at least one context property matches, say R, the pattern is regarded as right. If the pattern is new, the right pattern is added, the values of the mismatched properties are set to 0, freq = 1, and w(R, 1). Otherwise, the frequency of the matched right pattern is increased by 1 and w(R, freq) is set. After training finishes, if the frequency is greater than a threshold value, the right pattern is saved in the knowledge base. On the other hand, if at least one context property matches, say W, the pattern is regarded as wrong.
If the pattern is new, the result is modified into a wrong answer, the right pattern is added, and the frequency is set to 1 and w(W, 1). Otherwise, the matched wrong pattern's frequency is increased by 1 and w(W, freq) is set. After training finishes, if the frequency is greater than a threshold level, the wrong pattern is saved in the knowledge base. Context prediction is then performed with combinatorial rules as follows. First, identify the current context. Second, find matching patterns among the right patterns. If no pattern matches, find a matching pattern among the wrong patterns. If no matching pattern is found, choose the context property whose predictability is higher than that of any other property. To show the feasibility of the proposed methodology, we collected actual context histories from travelers who had visited the largest amusement park in Korea; 400 context records were collected in 2009. We randomly selected 70% of the records as training data and used the remaining 30% as testing data. Prediction accuracy and elapsed time were chosen as performance measures, and performance was compared with case-based reasoning (CBR) and voting methods. Through a simulation test, we conclude that our methodology is clearly better than the CBR and voting methods in terms of both accuracy and elapsed time, which suggests that the methodology is relatively valid and scalable. In a second round of experiments, we compared a full model to a partial model. In the full model, both right and wrong patterns are used to reason about the future context; in the partial model, reasoning uses only right patterns, as is generally done in the legacy alignment-prediction method. The full model turned out to be more accurate, while the partial model was better in terms of elapsed time.
In a final experiment, we considered potential privacy problems that might arise among users. To address such concerns, we excluded context properties such as the date of the tour and user profile attributes such as gender and age. The outcome shows that the cost of preserving privacy is tolerable. The contributions of this paper are as follows. First, academically, we improved sequential matching methods in terms of prediction accuracy and service time by considering individual rules for each context property and by learning from wrong patterns. Second, the proposed method was found to be quite effective for privacy-preserving applications, which are frequently required by B2C context-aware services; a privacy-preserving system applying the proposed method can also successfully decrease elapsed time. Hence, the method is very practical for establishing privacy-preserving context-aware services. Taking into account the limitations of this paper, our future research can be summarized as follows. First, user acceptance and usability will be tested with actual users to demonstrate the value of the prototype system. Second, we will apply the proposed method to more general application domains, since this paper focused on tourism in an amusement park.
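The frequency-threshold training and the prediction fallback order described in this abstract (right patterns, then wrong patterns, then the most predictable property) can be sketched as follows. This is a simplified illustration, not the author's algorithm. The tuple encoding of a context, the idea that each knowledge base maps a pattern to a predicted next context, and all names such as `keep_frequent` and `predict_context` are hypothetical. The w(R, freq) and w(W, freq) weights are omitted.

```python
from collections import Counter
from typing import Dict, List, Tuple

Context = Tuple[str, ...]  # hypothetical: one value per context property


def keep_frequent(observations: List[Tuple[Context, str]], threshold: int) -> Dict[Context, str]:
    """Keep (pattern -> result) pairs whose frequency exceeds the threshold.

    If a pattern has several qualifying results, the most frequent one wins.
    """
    freq = Counter(observations)
    kb: Dict[Context, str] = {}
    # Ascending by frequency, so more frequent results overwrite rarer ones.
    for (pattern, result), count in sorted(freq.items(), key=lambda kv: kv[1]):
        if count > threshold:
            kb[pattern] = result
    return kb


def predict_context(
    current: Context,
    right_patterns: Dict[Context, str],
    wrong_patterns: Dict[Context, str],
    predictability: Dict[str, float],
    fallback: Dict[str, str],
) -> str:
    # 1) Try the right-pattern knowledge base first.
    if current in right_patterns:
        return right_patterns[current]
    # 2) Otherwise, try the wrong-pattern knowledge base.
    if current in wrong_patterns:
        return wrong_patterns[current]
    # 3) Otherwise, use the property with the highest predictability.
    best = max(predictability, key=predictability.get)
    return fallback[best]


# Hypothetical amusement-park contexts: (time of day, current location).
right = keep_frequent([(("morning", "gate"), "ride")] * 3 + [(("noon", "ride"), "food")], threshold=2)
wrong = keep_frequent([(("noon", "ride"), "show")] * 3, threshold=2)
print(predict_context(("noon", "ride"), right, wrong,
                      {"time": 0.6, "location": 0.8}, {"time": "lunch", "location": "exit"}))  # prints "show"
```

Dropping the wrong-pattern lookup from `predict_context` gives the "partial model" compared in the second experiment. It performs one fewer lookup, but it falls through to the coarse fallback more often.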

The Role of Anomalous Data in Concept Learning (개념 학습에서 변칙 사례의 역할)

  • Noh, Tae-Hee; Jeong, Eun-Hee; Kang, Suk-Jin; Han, Jae-Young
    • Journal of The Korean Association For Science Education / v.22 no.3 / pp.586-594 / 2002
  • In this study, the relationships among cognitive conflict, situational interest, and conceptual change in studying boiling point were investigated. Gender differences in these relationships were also investigated. Seventh-grade students (N=370) participated in this study. First, a preconception test was administered to select students who held the misconception studied. After anomalous data were presented, a test of responses to the anomalous data and a state interest test were administered. After instruction with a CAI program, a conception test was administered immediately, and it was administered again as a retention test four weeks later. Scores on both the cognitive conflict and state interest tests were significantly correlated with scores on the conception test and the retention test. Multiple regression analysis indicated that state interest was significantly more important than cognitive conflict in predicting the degree of conceptual change and retention of the conception. For male students, state interest was the only significant predictor of conceptual change and retention of the conception. In contrast, cognitive conflict was the only significant predictor for female students.
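One common way to compare the relative importance of two predictors in a multiple regression is to use their standardized regression coefficients (beta weights). The abstract does not say exactly how the comparison was made, so treat this as an assumed illustration. For two predictors the betas follow in closed form from the three pairwise correlations, as in the sketch below. The example scores are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def standardized_betas(x1: Sequence[float], x2: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float]:
    """Standardized regression weights for y ~ x1 + x2, from pairwise correlations."""
    r1y, r2y, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    denom = 1.0 - r12 ** 2  # undefined if x1 and x2 are perfectly correlated
    return (r1y - r2y * r12) / denom, (r2y - r1y * r12) / denom


# Hypothetical scores for six students.
state_interest = [3, 4, 2, 5, 4, 1]
cognitive_conflict = [2, 2, 3, 4, 1, 3]
conception = [6, 8, 5, 10, 7, 3]
print(standardized_betas(state_interest, cognitive_conflict, conception))
```

Because both predictors are rescaled to unit variance, the larger absolute beta marks the predictor that contributes more to the fit. Running the function separately on male and female subsets mirrors the gender comparison reported above.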

Bayesian spatial analysis of obesity proportion data (비만율 자료에 대한 베이지안 공간 분석)

  • Choi, Jungsoon
    • Journal of the Korean Data and Information Science Society / v.27 no.5 / pp.1203-1214 / 2016
  • Obesity is a disease in itself as well as a risk factor for various other diseases, and it is associated with socioeconomic factors. The obesity proportion in Korea has been increasing for about 15 years, so investigating the socioeconomic factors related to obesity is important for its prevention. In particular, the association between obesity and socioeconomic status varies by gender and has spatial dependency. In this paper, we estimate the effects of socioeconomic factors on the obesity proportion by gender, taking spatial correlation into account. A conditional autoregressive model under the Bayesian framework is used to account for the spatial dependency. For a real application, we use the obesity proportion dataset for the 25 districts of Seoul in 2010. We compare the proposed spatial model with a non-spatial model in terms of goodness-of-fit and prediction measures, and the spatial model performs well.
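To show what "conditional autoregressive" means in practice, the sketch below builds the precision matrix of a proper CAR prior, Q = tau * (D - rho * W). Here W is a binary adjacency matrix and D holds each area's number of neighbours. The sketch also computes the conditional distribution of one area's spatial effect given the rest. The abstract does not specify which CAR variant or priors were used, so this proper-CAR form, the chain of four districts, and the values of rho and tau are all assumptions. Bayesian fitting (e.g. MCMC) is not shown.

```python
from typing import List, Tuple


def car_precision(neighbors: List[List[int]], rho: float, tau: float) -> List[List[float]]:
    """Proper CAR precision matrix Q = tau * (D - rho * W) for a binary adjacency W."""
    n = len(neighbors)
    q = [[0.0] * n for _ in range(n)]
    for i, nbrs in enumerate(neighbors):
        q[i][i] = tau * len(nbrs)  # D holds each area's number of neighbours
        for j in nbrs:
            q[i][j] = -tau * rho
    return q


def car_conditional(i: int, x: List[float], neighbors: List[List[int]],
                    rho: float, tau: float) -> Tuple[float, float]:
    """Mean and variance of x_i given all other areas (zero prior mean)."""
    nbrs = neighbors[i]
    d_i = len(nbrs)
    cond_mean = rho * sum(x[j] for j in nbrs) / d_i  # scaled average of the neighbours
    cond_var = 1.0 / (tau * d_i)  # more neighbours -> smaller conditional variance
    return cond_mean, cond_var


# Hypothetical: four districts in a chain 0-1-2-3 (each list must be symmetric).
neighbors = [[1], [0, 2], [1, 3], [2]]
x = [0.2, -0.1, 0.4, 0.0]  # hypothetical spatial random effects
print(car_conditional(1, x, neighbors, rho=0.9, tau=2.0))
print(car_precision(neighbors, rho=0.9, tau=2.0))
```

The conditional mean shows the spatial smoothing: each district is pulled toward the average of its neighbours. A non-spatial model amounts to rho = 0, where neighbours carry no information. With every area having at least one neighbour and |rho| < 1, Q is positive definite.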

A study of prosodic features of patients with idiopathic Parkinson's disease (파킨슨병 환자와 정상노인 간의 문장 읽기에 나타난 운율 특성 비교)

  • Kang, Young-Ae; Seong, Cheol-Jae; Yoon, Kyu-Chul
    • Phonetics and Speech Sciences / v.3 no.1 / pp.145-151 / 2011
  • Based on the hypothesis that the effects of Parkinson's disease on voice production can be detected before pharmacological intervention, the prosodic features of patients with idiopathic Parkinson's disease (IPD) and a healthy aging group were diagnostically analyzed, with the long-term goal of establishing early disease-progression biomarkers for clinical purposes. Twenty patients with IPD (8 male, 12 female) prior to pharmacological intervention and a healthy control group of 22 (10 male, 12 female) were selected. Ten sentences were recorded with a head-worn microphone, and one sentence was chosen for the analysis in this paper. The relevant parameters, i.e. a 3-dimensional model (F0, intensity, duration) and pitch- and intensity-related slopes (maxEnergy, maxF0, meanAbS, semiT, meanEnergy, meanF0), were analyzed by two-group discriminant analysis. The stepwise estimation method of discriminant analysis was performed separately by gender. The discriminant functions correctly predicted 83.9% of the male test data, while the prediction rate was 93.1% for the female group. The results showed that meanF0_slope and semiT_slope were more important parameters than the others for the male group, whereas meanEnergy_slope and maxEnergy_slope were the important ones for the female group. These findings indicate that the significant parameters differ between the male and female groups; gender-related lifestyle may be responsible for this difference. The dysprosodic features of IPD appear not simultaneously but progressively in terms of F0, intensity, and duration.
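Two-group discriminant analysis can be illustrated with Fisher's linear discriminant on two features. The discriminant direction is w = S_w^{-1}(m1 - m0), and the cut-off lies halfway between the projected group means. This is a simplified sketch, not the authors' stepwise procedure. The two features (e.g. meanF0_slope and semiT_slope), the equal-prior cut-off, and all data points are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

Point = Tuple[float, float]  # hypothetical two-feature measurement


def fisher_lda(g0: List[Point], g1: List[Point]) -> Tuple[Point, float]:
    """Return the discriminant weights w and cut-off c for two groups."""
    m0 = (mean(p[0] for p in g0), mean(p[1] for p in g0))
    m1 = (mean(p[0] for p in g1), mean(p[1] for p in g1))
    # Pooled within-group scatter matrix S_w.
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((g0, m0), (g1, m1)):
        for p in group:
            d = (p[0] - m[0], p[1] - m[1])
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det], [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # w = S_w^{-1} (m1 - m0): the direction that best separates the group means.
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    # Cut-off halfway between the projected group means (equal priors assumed).
    c = (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1])) / 2.0
    return w, c


def classify(p: Point, w: Point, c: float) -> int:
    """1 = group g1, 0 = group g0."""
    return 1 if w[0] * p[0] + w[1] * p[1] > c else 0


# Hypothetical slope features for a control group (0) and an IPD group (1).
control = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)]
ipd = [(4.0, 4.5), (5.0, 4.0), (4.5, 5.5), (5.5, 5.0)]
w, c = fisher_lda(control, ipd)
hits = sum(classify(p, w, c) == 0 for p in control) + sum(classify(p, w, c) == 1 for p in ipd)
print(f"correctly classified: {hits / (len(control) + len(ipd)):.1%}")
```

The printed hit rate corresponds to the "prediction rate" reported in the abstract. Fitting one discriminant per gender subset would mirror the study's separate analyses.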


A Study on Development of the Korea Agricultural Population Forecasting Model and long-term Prediction (농가인구예측 모형 개발 및 중장기 전망)

  • Han, Suk-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.6 / pp.3797-3806 / 2015
  • Population decline in rural areas is correlated with the number of farm households and agricultural workers and, as a result, affects farm income. The agricultural population is the foundation of the agricultural structure. Its decline influences the agricultural policies to be implemented in the future, and there is concern about a slowdown in productivity. The purpose of this study is to build a model that can be used for various applied analyses, and to support rational agricultural policy by forecasting and analyzing changes in the agricultural population. Previous studies either made assumptions about the giving-up farming rate (GFR), a key element of the agricultural population model, or estimated a single equation for the total population and then distributed the result by sex and age. In contrast, this study estimates separate GFR equations for each gender and age group, in order to capture how the responses of farm households differ by gender and age. The results show that farm population changes can be simulated for a variety of agricultural policies in conjunction with existing agricultural simulation models.
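The difference between estimating GFR separately for each gender and age group and applying one total-population rate can be seen in a deliberately simplified projection. In this sketch, each group shrinks by its own giving-up farming rate per year. The alternative applies a single rate to the total and then splits the result by the base-year shares. It is only an illustration under stated assumptions. The rates are fixed hypothetical inputs, not the econometric GFR equations estimated in the paper. Ageing, entry into farming, and mortality are ignored, and the group labels are invented.

```python
from typing import Dict, Tuple

Group = Tuple[str, str]  # hypothetical (gender, age band)


def project_by_group(pop: Dict[Group, float], gfr: Dict[Group, float],
                     years: int) -> Dict[Group, float]:
    """Apply each group's own giving-up farming rate once per year."""
    for _ in range(years):
        pop = {g: n * (1.0 - gfr[g]) for g, n in pop.items()}
    return pop


def project_total_then_distribute(pop: Dict[Group, float], total_gfr: float,
                                  years: int) -> Dict[Group, float]:
    """Project the total with one rate, then split it by base-year shares."""
    total = sum(pop.values())
    shares = {g: n / total for g, n in pop.items()}
    projected_total = total * (1.0 - total_gfr) ** years
    return {g: projected_total * s for g, s in shares.items()}


# Hypothetical farm population (thousands) and annual rates.
pop = {("male", "65+"): 100.0, ("female", "65+"): 200.0}
gfr = {("male", "65+"): 0.10, ("female", "65+"): 0.05}
print(project_by_group(pop, gfr, years=1))
print(project_total_then_distribute(pop, 0.05, years=1))
```

When the groups leave farming at different rates, the pooled approach misallocates the decline between groups even if it matches the total. This is the motivation the abstract gives for estimating separate equations.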

Convergence study to detect metabolic syndrome risk factors by gender difference (성별에 따른 대사증후군의 위험요인 탐색을 위한 융복합 연구)

  • Lee, So-Eun; Rhee, Hyun-Sill
    • Journal of Digital Convergence / v.19 no.12 / pp.477-486 / 2021
  • This study was conducted to detect metabolic syndrome risk factors and gender differences among adults. Data on 18,616 adults were collected from the Korea Health and Nutrition Examination Study from 2016 to 2019. Four machine learning methods (Logistic Regression, Decision Tree, Naïve Bayes, Random Forest) were used to predict metabolic syndrome. Random Forest outperformed the other methods for both men and women. For both sexes, BMI, diet (fat, vitamin C, vitamin A, protein, and energy intake), number of underlying chronic diseases, and age ranked highest in importance. In women, education level, age at menarche, and menopause were additional high-importance factors, and age and number of underlying chronic diseases were more important than in men. Future studies should verify various strategies to prevent metabolic syndrome.
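Random Forest variable importance is often reported as the mean decrease in Gini impurity. This is the impurity reduction a feature's splits achieve, averaged over all trees. The abstract does not state which importance measure was used, so this is an assumption. The sketch below computes the building block of that measure for a single split. The BMI values and labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: probability that two random draws have different labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(feature: Sequence[float], labels: Sequence[Hashable],
                      threshold: float) -> float:
    """Impurity reduction from splitting at feature <= threshold."""
    left: List[Hashable] = [y for x, y in zip(feature, labels) if x <= threshold]
    right: List[Hashable] = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(labels) - weighted


# Hypothetical BMI values and metabolic syndrome labels (1 = present).
bmi = [21, 23, 24, 29, 31, 33]
mets = [0, 0, 0, 1, 1, 1]
print(impurity_decrease(bmi, mets, threshold=25))  # a perfect split
print(impurity_decrease(bmi, mets, threshold=22))  # a weaker split
```

Summing such decreases over every split on a variable, then averaging across the forest's trees, yields the impurity-based ranking. Fitting separate forests for men and women would allow a sex-specific comparison like the one described.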

Prediction of non-exercise activity thermogenesis (NEAT) using multiple linear regression in healthy Korean adults: a preliminary study

  • Jung, Won-Sang; Park, Hun-Young; Kim, Sung-Woo; Kim, Jisu; Hwang, Hyejung; Lim, Kiwon
    • Korean Journal of Exercise Nutrition / v.25 no.1 / pp.23-29 / 2021
  • [Purpose] This preliminary study aimed to develop a regression model to estimate the non-exercise activity thermogenesis (NEAT) of Korean adults using various easy-to-measure independent variables. [Methods] NEAT was measured in 71 healthy adults (male n = 29; female n = 42). A NEAT estimation regression model was developed using the stepwise regression method. [Results] We confirmed that age (A), weight (B), average heart rate (HR_average, C), weight × HR_average (D), weight × HR_sum (E), systolic blood pressure (SBP) × HR_rest (F), fat mass ÷ height² (G), gender × HR_average (H), and gender × weight × HR_sum (I) were important variables in the various NEAT activity regression models. There was no significant difference between the NEAT values measured with a metabolic gas analyzer and the predicted NEAT. [Conclusion] This preliminary study developed a regression model to estimate NEAT in healthy Korean adults. The regression models were as follows: sitting = 1.431 - 0.013 × (A) + 0.00014 × (D) - 0.00005 × (F) + 0.006 × (H); leg jiggling = 1.102 - 0.011 × (A) + 0.013 × (B) + 0.005 × (H); standing = 1.713 - 0.013 × (A) + 0.0000017 × (I); 4.5 km/h walking = 0.864 + 0.035 × (B) + 0.0000041 × (E); 6.0 km/h walking = 4.029 - 0.024 × (C) + 0.00071 × (D); climbing up 1 stair = 1.308 - 0.016 × (A) + 0.00035 × (D) - 0.000085 × (F) - 0.098 × (G); and climbing up 2 stairs = 1.442 - 0.023 × (A) - 0.000093 × (F) - 0.121 × (G) + 0.0000624 × (E).
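The seven regression equations above can be transcribed directly into code, with the derived predictors A to I built from raw measurements. The coefficients are copied from the abstract. Several details are not given there, so the following are assumptions: the output units, the numeric coding of gender, the unit of height in G (meters assumed here), and how HR_sum is accumulated. The helper name `build_predictors` is hypothetical.

```python
from typing import Callable, Dict

Predictors = Dict[str, float]  # keys "A".."I" as defined in the abstract


def build_predictors(age: float, weight: float, hr_average: float, hr_sum: float,
                     hr_rest: float, sbp: float, fat_mass: float, height: float,
                     gender: float) -> Predictors:
    """Derive A..I; the gender coding and height unit are assumptions."""
    return {
        "A": age,
        "B": weight,
        "C": hr_average,
        "D": weight * hr_average,
        "E": weight * hr_sum,
        "F": sbp * hr_rest,
        "G": fat_mass / height ** 2,
        "H": gender * hr_average,
        "I": gender * weight * hr_sum,
    }


# Coefficients transcribed from the reported regression models.
NEAT_MODELS: Dict[str, Callable[[Predictors], float]] = {
    "sitting": lambda p: 1.431 - 0.013 * p["A"] + 0.00014 * p["D"] - 0.00005 * p["F"] + 0.006 * p["H"],
    "leg jiggling": lambda p: 1.102 - 0.011 * p["A"] + 0.013 * p["B"] + 0.005 * p["H"],
    "standing": lambda p: 1.713 - 0.013 * p["A"] + 0.0000017 * p["I"],
    "4.5 km/h walking": lambda p: 0.864 + 0.035 * p["B"] + 0.0000041 * p["E"],
    "6.0 km/h walking": lambda p: 4.029 - 0.024 * p["C"] + 0.00071 * p["D"],
    "climbing up 1 stair": lambda p: 1.308 - 0.016 * p["A"] + 0.00035 * p["D"] - 0.000085 * p["F"] - 0.098 * p["G"],
    "climbing up 2 stairs": lambda p: 1.442 - 0.023 * p["A"] - 0.000093 * p["F"] - 0.121 * p["G"] + 0.0000624 * p["E"],
}

# Hypothetical participant; gender coded 0 here purely for illustration.
p = build_predictors(age=30, weight=70, hr_average=80, hr_sum=4000, hr_rest=60,
                     sbp=120, fat_mass=15, height=1.75, gender=0)
for activity, model in NEAT_MODELS.items():
    print(f"{activity}: {model(p):.3f}")
```

Keeping the derived terms in one helper makes the interaction terms (D, E, F, H, I) explicit. It also confines any change in gender coding or units to one place.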

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae; Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing and personalized advertising on digital marketing channels such as email, mobile, and social media. However, collecting the demographics of Internet users has gradually become difficult because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are very expensive and slow, and are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allow us to see what pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics, including search keywords, frequency and intensity by time, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes predicted also vary across papers and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision trees, neural networks, logistic regression, and k-nearest neighbors, were used to build prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated to build the best prediction model.
The objective of this study is to choose the clickstream attributes most likely to be correlated with demographics, based on the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job, and 64 clickstream attributes drawn from previous research are applied to predict them. The overall predictive model building process consists of four steps. In the first step, we create user profiles that include the 64 clickstream attributes and 5 demographic attributes. The second step reduces the dimension of the clickstream variables to address the curse of dimensionality and overfitting, using three approaches based on decision trees, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable using SVM, neural networks, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best one. For the experiments, we used clickstream data covering 5 demographic attributes and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross-validation was conducted to enhance the reliability of the experiments. The experimental results verify that a specific data mining method is well suited to each demographic variable. For example, age is best predicted using decision-tree-based dimension reduction with a neural network, whereas gender and marital status are predicted most accurately by applying SVM without dimension reduction.
We conclude that the online behavior of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and thereby be applied to digital marketing.
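The 5-fold cross-validation used to compare the alternative models can be sketched in plain Python. It is not SPSS Modeler's implementation. The `fit` callback convention (it takes training features and labels and returns a predict function) and the majority-class baseline are hypothetical stand-ins for the SVM, neural network, and logistic regression models.

```python
import random
from collections import Counter
from typing import Callable, Hashable, Iterator, List, Sequence, Tuple

Predict = Callable[[Sequence[float]], Hashable]


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; every index is tested exactly once."""
    if n < k:
        raise ValueError("need at least k samples")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]  # round-robin split into k folds
    for f in range(k):
        train = sorted(i for g, fold in enumerate(folds) if g != f for i in fold)
        yield train, sorted(folds[f])


def cross_val_accuracy(X: Sequence[Sequence[float]], y: Sequence[Hashable],
                       fit: Callable[[list, list], Predict], k: int = 5, seed: int = 0) -> float:
    """Mean test accuracy over k folds."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_majority(train_X: list, train_y: list) -> Predict:
    """Hypothetical baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda _x: label


# Hypothetical user profiles (one clickstream feature each) and gender labels.
X = [[float(i)] for i in range(10)]
y = ["F"] * 6 + ["M"] * 4
print(cross_val_accuracy(X, y, fit_majority))
```

Each candidate model is passed through the same `cross_val_accuracy` call, and the one with the highest mean accuracy is kept for that demographic attribute. The same pattern applies to each dimension-reduction variant.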

Development of a Prediction Model for Fall Patients in the Main Diagnostic S Code Using Artificial Intelligence (인공지능을 이용한 주진단 S코드의 낙상환자 예측모델 개발)

  • Ye-Ji Park; Eun-Mee Choi; So-Hyeon Bang; Jin-Hyoung Jeong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.526-532 / 2023
  • Falls are fatal accidents that occur more than 420,000 times a year worldwide. To study fall patients, we examined the association between the extrinsic injury codes and the principal diagnosis S-codes of patients with falls, and developed a model to predict the extrinsic injury code from the principal diagnosis S-code data. We received two years of data (2020 and 2021) from Institution A, located in Gangneung City, Gangwon Special Self-Governing Province, and extracted only the records with fall-related extrinsic injury codes W00 to W19. The prediction model was developed using the fall codes W01, W10, W13, and W18, which had enough principal diagnosis S-codes to support model development. 80% of the data were used as training data and 20% as testing data. The model was an MLP (Multi-Layer Perceptron) with 6 variables in the input layer (gender, age, principal diagnosis S-code, surgery, hospitalization, and alcohol consumption), 2 hidden layers with 64 nodes, and an output layer with 4 nodes for the W01, W10, W13, and W18 extrinsic injury codes using the softmax activation function. Accuracy was 31.2% after the first training epoch and 87.5% after the 30th, which confirmed the association between the fall extrinsic injury code and the principal diagnosis S-code of fall patients.
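The network shape described above (6 inputs, two hidden layers of 64 nodes, and a 4-node softmax output) can be made concrete with a forward pass in plain Python. This sketch only shows how an input vector becomes a probability over the four codes. The ReLU hidden activation, the Glorot-uniform initialization, and the numeric encoding of the six inputs are all assumptions, since the abstract does not state them. The weights are untrained, and training over 30 epochs is not shown.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[n_out][n_in], bias[n_out])
CODES = ["W01", "W10", "W13", "W18"]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    limit = math.sqrt(6.0 / (n_in + n_out))  # Glorot-uniform range (assumed)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_mlp(seed: int = 0) -> List[Layer]:
    rng = random.Random(seed)
    return [init_layer(6, 64, rng), init_layer(64, 64, rng), init_layer(64, 4, rng)]


def dense(x: List[float], layer: Layer) -> List[float]:
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(layers: List[Layer], x: List[float]) -> List[float]:
    h = x
    for i, layer in enumerate(layers):
        z = dense(h, layer)
        # ReLU on hidden layers (assumed), softmax on the output layer.
        h = softmax(z) if i == len(layers) - 1 else [max(0.0, v) for v in z]
    return h


def predict_code(layers: List[Layer], x: List[float]) -> str:
    probs = forward(layers, x)
    return CODES[max(range(len(probs)), key=probs.__getitem__)]


# Hypothetical encoded input: gender, scaled age, encoded S-code, surgery, admission, alcohol.
mlp = build_mlp(seed=1)
x = [1.0, 0.45, 0.3, 0.0, 1.0, 0.0]
print(forward(mlp, x), predict_code(mlp, x))
```

The softmax output always sums to 1, so the largest entry can be read as the predicted extrinsic injury code. Accuracy on the 20% test split is the share of records whose predicted code matches the recorded one.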