• Title/Summary/Keyword: gender prediction

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important source for target marketing and personalized advertising on digital marketing channels such as email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics through online or offline surveys, these approaches are expensive, slow, and likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data shows which pages users visited, how long they stayed, how often and when they usually visited, which sites they prefer, what keywords they used to find a site, whether they made a purchase, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics, including search keywords, frequency and intensity by time, day, and month, the variety of websites visited, and text information from the web pages visited. The demographic attributes predicted also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build the prediction models. However, previous research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model. 
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics, based on the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. Drawing on previous research, 64 clickstream attributes are used to predict the demographic attributes. The overall process of predictive model building is composed of 4 steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem. We use three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable, using SVM, neural network, and logistic regression. The last step evaluates the alternative models in terms of accuracy and selects the best model. For the experiments, we used clickstream data covering 5 demographics and 16,962,705 online activities of 5,000 Internet users. IBM SPSS Modeler 17.0 was used for the prediction process, and 5-fold cross validation was conducted to enhance the reliability of the experiments. The experimental results show that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best with decision tree based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate with SVM without dimension reduction. 
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and can therefore be applied to digital marketing.
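
The abstract above states that 5-fold cross validation was used but does not describe how the folds were formed. The following is a minimal Python sketch of the general technique only, not the authors' SPSS Modeler procedure. The function names, the shuffling seed, and the `fit_and_score` callback are assumptions introduced for illustration.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th index starting at i, so sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]

def cross_validate(n_samples, fit_and_score, k=5, seed=0):
    """Average fit_and_score(train_idx, test_idx) over k folds.

    fit_and_score is a hypothetical callback that trains a model on the
    training indices and returns its accuracy on the test indices.
    """
    folds = k_fold_indices(n_samples, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return sum(scores) / k
```

With the 5,000 users mentioned in the abstract, each fold would hold 1,000 users, and every user would appear in exactly one test fold.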

Development of a Prediction Model for Fall Patients in the Main Diagnostic S Code Using Artificial Intelligence (인공지능을 이용한 주진단 S코드의 낙상환자 예측모델 개발)

  • Ye-Ji Park;Eun-Mee Choi;So-Hyeon Bang;Jin-Hyoung Jeong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.16 no.6 / pp.526-532 / 2023
  • Falls are fatal accidents that occur more than 420,000 times a year worldwide. To study patients with falls, we examined the association between extrinsic injury codes and principal diagnosis S-codes of fall patients, and developed a model that predicts the extrinsic injury code from principal diagnosis S-code data. We received two years of data (2020 and 2021) from Institution A, located in Gangneung City, Gangwon Special Self-Governing Province, and extracted only the records with fall-related extrinsic injury codes W00 to W19. The prediction model was developed using codes W01, W10, W13, and W18, which had enough principal diagnosis S-codes to support model development. 80% of the data were used as training data and 20% as testing data. The model was an MLP (Multi-Layer Perceptron) with 6 variables (gender, age, principal diagnosis S-code, surgery, hospitalization, and alcohol consumption) in the input layer, 2 hidden layers with 64 nodes, and an output layer with 4 nodes for the extrinsic injury codes W01, W10, W13, and W18, using the softmax activation function. In training, accuracy was 31.2% after the first training pass and 87.5% after the 30th, which confirmed the association between the fall extrinsic injury code and the principal diagnosis S-code of fall patients.
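
The network shape described above (6 inputs, 2 hidden layers, 4 softmax outputs) can be sketched as a forward pass in plain Python. This is an illustrative sketch, not the authors' implementation: the ReLU hidden activation, 64 nodes in each hidden layer, the random untrained weights, and the example input vector are all assumptions that the abstract does not specify.

```python
import math
import random

def softmax(z):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output node."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(v):
    return [max(0.0, a) for a in v]

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Apply ReLU hidden layers, then a softmax output layer."""
    *hidden, output = layers
    for w, b in hidden:
        x = relu(dense(x, w, b))
    w, b = output
    return softmax(dense(x, w, b))

rng = random.Random(0)
sizes = [6, 64, 64, 4]  # inputs, hidden 1, hidden 2, outputs (W01, W10, W13, W18)
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Hypothetical, already-scaled feature vector: gender, age, S-code,
# surgery, hospitalization, alcohol consumption.
probs = forward([1.0, 0.63, 0.2, 0.0, 1.0, 0.0], layers)
```

The predicted class would be the index of the largest entry in `probs`. Because the weights here are untrained, the four probabilities come out close to uniform.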

Prediction of Time to Recurrence and Influencing Factors for Gastric Cancer in Iran

  • Roshanaei, Ghodratollah;Ghannad, Masoud Sabouri;Safari, Maliheh;Sadighi, Sanambar
    • Asian Pacific Journal of Cancer Prevention / v.13 no.6 / pp.2639-2642 / 2012
  • Background: The patterns of gastric cancer recurrence vary across societies. We designed the current study to describe the recurrence patterns of gastric cancer (GC) in Iran and to predict the time to recurrence and the factors affecting it. Materials and Methods: This research was performed from March 2003 to February 2007. Demographic characteristics, clinical and pathological diagnosis, and classification, including pathologic stage, tumor grade, tumor site, and tumor size, of patients with recurrent GC were collected from patients' data files. To evaluate the factors affecting relapse in GC patients, gender, age at diagnosis, treatment type, and Hgb were also included. Data were analyzed using Kaplan-Meier and logistic regression models. Results: After treatment, 82 patients suffered recurrence, 42, 33 and 17 by the ends of the first, second and third years. The mean (±SD) and median (IQR) times to recurrence in patients with GC were 25.5 (20.6-30.1) and 21.5 (15.6-27.1) months, respectively. Multivariate logistic regression showed that only pathologic stage, tumor grade, and tumor site significantly affected recurrence. Conclusions: We found that pathologic stage, tumor grade, and tumor site significantly affect the recurrence of GC; this has a high positive prognostic value and may be useful for better follow-up and for selecting patients at risk. We also showed time to recurrence to be an important factor in the follow-up of patients.
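
The abstract names the Kaplan-Meier method without giving details. As a hedged illustration of the general estimator (not the authors' analysis), the sketch below computes the recurrence-free proportion at each observed event time. The function name and the input layout are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) pairs.

    times  : follow-up time for each patient (for example, months)
    events : True if recurrence was observed at that time,
             False if the patient was censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events:
            # Multiply by the fraction of at-risk patients with no event at t.
            survival *= 1 - n_events / n_at_risk
            curve.append((t, survival))
        n_at_risk -= n_removed
    return curve
```

For instance, with made-up follow-up times `[6, 6, 12, 18, 24]` and event flags `[True, False, True, True, False]`, the estimate steps down at months 6, 12, and 18, and the censored patients only reduce the number at risk.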

The Factors of Participating in a Smoking Cessation Program using Integrated Method of Decision Tree and Neural Network Algorithm (인공신경망 분석과 결정트리 융합에 의한 금연 프로그램 참여 결정 요인)

  • Byeon, Haewon
    • Journal of the Korea Convergence Society / v.6 no.2 / pp.25-30 / 2015
  • The purpose of this study was to analyze the factors that affect participation in a smoking cessation program. Data were drawn from the Seoul Welfare Panel Study 2010. Subjects were 1,326 smokers aged 19 and older living in the community. The dependent variable was defined as experience of smoking cessation. Explanatory variables were age, gender, level of education, employment status, household income, marital status, drinking, self-reported health status, depression, disease, and physical activity. A prediction model was developed using a decision tree and a neural network algorithm. In the prediction model, self-reported health status, disease, income, and household income were significantly associated with participation in a smoking cessation program. Based on this study, systematic education and program development are required.

Association Prediction Method Using Correlation Analysis between Fine Dust and Medical Subjects (미세먼지와 진료과목의 상관관계 분석을 통한 연관성 예측 방법)

  • Lim, Myung Jin;Kim, Seon Mi;Shin, Ju Hyun
    • Smart Media Journal / v.7 no.3 / pp.22-28 / 2018
  • Air pollution in Korea is becoming a growing concern for various reasons, including fine dust, and is causing anxiety among people about their health. Although various studies have examined the relationship between fine dust and particular diseases, most focus on showing that fine dust is related to specific illnesses such as respiratory and cardiovascular diseases, hypertension, and diabetes. In this paper, we use public medical history data to extract the ten medical subjects with the highest monthly number of treatments in 2016, and analyze the relation between fine dust and those medical subjects using the Pearson correlation coefficient. We also subdivide the analysis of the correlation between fine dust and the medical subjects by gender and age. For the middle-aged female group, which shows the strongest positive correlation between fine dust and the medical subjects, we analyze the correlation from 2011 to 2015 and extract a relevance coefficient by regression analysis in order to predict the association with the medical subjects according to the fine dust concentration.
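
The Pearson correlation coefficient used in the study above can be computed as in the following short sketch. It shows the standard formula only; the function name is chosen for this example, and no data from the paper is reproduced.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and the two root sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (dev_x * dev_y)
```

In the setting of the abstract, `x` could hold monthly fine dust concentrations and `y` the monthly number of treatments for one medical subject. Values near +1 indicate a strong positive linear relationship and values near -1 a strong negative one.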

Investigating Non-Laboratory Variables to Predict Diabetic and Prediabetic Patients from Electronic Medical Records Using Machine Learning

  • Mukhtar, Hamid;Al Azwari, Sana
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.19-30 / 2021
  • Diabetes Mellitus (DM) is one of the common chronic diseases leading to severe health complications that may cause death. The disease affects individuals, the community, and the government because of the continuous monitoring, lifelong commitment, and cost of treatment it entails. The World Health Organization (WHO) considers Saudi Arabia one of the top 10 countries in diabetes prevalence worldwide. Since most medical services are provided by the government, the cost of treatment in terms of hospital and clinic visits and lab tests represents a real burden given the scale of the disease. The ability to predict the diabetic status of a patient without laboratory tests, by screening based on some personal features, can lessen the health and economic burden caused by diabetes. The goal of this paper is to investigate the prediction of diabetic and prediabetic patients by considering factors other than the laboratory tests generally required by physicians. Using data obtained from local hospitals, medical records were processed to obtain a dataset that classified patients into three classes: diabetic, prediabetic, and non-diabetic. After applying three machine learning algorithms, we established good accuracy, precision, and recall for the models on the dataset. Further analysis was performed on the data to identify important non-laboratory patient variables for diabetes classification. The importance of five variables from a person's basic health data (gender, physical activity level, hypertension, BMI, and age) was investigated to find their contribution to a patient being diabetic, prediabetic, or normal. Our analysis showed strong agreement with the risk factors for diabetes and prediabetes stated by the American Diabetes Association (ADA) and other health institutions worldwide. 
We conclude that class-specific analysis of the disease can identify important factors specific to the Saudi population, whose management can help control the disease. We also provide some recommendations learnt from this research.

Analysis on Predictive Factors of Digital Accessibility Level of Middle-old Age Group: Focused on Gender Difference (중고령자의 디지털정보접근수준 예측요인 분석 : 성별차이를 중심으로)

  • Kim, Su-Kyoung;Shin, Hye-Ri;Kim, Young-Sun
    • Informatization Policy / v.27 no.1 / pp.55-71 / 2020
  • Digital accessibility of the middle-aged and elderly has been increasing at a faster pace than that of other groups such as the handicapped and adolescents. However, studies on the digital accessibility of middle-aged and older adults are scarce. To examine the variables affecting access to digital information among middle-aged and elderly people, this study investigates the impact of sociodemographic, physical and mental health, and social activity variables on the accessibility of digital information. We analyzed data on 1,661 people between the ages of 55 and 84 from the 2018 Status Survey on Digital Divide conducted by the National Information Society Agency. The hierarchical multiple regression analysis shows that, for both men and women, higher education, economic, and life satisfaction levels are associated with higher digital accessibility levels. The analysis also shows that older men have a higher accessibility level when they do not live alone, whereas older women have higher digital capability the younger they are, which indicates that there are gender differences. We expect the results of this study to serve as an important reference for understanding the factors related to digital accessibility level and for active intervention to improve the digital accessibility of middle-aged and elderly men and women.

A Study on the Cigarette price increases induced changes in Smoking rate and Smoking cessation plan (담배가격 인상에 따른 흡연율 및 금연계획의 변화)

  • Soo-Bok Lee;Jeong-An Seo
    • Journal of the Health Care and Life Science / v.10 no.2 / pp.295-303 / 2022
  • The purpose of this study is to investigate the changes in smoking rates and smoking cessation plans before and after the cigarette price increase in 2015. Based on the National Health and Nutrition Survey, this study analyzes the correlation of the changes in smoking rate and cessation plans with sociological variables (gender, age, income quintile, occupation, education level, hypertension, diabetes) and health behaviors (drinking, stress perception, obesity) in 2013, before the price increase, and then in 2015 and 2017. The smoking rate was 23.3% in 2013, 20.5% in 2015, and 21.0% in 2017, indicating that the smoking rate decreased compared with the period before the cigarette price was raised. Among the sociological variables, the effect of the price increase on the smoking rate differed by income, occupation, and education level, while health behaviors were found to have no significant effect on the smoking rate. In addition, the price increase had a temporary effect in increasing smoking cessation plans, but the increase in cessation plans did not necessarily lead to a decrease in the smoking rate. Therefore, efforts will be needed at the national level to provide smoking cessation programs customized by gender, age, and social factors so that cessation plans can lead to a decrease in the smoking rate. In addition, research on health behaviors that were not identified in this study should also be conducted. We hope that this study will help predict the impact on the smoking rate when price increase policies are considered or implemented.

Influences of Level of Alcohol Consumption and Motives for Drinking on Drinking Permissiveness in University Students (대학생의 음주 정도, 음주 동기가 음주 허용도에 미치는 영향)

  • Kim, Jong-Im;Kim, Jong-Sung;Kim, Ji-Su;Kim, Kyung-Hee
    • Journal of Korean Academy of Fundamentals of Nursing / v.14 no.3 / pp.382-390 / 2007
  • Purpose: This study was done to identify the risk factors influencing drinking permissiveness in university students. Method: The participants in this descriptive survey on causal relations were 219 university students selected by convenience sampling. The data, collected from April to July 2005, were used in multiple regression analysis to build a prediction model. Results: Drinking permissiveness differed according to the following general characteristics: gender, drinking frequency, drinking in more than one place each time, and frequency of excessive drinking. Drinking permissiveness was positively correlated with the amount of alcohol consumption (drinking frequency per month, amount per occasion). Drinking permissiveness was also positively correlated with motives to drink (social, enhancement, conformity, and coping motives). The causal factors of drinking permissiveness were social motives, amount per occasion, and drinking frequency per month. Conclusion: The findings suggest that broad intervention programs should be provided to prevent problems of excessive drinking. It is also recommended that a program be developed to help control the variables identified in this study, along with a follow-up study to verify the model.

A Study on Customer Satisfactions toward Hotel Restaurants (호텔레스토랑 이용고객의 메뉴 만족도에 관한 연구)

  • 강성일
    • Culinary science and hospitality research / v.6 no.2 / pp.135-155 / 2000
  • The main purpose of this study is to investigate the factors affecting customer satisfaction with the Italian restaurants of hotels. In particular, the role of menu-related factors is examined. Based on previous research findings, the following hypotheses were proposed and tested. First, customer evaluations of the factors related to the service of Italian hotel restaurants will differ depending on demographics. The results are as follows. Customer evaluations of the seasonality and variety of the menu differed by gender. By age group, customer evaluations differed for the communicative quality of the menu, the restaurant atmosphere, the employee service level, and the food taste. By type of occupation, there were differences in customer evaluations of the communicative quality of the menu, the employee service level, and the food taste. By education level, there were differences in the evaluations of the seasonality and variety of the menu, the restaurant atmosphere, the employee service level, and the food taste. Finally, customer evaluations of the restaurant atmosphere and the food taste differed by income level. Second, the employee service level, the seasonality and variety of the menu, the communicative quality of the menu, the restaurant atmosphere, and the food taste are predicted to significantly affect customer satisfaction. The results were consistent with this prediction, except that the communicative quality of the menu did not significantly affect customer satisfaction. Regarding the role of menu-related factors in customer satisfaction, the findings imply the importance of updating the menu, providing variety, and reflecting seasonality. Further studies, however, are needed to explore the various roles of menu-related factors in restaurant customer satisfaction.