• Title/Summary/Keyword: Mining techniques


Discovering Relationships between Skin Type and Life Style Using Data Mining Techniques: A Case Study of Korea

  • Kim, Taeheung;Ha, Jihyun;Lee, Jong-Seok;Oh, Younhak;Cho, Yong Ju
    • Industrial Engineering and Management Systems / v.15 no.1 / pp.110-121 / 2016
  • With growing interest in skincare, an increasing number of studies address the classification of skin types and the factors influencing each type. This study presents a novel data mining methodology for determining the relationships between skin type, lifestyle, and patterns of cosmetic use. Eight skin-specific factors, namely moisture, sebum in the U-zone (both cheeks), sebum in the T-zone (forehead, nose, and chin), pore, melanin, wrinkle, acne, and hemoglobin, were measured in 1,246 subjects living in South Korea, in conjunction with a questionnaire survey on their lifestyles and patterns of cosmetic use. Using multivariate statistical methods and data mining techniques, we classified skin types based on the skin-specific values, determined the relationship between skin type and lifestyle, and sorted the subjects into clusters accordingly. Logistic regression analysis revealed gender-related differences in the skin; therefore, separate analyses were performed for males and females. Using Gaussian Mixture Modeling (GMM), we classified the subjects by skin type (two male types and four female types). Using ANOVA and decision trees, we attempted to characterize the relationship between each skin type and the subjects' lifestyles. Menstruation, eating habits, stress, and smoking were identified as the major factors affecting the skin.
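
The GMM-based grouping described in this abstract can be illustrated with a small sketch. This is not the authors' model: it assumes a single measured factor instead of eight, two components, and made-up values, and it fits the mixture with a plain expectation-maximization loop. The function name `fit_gmm_1d` and the `moisture` readings are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(data, n_iter=50, min_var=1e-6):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    # Deterministic initialization: one component at each extreme of the data.
    means = [min(data), max(data)]
    overall_mean = sum(data) / len(data)
    overall_var = sum((x - overall_mean) ** 2 for x in data) / len(data)
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids a collapsed component
    return weights, means, variances, resp

# Hypothetical readings in arbitrary units (not data from the study).
moisture = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances, resp = fit_gmm_1d(moisture)
# Hard assignment: each subject goes to the component with the highest responsibility.
labels = [max(range(2), key=lambda k: r[k]) for r in resp]
```

With real multivariate skin measurements, a full-covariance mixture and a model-selection criterion for the number of components would be needed; the sketch only shows the soft-assignment idea behind the clustering.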

A Comparative Study of Estimation by Analogy using Data Mining Techniques

  • Nagpal, Geeta;Uddin, Moin;Kaur, Arvinder
    • Journal of Information Processing Systems / v.8 no.4 / pp.621-652 / 2012
  • Software estimation provides a set of directives that help software project developers, project managers, and management produce more realistic estimates from deficient, uncertain, and noisy data. A range of estimation models is being explored in industry and academia, but choosing the best model is difficult. Estimation by Analogy (EbA) is a form of case-based reasoning that uses fuzzy logic, grey system theory, or machine-learning techniques for optimization. This research compares the estimation accuracy of some conventional data mining models with that of a hybrid model. The data mining models considered include linear regression models such as ordinary least squares and ridge regression, and nonlinear models such as neural networks, support vector machines, and multivariate adaptive regression splines. A precise and comprehensible predictive model based on the integration of grey relational analysis (GRA) and regression is introduced and compared. Empirical results show that regression used with GRA gives outstanding results, indicating that the methodology has great potential and can serve as a candidate approach for software effort estimation.
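
As a rough illustration of how grey relational analysis can drive estimation by analogy, the sketch below ranks past projects by their grey relational grade against a new project and averages the effort of the closest ones. It is not the hybrid GRA-plus-regression model from the paper: the averaging step, the distinguishing coefficient `zeta=0.5`, the function names, and the project data are all assumptions made for illustration.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1] so features are comparable."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def grey_relational_grades(reference, candidates, zeta=0.5):
    """Grey relational grade of each candidate series against the reference."""
    deltas = [[abs(r - c) for r, c in zip(reference, cand)] for cand in candidates]
    flat = [d for row in deltas for d in row]
    d_min, d_max = min(flat), max(flat)
    if d_max == 0:
        return [1.0] * len(candidates)  # every candidate equals the reference
    grades = []
    for row in deltas:
        # Grey relational coefficient for each feature, then the mean as the grade.
        coeffs = [(d_min + zeta * d_max) / (d + zeta * d_max) for d in row]
        grades.append(sum(coeffs) / len(coeffs))
    return grades

def estimate_effort(new_project, past_features, past_efforts, k=2):
    """Average the effort of the k past projects most similar to the new one."""
    normalized = min_max_normalize(past_features + [new_project])
    reference, candidates = normalized[-1], normalized[:-1]
    grades = grey_relational_grades(reference, candidates)
    ranked = sorted(range(len(candidates)), key=lambda i: grades[i], reverse=True)
    top = ranked[:k]
    return sum(past_efforts[i] for i in top) / len(top)

# Hypothetical history: [size in KLOC, team size] and effort in person-months.
past_features = [[10, 3], [50, 8], [12, 4], [48, 7]]
past_efforts = [20.0, 100.0, 24.0, 90.0]
estimate = estimate_effort([11, 3], past_features, past_efforts)
```

Here the new project is closest to the two small historical projects, so the estimate is the mean of their efforts. A regression fitted on the selected analogues, as the abstract describes, would replace the simple average.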

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • Enormous amounts of data are now produced every second. These data often contain private information from sources including media platforms, banking, finance, healthcare, and criminal records. Data mining is a method for searching and analyzing massive volumes of data to find usable information. Because protecting personal data during data mining has become difficult, privacy-preserving data mining (PPDM) is used. Data perturbation is one of several PPDM techniques: datasets are perturbed in order to protect personal information while addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. In the experiment, two perturbation techniques were used, one based on random projection and one on principal component analysis: Improved Random Projection Perturbation (IRPP) and Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining method. The techniques are assessed in terms of precision, run time, and accuracy. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
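
The basic idea of random-projection perturbation can be sketched as follows. This shows plain Gaussian random projection, not the IRPP variant evaluated in the paper, whose details are not given in the abstract. The records, the seed, and the function names are hypothetical.

```python
import math
import random

def random_projection_matrix(n_features, n_components, seed=0):
    """Gaussian projection matrix scaled by 1/sqrt(n_components)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_components)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_components)]
            for _ in range(n_features)]

def perturb(records, matrix):
    """Multiply each record by the projection matrix to hide the raw values."""
    n_components = len(matrix[0])
    return [
        [sum(value * matrix[j][c] for j, value in enumerate(record))
         for c in range(n_components)]
        for record in records
    ]

def euclidean(a, b):
    """Euclidean distance, useful for comparing geometry before and after."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical numeric records (e.g. four already-scaled attributes per person).
records = [
    [0.2, 0.7, 0.1, 0.9],
    [0.3, 0.6, 0.2, 0.8],
    [0.9, 0.1, 0.8, 0.2],
]
matrix = random_projection_matrix(n_features=4, n_components=2, seed=42)
perturbed = perturb(records, matrix)
```

The perturbed records no longer expose the original attribute values, while pairwise distances are only approximately preserved; how well they are preserved depends on the number of components. A classifier such as Naive Bayes would then be trained on `perturbed` instead of `records`.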

Analysis of Prevention Methods by Type of Construction Disaster Using Text Mining Techniques (텍스트마이닝을 활용한 건설현장 재해 유형별 예방 대책 분석)

  • Gyu Pil Jo;Myungdo Lee;Yoon-seok Shin;Baek-Joong Kim
    • Journal of the Society of Disaster Information / v.20 no.1 / pp.13-19 / 2024
  • Purpose: This study identifies prevention methods by type of construction disaster using text mining techniques. Method: Based on a database of critical disaster cases in the domestic construction sector, preventive measures and causes are analyzed using text mining techniques, and the results are presented visually. Result: The visualizations show the measures for preventing critical disasters in each process according to their importance. Conclusion: The results are expected to help identify factors to consider when preparing preventive measures for serious accidents in construction.

Students' Performance Prediction in Higher Education Using Multi-Agent Framework Based Distributed Data Mining Approach: A Review

  • M.Nazir;A.Noraziah;M.Rahmah
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.135-146 / 2023
  • An effective educational program requires an innovative structure that improves the efficacy of higher education, accelerating the achievement of desired results and reducing the risk of failure. Educational Decision Support Systems (EDSS) are currently a hot topic in education, as they allow student results to be monitored and evaluated as students progress. Inadequate information systems struggle to take full advantage of EDSS owing to a lack of accuracy, incorrect analysis of characteristics, and inadequate databases. Data Mining Techniques (DMTs) provide helpful tools for finding models or patterns in data and are extremely useful in decision-making. Several researchers have studied distributed data mining with multi-agent technology. The rapid growth of network technology and IT use has led to the widespread use of distributed databases. This article explains the available data mining technology and the distributed data mining system framework. A distributed data mining approach is used so that a classifier capable of predicting the success of students in the economic domain can be constructed. The article also discusses the Intelligent Knowledge Base Distributed Data Mining framework for assessing student performance on mid-term and final-term exams using multi-agent system-based educational mining techniques. Using single and ensemble-based classifiers, the study intends to investigate the factors that influence student performance in higher education and to construct a classification model that can predict academic achievement. The importance of multi-agent systems and comparative machine learning approaches in EDSS development is also discussed.

A Data Mining Approach to Intelligent College Road Map Advice Service (데이터 마이닝을 이용한 지능형 전공지도시스템 연구)

  • Choe, Deok-Won;Jo, Gyeong-Pil;Sin, Jin-Gyu
    • Proceedings of the Korea Intelligent Information System Society Conference / 2005.05a / pp.266-273 / 2005
  • Data mining techniques enable us to generate useful decision-support information from data that are generated and accumulated in the course of routine organizational management. A college administration system is a typical example: it produces a warehouse of student records as each student enters college and undertakes curricular and extracurricular activities. So far, these data have been used only for very limited student service purposes, such as issuing transcripts, graduation evaluation, and GPA calculation. In this paper, we use Holland career search test results, TOEIC scores, course work lists, and GPA scores as the input for data mining to generate student advisory information. Factor analysis, AHP (Analytic Hierarchy Process), artificial neural networks, and CART (Classification And Regression Tree) techniques are deployed in the data mining process. Since these data mining techniques are very powerful in processing and discovering useful knowledge and information from large-scale student databases, we can expect highly sophisticated student advisory knowledge and services that may not be obtainable from human student advisors.

Expansion of Opinion Mining based on Entity Association Network Model (개체연관망 모델에 의한 오피니언마이닝의 확장)

  • Kim, Keun-Hyung
    • The KIPS Transactions:PartD / v.18D no.4 / pp.237-244 / 2011
  • Opinion mining classifies the sentiment-bearing opinions that customers express in large volumes of online reviews about the attributes of products or services as positive or negative, and summarizes them. Because customers express their interests through objective facts as well as subjective opinions, existing opinion mining techniques, which can analyze only the sentiment-bearing opinions, need to be expanded. In this paper, we propose a novel entity association network model that expands the existing opinion mining techniques. The entity association network model can represent not only the positive and negative degree of the opinions, but also the degree of association and the relative importance between entities. We designed and implemented a customer review analysis system based on the entity association network model, and found that the system can represent richer information than existing opinion mining techniques.

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques (텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측)

  • Yun, Tae-Uk;Ahn, Hyunchul
    • Journal of Information Technology Applications and Management / v.25 no.1 / pp.19-32 / 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Thus, detecting fake news and preventing its spread has become a very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible for humans to identify it manually. In this context, researchers have tried over the past years to develop automated fake news detection methods using Artificial Intelligence techniques. Unfortunately, no prior studies have proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted to quantified values using various text mining techniques (Topic Modeling, TF-IDF, and so on). In the second step, classifiers are trained using the values produced in the first step. As classifiers, machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied. To validate the effectiveness of the proposed method, we collected 200 Korean news items from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.
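
The two-step pipeline described in this abstract (quantify the text, then train a classifier) can be sketched as below. This is a toy illustration, not the authors' system: it assumes whitespace tokenization of English text, a smoothed TF-IDF weighting, and a 1-nearest-neighbour classifier with cosine similarity as a simple stand-in for case-based reasoning. The training sentences and labels are invented.

```python
import math
from collections import Counter

def tokenize(text):
    """Whitespace tokenizer (real Korean text would need a morphological analyzer)."""
    return text.lower().split()

def fit_idf(documents):
    """Step 1a: smoothed inverse document frequency for every training term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}

def tfidf_vector(text, idf):
    """Step 1b: sparse TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(tokenize(text))
    total = sum(counts.values())
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(text, train_vectors, train_labels, idf):
    """Step 2: return the label of the most similar training document."""
    query = tfidf_vector(text, idf)
    best = max(range(len(train_vectors)),
               key=lambda i: cosine(query, train_vectors[i]))
    return train_labels[best]

# Invented toy corpus, for illustration only.
train_texts = [
    "miracle cure shocks doctors",
    "secret miracle diet shocks experts",
    "city council approves transit budget",
    "council publishes annual budget report",
]
train_labels = ["fake", "fake", "real", "real"]
idf = fit_idf(train_texts)
train_vectors = [tfidf_vector(t, idf) for t in train_texts]

first = predict("doctors reveal secret miracle cure", train_vectors, train_labels, idf)
second = predict("council approves new budget", train_vectors, train_labels, idf)
```

Any of the classifiers named in the abstract could take the place of the nearest-neighbour step, since they all consume the same numeric vectors produced in the first step.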

Development of Automatic Rule Extraction Method in Data Mining : An Approach based on Hierarchical Clustering Algorithm and Rough Set Theory (데이터마이닝의 자동 데이터 규칙 추출 방법론 개발 : 계층적 클러스터링 알고리듬과 러프 셋 이론을 중심으로)

  • Oh, Seung-Joon;Park, Chan-Woong
    • Journal of the Korea Society of Computer and Information / v.14 no.6 / pp.135-142 / 2009
  • Data mining is an emerging area of computational intelligence that offers new theories, techniques, and tools for the analysis of large data sets. The major techniques used in data mining are association rule mining, classification, and clustering. Since these techniques are typically used individually, a methodology for rule extraction that integrates them is needed. Rule extraction techniques help humans analyze large data sets and turn the meaningful information they contain into successful decision making. This paper proposes an autonomous method of rule extraction using clustering and rough set theory. Experiments are carried out on data sets from the UCI KDD archive, and the decision rules produced by the proposed method are presented. These rules can be used successfully for making decisions.
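
One way to picture the rough-set part of such a method is the extraction of "certain" rules: groups of records that are indiscernible on the condition attributes and agree on the decision. The sketch below shows only that step and assumes the cluster labels have already been produced by a separate clustering stage. It is not the authors' algorithm, and the attribute names and the decision table are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(rows, condition_attrs):
    """Group rows that share the same values on all condition attributes."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in condition_attrs)
        classes[key].append(row)
    return classes

def certain_rules(rows, condition_attrs, decision_attr):
    """Keep only classes whose members all agree on the decision.

    These classes form the lower approximation of the decision classes,
    so each one yields a rule that holds without exception in the table.
    """
    rules = []
    for key, members in indiscernibility_classes(rows, condition_attrs).items():
        decisions = {m[decision_attr] for m in members}
        if len(decisions) == 1:
            rules.append((dict(zip(condition_attrs, key)), decisions.pop()))
    return rules

# Hypothetical decision table; "cluster" stands for a label from a clustering step.
table = [
    {"cluster": "A", "income": "high", "buys": "yes"},
    {"cluster": "A", "income": "high", "buys": "yes"},
    {"cluster": "B", "income": "low", "buys": "no"},
    {"cluster": "B", "income": "high", "buys": "yes"},
    {"cluster": "B", "income": "high", "buys": "no"},
]
rules = certain_rules(table, ["cluster", "income"], "buys")
for conditions, decision in rules:
    print(conditions, "->", decision)
```

The class with cluster B and high income is inconsistent (both decisions occur), so it belongs to the boundary region and produces no certain rule.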

A Comparison of Capabilities of Data Mining Tools

  • Choi, Youn-Seok;Kim, Jong-Geoun;Lee, Jong-Hee
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.531-541 / 2001
  • In this study, we objectively compare the capabilities of the latest versions of data mining tools and provide information useful to enterprises and universities in choosing among them. In particular, we compare SAS/Enterprise Miner 3.0, SPSS/Clementine 5.2, and IBM/Intelligent Miner 6.1, which are well known and readily available.
