• Title/Summary/Keyword: Feature selection


A Study of Unified Framework with Light Weight Artificial Intelligence Hardware for Broad range of Applications (다중 애플리케이션 처리를 위한 경량 인공지능 하드웨어 기반 통합 프레임워크 연구)

  • Jeon, Seok-Hun;Lee, Jae-Hack;Han, Ji-Su;Kim, Byung-Soo
    • The Journal of the Korea institute of electronic communication sciences
    • v.14 no.5 / pp.969-976 / 2019
  • Lightweight artificial intelligence hardware has made great strides in many application areas. In general, a lightweight artificial intelligence system consists of a lightweight artificial intelligence engine and a preprocessor that performs feature selection, generation, extraction, and normalization. To achieve optimal performance across a broad range of applications, a lightweight artificial intelligence system needs to choose good preprocessing functions and set their respective hyper-parameters. This paper proposes a unified framework for a lightweight artificial intelligence system, along with a method of using it to find the best-performing models for a given dataset. The proposed unified framework can easily generate a model that combines preprocessing functions with a lightweight artificial intelligence engine. In a performance evaluation using a handwritten image dataset and a fall detection dataset measured with an inertial sensor, the proposed unified framework built artificial intelligence models with over 90% test accuracy.
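The idea of pairing preprocessing functions with an engine and searching over their settings can be sketched as follows. This is a minimal illustration, not the framework from the paper: the two preprocessors, the k-nearest-neighbour stand-in for the engine, the function names, and the toy data are all assumptions made for the example.

```python
from itertools import product

def fit_identity(train_rows):
    """Hypothetical preprocessor that leaves features unchanged."""
    return lambda row: list(row)

def fit_min_max(train_rows):
    """Hypothetical preprocessor: learn per-feature min and range on training rows only."""
    columns = list(zip(*train_rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]  # avoid dividing by zero
    return lambda row: [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

def knn_predict(train_x, train_y, query, k):
    """Stand-in engine: majority vote among the k nearest training rows."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    nearest = sorted(range(len(train_x)), key=dist)[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)

def search(train_x, train_y, val_x, val_y, preprocessors, ks):
    """Try every (preprocessor, k) pair and score it on the validation rows."""
    results = []
    for (name, fit), k in product(preprocessors.items(), ks):
        transform = fit(train_x)
        scaled_train = [transform(row) for row in train_x]
        hits = sum(knn_predict(scaled_train, train_y, transform(q), k) == y
                   for q, y in zip(val_x, val_y))
        results.append((hits / len(val_y), name, k))
    return max(results), results

# Toy data: feature 0 separates the classes, feature 1 is large-scale noise.
train_x = [[0.1, 100], [0.2, 900], [0.15, 500], [0.9, 120], [0.8, 880], [0.85, 480]]
train_y = [0, 0, 0, 1, 1, 1]
val_x = [[0.12, 870], [0.88, 130], [0.2, 490], [0.8, 110]]
val_y = [0, 1, 0, 1]

preprocessors = {"identity": fit_identity, "min_max": fit_min_max}
best, all_results = search(train_x, train_y, val_x, val_y, preprocessors, ks=(1, 3))
print(best)
```

On this toy data the noisy second feature dominates unscaled distances, so the search prefers the min-max preprocessor. Odd values of k with two classes are used so the vote cannot tie.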

An Exam Prep App for the Secondary English Teacher Recruitment Exam with Brain-based Memory and Learning Principles (뇌 기억-학습 원리를 적용한 중등영어교사 임용시험 준비용 어플)

  • Lee, Hye-Jin
    • The Journal of the Korea Contents Association
    • v.21 no.1 / pp.311-320 / 2021
  • At present, the secondary school teacher employment examination (SSTEE) is the only gateway to becoming a national or public secondary school teacher in Korea. Since the revision that took effect in the 2014 academic year, all exam questions have been converted to supply-type test items, which require more definitive, accurate, and solid answers. Compared with selection-type test items, which measure recognition memory, supply-type questions test recall memory and require constant memorization and retrieval practice to produce answers; however, there are not enough learning tools available to support such practice. To address this gap, this study developed a mobile app for the SSTEE called ONE PASS. Drawing on the functional mechanisms of the brain, the basis of cognitive processing, the ONE PASS app offers a set of tools built on brain-based learning principles, such as a personalized study planner, motivation measurement scales, mind mapping, brainstorming, and sample questions from previous tests. This study is expected to contribute to research on the development of learning content for applications and, at the same time, to help candidates in their exam preparation.

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society
    • v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose predicting natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of a dataset that integrates Korean Meteorological Agency data with natural gas leakage data, so that complex factors can be considered. The method is divided into three modules. First, we fill in missing data in the integrated dataset using linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. Second, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method improved performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
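The missing-data step in the first module can be illustrated with a small sketch. The abstract only states that linear interpolation was used, so the function name, the use of `None` for missing readings, and the choice to copy the nearest known value at the edges are assumptions for this example.

```python
def interpolate_linear(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours.

    Leading and trailing gaps copy the nearest known value (an assumption;
    the abstract does not say how edges were handled).
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    if not known:
        return filled  # nothing to interpolate from
    for i in range(known[0]):
        filled[i] = filled[known[0]]
    for i in range(known[-1] + 1, len(filled)):
        filled[i] = filled[known[-1]]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled

# Hypothetical sensor series with gaps.
readings = [None, 2.0, None, 6.0, None]
print(interpolate_linear(readings))
```

Each gap is filled along the straight line between its two known neighbours, which suits slowly varying series such as weather readings.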

A Study on Improving Usability of Webdewey for Learners (학습자를 위한 웹듀이의 사용성 증진 방안 연구)

  • Baek, Ji-won
    • Journal of the Korean BIBLIA Society for library and Information Science
    • v.33 no.2 / pp.75-95 / 2022
  • This study aims to analyze the development and functional changes of WebDewey, which has become a basic tool for learning classification, to assess its usability for learners, and to suggest specific ways to improve that usability. To this end, the concepts and principles of UI and usability were first laid out, and WebDewey's structure and key functions were analyzed. Next, WebDewey's changes of medium and its feature changes over time were analyzed. In addition, an opinion survey on WebDewey's usability was conducted among learners who had used WebDewey in their studies, and ways to improve usability were proposed based on the implications and directions for improvement derived from the survey. In terms of UI, the proposals cover display methods, visualization devices, adoption of the advantages of the printed version, and development of a Korean version. For the 'Create built number' function, the suggestions cover base number selection, guidance on the number-building route and error messages, new references and route construction, screen and button design, and guidance on built-number components.

Re-evaluation of Obesity Syndrome Differentiation Questionnaire Based on Real-world Survey Data Using Data Mining (데이터 마이닝을 이용한 한의비만변증 설문지 재평가: 실제 임상에서 수집한 설문응답 기반으로)

  • Oh, Jihong;Wang, Jing-Hua;Choi, Sun-Mi;Kim, Hojun
    • Journal of Korean Medicine for Obesity Research
    • v.21 no.2 / pp.80-94 / 2021
  • Objectives: The purpose of this study is to re-evaluate the importance of the questions in the obesity syndrome differentiation (OSD) questionnaire based on a real-world survey and to explore the possibility of simplifying OSD types. Methods: The OSD frequency was identified, and variance threshold feature selection was performed to filter the questions. Filtered questions were clustered by K-means clustering and hierarchical clustering. After principal component analysis (PCA), the distribution patterns of the subjects were identified and the differences in the syndrome distribution were compared. Results: The frequency of OSD in spleen deficiency, phlegm (PH), and blood stasis (BS) was lower than in food retention (FR), liver qi stagnation (LS), and yang deficiency. We excluded 13 questions with low variance, 7 of which were related to BS. Filtered questions were clustered into 3 groups by K-means clustering: Cluster 1 (17 questions) was mainly related to PH and BS syndromes; Cluster 2 (11 questions) was related to swelling and indigestion; Cluster 3 (11 questions) was related to overeating or emotional symptoms. After PCA, significantly different patterns of subjects were observed in FR, LS, and the other obesity syndromes. The questions that mainly affected the FR distribution concerned digestive symptoms, while emotional symptoms mainly affected the distribution of LS subjects. The other obesity syndromes were partially affected by both digestive and emotional symptoms, and also by symptoms related to poor circulation. Conclusions: In-depth data mining analysis identified questions of relatively low importance and showed the potential to simplify OSD types.
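Variance threshold feature selection, the filtering step named in the Methods, drops questions whose answers barely vary across respondents. The sketch below assumes population variance and a strict "greater than" comparison; the threshold value and the toy responses are hypothetical, since the abstract does not report them.

```python
from statistics import pvariance

def variance_threshold(rows, threshold):
    """Keep only the columns whose population variance exceeds the threshold.

    Returns the kept column indices and the filtered rows.
    """
    columns = list(zip(*rows))
    keep = [j for j, col in enumerate(columns) if pvariance(col) > threshold]
    return keep, [[row[j] for j in keep] for row in rows]

# Hypothetical questionnaire: rows are respondents, columns are questions.
responses = [
    [1, 0, 3],
    [1, 1, 3],
    [1, 0, 4],
    [1, 1, 4],
]
kept, filtered = variance_threshold(responses, threshold=0.1)
print(kept)      # indices of the questions that survive the filter
print(filtered)
```

The first question is answered identically by everyone, so it carries no information for clustering and is removed.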

Analysis of Hypertension Risk Factors by Life Cycle Based on Machine Learning (머신러닝 기반 생애주기별 고혈압 위험 요인 분석)

  • Kang, SeongAn;Kim, SoHui;Ryu, Min Ho
    • Journal of Korea Society of Industrial Information Systems
    • v.27 no.5 / pp.73-82 / 2022
  • Chronic diseases such as hypertension require differentiated management according to age and life cycle. It is also known that hypertension is caused by a combination of various factors. This study uses machine learning prediction techniques to analyze the factors affecting hypertension by life cycle. To this end, a total of 35 variables were obtained through preprocessing and variable selection applied to the National Health and Nutrition Survey data of the Korea Centers for Disease Control and Prevention. The results show that, among the tree-based machine learning models, XGBoost had high predictive performance for both middle and old age. As for the risk factors by life cycle, individual characteristic factors, genetic factors, and nutritional intake factors were found to be risk factors for hypertension in middle age, while nutritional intake factors, dietary factors, and lifestyle factors were identified as risk factors in old age. The results of this study are expected to serve as useful basic data for hypertension management by life cycle.

A Study on MRI Semi-Automatically Selected Biomarkers for Predicting Risk of Rectal Cancer Surgery Based on Radiomics (라디오믹스 기반 직장암 수술 위험도 예측을 위한 MRI 반자동 선택 바이오마커 검증 연구)

  • Young Seo, Baik;Young Jae, Kim;Youngbae, Jeon;Tae-sik, Hwang;Jeong-Heum, Baek;Kwang Gi, Kim
    • Journal of Biomedical Engineering Research
    • v.44 no.1 / pp.11-18 / 2023
  • Currently, studies that predict the risk of rectal cancer surgery select MRI image slices based on the clinical experience of surgeons. The purpose of this study is to semi-automatically select and classify 2D MRI image slices to predict the risk of rectal cancer surgery using biomarkers. The data were retrospectively collected MRI images of 50 patients who underwent laparoscopic surgery for rectal cancer at Gachon University Gil Medical Center. Expert-selected MRI image slices and non-selected slices were screened, and radiomics was used to extract a total of 102 features. A total of 16 approaches were tested, combining 4 classifiers with 4 feature selection methods. The combination of Random Forest and Ridge achieved a sensitivity of 0.83, a specificity of 0.88, an accuracy of 0.85, and an AUC of 0.89±0.09. Differences between expert-selected and non-selected MRI image slices were analyzed by extracting the top five significant features. The selected quantitative features help expedite decision making and improve efficiency in studies that predict the risk of rectal cancer surgery.
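The 4 × 4 grid of approaches and the reported metrics can be sketched as below. The abstract names only Random Forest and Ridge, so the component names here are placeholders, and the labels and predictions are made-up toy values rather than study data.

```python
from itertools import product

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from binary labels (1 = positive)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / len(y_true),
    }

# Placeholder names: the abstract does not list all eight components.
classifiers = [f"classifier_{i}" for i in range(1, 5)]
selectors = [f"selector_{i}" for i in range(1, 5)]
approaches = list(product(classifiers, selectors))
print(len(approaches))  # one entry per classifier/selector pairing

# Toy predictions for a single approach.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In a full experiment each of the 16 pairings would be trained and scored this way, and the pairing with the best metrics reported.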

A Detecting Technique for the Climatic Factors that Aided the Spread of COVID-19 using Deep and Machine Learning Algorithms

  • Al-Sharari, Waad;Mahmood, Mahmood A.;Abd El-Aziz, A.A.;Azim, Nesrine A.
    • International Journal of Computer Science & Network Security
    • v.22 no.6 / pp.131-138 / 2022
  • The novel coronavirus (COVID-19) is regarded as one of the main public health threats worldwide. Because of the sudden nature of the outbreak and the infectiousness of the virus, it causes anxiety, depression, and other stress responses. The prevention and control of novel coronavirus pneumonia have entered a critical stage. It is essential to predict and forecast outbreaks early during this difficult time in order to control morbidity and mortality. The entire world is investing enormous effort to fight the spread of this lethal virus. In this paper, we used machine learning and deep learning techniques to analyze the situation using data shared by countries and to detect the climate factors that affect the spread of COVID-19, such as humidity, sunny hours, temperature, and wind speed, in order to understand its typical behavior and to forecast its future reach around the world. We used data collected and published by Kaggle and the Johns Hopkins Center for Systems Science. The dataset has 25 attributes and 9566 objects. Our experiment consists of two phases. In phase one, we preprocessed the dataset for the DL model and reduced the features to four (humidity, sunny hours, temperature, and wind speed) using the Pearson Correlation Coefficient technique (correlation attribute feature selection). In phase two, we used six well-known traditional machine learning techniques for numerical datasets and a Dense Net deep learning model to predict and detect the climatic factors that aid the disease outbreak. We validated the model using a confusion matrix (CM) and measured performance with four metrics: accuracy, f-measure, recall, and precision.
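Correlation-based feature selection of the kind described in phase one can be sketched as follows. The abstract does not say exactly how the Pearson coefficient was applied, so ranking features by absolute correlation with the target is an assumption, and the column values are invented toy numbers.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0  # constant column: no signal

def top_k_by_correlation(features, target, k):
    """Return the k feature names with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy columns; names follow the abstract, values are illustrative only.
target = [1, 2, 3, 4, 5]                 # e.g. case counts
features = {
    "temperature": [2, 4, 6, 8, 10],     # rises with the target
    "humidity": [5, 4, 3, 2, 1],         # falls as the target rises
    "noise": [1, -1, 0, -1, 1],          # unrelated to the target
}
selected = top_k_by_correlation(features, target, k=2)
print(selected)
```

Using the absolute value keeps strongly negative relationships, which are as informative to a model as strongly positive ones.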

Machine Learning-Based Malicious URL Detection Technique (머신러닝 기반 악성 URL 탐지 기법)

  • Han, Chae-rim;Yun, Su-hyun;Han, Myeong-jin;Lee, Il-Gu
    • Journal of the Korea Institute of Information Security & Cryptology
    • v.32 no.3 / pp.555-564 / 2022
  • Recently, cyberattacks that use intelligent and advanced malicious code have targeted non-face-to-face environments such as telecommuting, telemedicine, and automated industrial facilities, and the resulting damage is increasing. Traditional information protection systems, such as anti-virus software, detect known malicious URLs based on signature patterns, so they cannot detect unknown malicious URLs. In addition, conventional static analysis-based malicious URL detection is vulnerable to dynamic loading and cryptographic attacks. This study proposes a technique for efficiently detecting malicious URLs by dynamically learning malicious URL data. In the proposed detection technique, malicious codes are classified using machine learning-based feature selection algorithms, and accuracy is improved by removing obfuscation elements after preprocessing using Weighted Euclidean Distance (WED). According to the experimental results, the proposed machine learning-based malicious URL detection technique achieves an accuracy of 89.17%, an improvement of 2.82% over the conventional method.
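For reference, a weighted Euclidean distance scales each feature's squared difference by a weight before summing. The sketch below shows only the standard formula; how the paper chooses the weights or applies WED during preprocessing is not stated in the abstract, and the URL feature vectors here are hypothetical.

```python
from math import sqrt

def weighted_euclidean(x, y, weights):
    """sqrt(sum_i w_i * (x_i - y_i)^2) for equal-length vectors and non-negative weights."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Hypothetical URL feature vectors: [length, digit count, subdomain count].
url_a = [30, 2, 1]
url_b = [34, 5, 1]
print(weighted_euclidean(url_a, url_b, [1.0, 1.0, 1.0]))  # plain Euclidean distance
print(weighted_euclidean(url_a, url_b, [0.0, 1.0, 1.0]))  # ignores URL length
```

Setting a weight to zero removes that feature's influence entirely, while larger weights make differences in that feature count for more.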

A Box Office Type Classification and Prediction Model Based on Automated Machine Learning for Maximizing the Commercial Success of the Korean Film Industry (한국 영화의 산업의 흥행 극대화를 위한 AutoML 기반의 박스오피스 유형 분류 및 예측 모델)

  • Subeen Leem;Jihoon Moon;Seungmin Rho
    • Journal of Platform Technology
    • v.11 no.3 / pp.45-55 / 2023
  • This paper presents a model that supports decision-makers in the Korean film industry to maximize the success of online movies. To achieve this, we collected historical box office movies and clustered them into types to propose a model predicting each type's online box office performance. We considered various features to identify factors contributing to movie success and reduced feature dimensionality for computational efficiency. We systematically classified the movies into types and predicted each type's online box office performance while analyzing the contributing factors. We used automated machine learning (AutoML) techniques to automatically propose and select machine learning algorithms optimized for the problem, allowing for easy experimentation and selection of multiple algorithms. This approach is expected to provide a foundation for informed decision-making and contribute to better performance in the film industry.
