• Title/Summary/Keyword: machine learning

Optimized Feature Selection using Feature Subset IG-MLP Evaluation based Machine Learning Model for Disease Prediction (특징집합 IG-MLP 평가 기반의 최적화된 특징선택 방법을 이용한 질환 예측 머신러닝 모델)

  • Kim, Kyeongryun; Kim, Jaekwon; Lee, Jongsik
    • Journal of the Korea Society for Simulation / v.29 no.1 / pp.11-21 / 2020
  • Cardio-cerebrovascular diseases (CCD) account for 24% of deaths among Koreans, the highest proportion after cancer. Currently, cardiovascular risk for domestic patients is assessed with the Framingham risk score (FRS), but its accuracy tends to be lower because it is a foreign guideline, and it cannot score the risk of cerebrovascular disease. CCD is hard to predict because the features of early symptoms are difficult to analyze for prevention. A prediction method suited to Koreans is therefore needed. This paper validates, through simulation on CCD data, a feature selection method based on IG-MLP (Information Gain - Multilayer Perceptron) evaluation. The proposed method uses raw data from the 4th through 7th Korea National Health and Nutrition Examination Survey (KNHANES). To select the important features of CCD, the attributes are analyzed using IG-MLP, and a CCD prediction ANN model built on the optimized feature set is then provided. The proposed method can identify important features for CCD prediction in Koreans, and the resulting ANN model can predict CCD more accurately for Koreans.
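
The information gain half of an IG-MLP style evaluation can be illustrated with a minimal sketch. It assumes discrete-valued features and covers only the ranking step; the MLP evaluation of candidate subsets is omitted, and the abstract does not say how the two are combined. The feature names in the usage note are hypothetical and are not KNHANES variables.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """IG(Y; X) = H(Y) - sum over x of p(x) * H(Y | X = x)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, information gain) pairs, highest gain first."""
    scores = {name: information_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_features({"smoker": [1, 1, 0, 0], "region": [0, 1, 0, 1]}, [1, 1, 0, 0])` places the hypothetical `smoker` column first, because it fully determines the label while `region` carries no information about it. A subset evaluation would then train a classifier on the top-ranked features and compare accuracy across subset sizes.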

Implementation and Performance Measuring of Erasure Coding of Distributed File System (분산 파일시스템의 소거 코딩 구현 및 성능 비교)

  • Kim, Cheiyol; Kim, Youngchul; Kim, Dongoh; Kim, Hongyeon; Kim, Youngkyun; Seo, Daewha
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1515-1527 / 2016
  • With the growth of big data, machine learning, and cloud computing, storage that can hold large amounts of unstructured data has become increasingly important. Distributed file systems built on commodity hardware, such as MAHA-FS, GlusterFS, and the Ceph file system, have therefore received considerable attention for their scale-out and low-cost properties. For data fault tolerance, most of these file systems initially used replication. As storage grows to tens or hundreds of petabytes, however, the low space efficiency of replication has come to be seen as a problem. This paper applies an erasure coding fault tolerance policy to MAHA-FS for higher space efficiency and introduces the VDelta technique to solve the data consistency problem. We compare the erasure coding performance of two file systems, MAHA-FS and GlusterFS, which have different I/O processing architectures: the former is server-centric and the latter is client-centric. We found that the erasure coding performance of MAHA-FS is better than that of GlusterFS.
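
The space-efficiency argument for erasure coding over replication can be made concrete with a small sketch. It uses a single XOR parity block, the simplest erasure code, which tolerates the loss of one block per stripe. This is an assumption for illustration only: the abstract does not state which code MAHA-FS uses, and the VDelta technique is not modeled here.

```python
from functools import reduce


def space_efficiency(k, m):
    """Fraction of raw capacity holding user data with k data and m redundant blocks."""
    return k / (k + m)


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks):
    """Append one XOR parity block to the data blocks, forming a stripe."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the single missing block from the surviving blocks of the stripe."""
    survivors = [block for i, block in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


stripe = encode([b"abcd", b"efgh", b"ijkl"])
print(recover(stripe, 1))        # rebuilds the second data block
print(space_efficiency(1, 2))    # 3-way replication: one data copy, two extra copies
print(space_efficiency(3, 1))    # 3 data blocks plus 1 parity block
```

Three-way replication stores one third useful data, whereas the 3+1 stripe stores three quarters, at the cost of extra computation and network reads during recovery.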

Comparative Usefulness of Naver and Google Search Information in Predictive Models for Youth Unemployment Rate in Korea (한국 청년실업률 예측 모형에서 네이버와 구글 검색 정보의 유용성 분석)

  • Jung, Jae Un
    • Journal of Digital Convergence / v.16 no.8 / pp.169-179 / 2018
  • Recently, web search query information has been applied in advanced predictive model research. Google dominates the global web search market; in the Korean market, however, Naver holds the dominant share. Based on this characteristic, this study compares the utility of Korean web search query information from Google and Naver using predictive models. To that end, it develops three time-series predictive models, built on the ARIMA model, to estimate the youth unemployment rate in Korea. Model 1 used only the youth unemployment rate in Korea, whereas Models 2 and 3 added the Korean web search query information of Naver and Google, respectively, to Model 1. When the models were compared over the training period, Models 2 and 3 showed a better fit than Model 1. Models 2 and 3 were correlated with different query information. During predictive periods 1 (continuous with the training period) and 2 (discontinuous with the training period), Model 3 showed the best performance. During predictive period 2, only Model 3 exhibited a significant prediction result. This comparative study contributes to a general understanding of the usefulness of Korean web query information from the Naver and Google search engines.
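
As a rough illustration of the ARIMA family used as the baseline here, the sketch below fits the simplest differenced autoregressive case, ARIMA(1,1,0), by conditional least squares. The model order, the omission of an intercept, and the absence of any search-query regressor are all assumptions made for brevity; the abstract does not give the orders or the estimation method actually used.

```python
def difference(series):
    """First differences: the 'I' (integration order 1) step of ARIMA(1,1,0)."""
    return [cur - prev for prev, cur in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares estimate of phi in d_t = phi * d_(t-1) + e_t (no intercept)."""
    numerator = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast_next(series):
    """One-step-ahead forecast: last level plus the predicted next difference."""
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]
```

A model in the spirit of Models 2 and 3 would add search-query volume as an extra regressor on the differenced series; in practice a statistics library would handle estimation, order selection, and significance testing.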

Analysis of Hibernating Habitat of Asiatic Black Bear (Ursus thibetanus ussuricus) based on the Presence-Only Model using MaxEnt and Geographic Information System: A Comparative Study of Habitat for Non-Hibernating Period (MaxEnt와 GIS를 활용한 반달가슴곰 동면장소 분석: 비동면 기간 동안의 서식지 비교 연구)

  • JUNG, Dae-Ho; KAHNG, Byung-Seon; CHO, Chae-Un; KIM, Seok-Beom; KIM, Jeong-Jin
    • Journal of the Korean Association of Geographic Information Studies / v.19 no.3 / pp.102-113 / 2016
  • This study uses a geographic information system (GIS) and machine learning models to analyze the relationship between the occurrence of hibernation sites and habitat, in order to systematically manage the habitat of the Asiatic black bear (Ursus thibetanus ussuricus) inhabiting Jirisan National Park, South Korea. The most important environmental factor influencing the hibernation sites was found to be inclination (41.4%), followed in order of contribution by altitude (20.4%), distance from the trail (10.9%), and age group (7.7%). A comparison between the hibernation habitat and the normal habitat of the Asiatic black bear indicated that the average altitude of the hibernation sites was 63 m, whereas the average altitude of the normal habitat was approximately 400 m. The average inclination was found to be $7^{\circ}$, and a preference for the steeper inclination of $12-43^{\circ}$ was also observed. The average distance of the hibernation site from the road was approximately 300 m; the range of separation distance was found to be 1,300-2,400 m. This is thought to reflect the selection of safer winter hibernation sites that avoid human contact and outside intrusion. By analyzing the habitat environmental factors behind the selection of hibernation sites that protect against severe cold and other threats during the hibernation period, this study provides fundamental data for the hibernation ecology and habitat management of the Asiatic black bear.

A Case Study of "Engineering Design" Education with Emphasize on Hands-on Experience (기계공학과에서 제시하는 Hands-on Experience 중심의 "엔지니어링 디자인" 교과목의 강의사례)

  • Kim, Hong-Chan; Kim, Ji-Hoon; Kim, Kwan-Ju; Kim, Jung-Soo
    • Journal of Engineering Education Research / v.10 no.2 / pp.44-61 / 2007
  • This investigation is chiefly concerned with new curriculum development at the Department of Mechanical System & Design Engineering at Hongik University, with the aim of enhancing the creativity, teamwork, and communication skills that modern engineering education emphasizes. 'Mechanical System & Design Engineering' is the new name of the mechanical engineering department at Hongik University, which has adopted a new curriculum emphasizing engineering design. To meet the radically changing environment and the demands that industry places on engineering education, the department has shifted its focus from an analog-based, machine-centered hard approach to a digital-based, human-centered soft approach. Three new programs, Introduction to Mechanical System & Design Engineering, Creative Engineering Design, and Product Design, emphasize hands-on experience through project-based teamwork. The process of making sketch models and prototypes is strongly emphasized, and cardboard, polystyrene foam, and foam core board are provided as working materials instead of traditional hard engineering materials such as metals, because these three programs focus more on creative idea generation and dynamic communication among team members than on the end results. By offering generative, visual, and concrete experiences that complement existing engineering classes, with their traditional focus on analysis, mathematics, and reasoning, hands-on experience can play a significant role in helping engineering students develop the creative thinking and engineering sense needed to face the ill-defined real-world design problems they are expected to encounter upon graduation.

Band Selection Using L2,1-norm Regression for Hyperspectral Target Detection (초분광 표적 탐지를 위한 L2,1-norm Regression 기반 밴드 선택 기법)

  • Kim, Joochang; Yang, Yukyung; Kim, Jun-Hyung; Kim, Junmo
    • Korean Journal of Remote Sensing / v.33 no.5_1 / pp.455-467 / 2017
  • When performing target detection using hyperspectral imagery, a feature extraction process is necessary to address both the redundancy of adjacent spectral bands and the heavy computation caused by high-dimensional data. This study proposes a new band selection method that uses the $L_{2,1}$-norm regression model, applying a feature selection technique from the machine learning field to hyperspectral band selection. To analyze the performance of the proposed band selection technique, we collected hyperspectral imagery and used it to evaluate target detection performance with band selection. The Adaptive Cosine Estimator (ACE) detection performance is maintained or improved when the number of bands is reduced from 164 to about 30 to 40 in the 350 nm to 2500 nm wavelength range. Experimental results show that the proposed band selection technique extracts bands that are effective for detection in hyperspectral images and can reduce the size of the data without reducing performance, which can help improve the processing speed of real-time target detection systems in the future.
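
To show why an $L_{2,1}$ penalty lends itself to band selection, the sketch below computes the norm of a weight matrix and ranks bands by the size of their rows. It assumes the common convention in which each row of `W` corresponds to one spectral band and the $L_{2,1}$ norm is the sum of the rows' Euclidean norms; the regression that produces `W` is not shown, and the abstract does not specify the paper's optimization procedure.

```python
import math


def row_norms(W):
    """Euclidean (L2) norm of each row of the weight matrix."""
    return [math.sqrt(sum(v * v for v in row)) for row in W]


def l21_norm(W):
    """L2,1 norm: the sum of the L2 norms of the rows."""
    return sum(row_norms(W))


def select_bands(W, k):
    """Indices of the k rows (bands) with the largest L2 norm, in ascending order."""
    norms = row_norms(W)
    order = sorted(range(len(W)), key=lambda i: norms[i], reverse=True)
    return sorted(order[:k])
```

Because the penalty sums row norms rather than squaring them, minimizing it tends to drive entire rows to zero, so the bands whose rows stay large are the natural candidates to keep.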

A Study of Statistical Learning as a CRM's Classifier Functions (CRM의 기능 분류를 위한 통계적 학습에 관한 연구)

  • Jang, Geun; Lee, Jung-Bae; Lee, Byung-Soo
    • The KIPS Transactions:PartB / v.11B no.1 / pp.71-76 / 2004
  • Recent ERP and CRM systems are mostly focused on performing conventional functions. However, the rapid progress of the internet and e-commerce has changed the market in the recent business environment. Business is largely becoming e-business, extending to the development of relationships with partner companies, rapid progress in customer relationships, and stronger competitiveness through improved business processes within the organization. CRM (customer relationship management) is a marketing process that forms, manages, and strengthens the relationship between customers and a company in order to retain acquired customers and increase their value to the company. Because it operates on a wide range of customer information and is linked to business areas such as production, marketing, and decision making, it requires a system foundation that analyzes customer information. As ERP extends its functions to SCM, CRM, and SEM (strategic enterprise management), 21st-century ERP is developing into a strategic tool for e-business, and, as a means to this end, the functions of CRM are to be subdivided effectively through analogical learning from data. In addition, one feature of the system is an agent that applies a machine learning method so that file classification work previously done manually can be carried out automatically, letting users work more efficiently.

Human Walking Detection and Background Noise Classification by Deep Neural Networks for Doppler Radars (사람 걸음 탐지 및 배경잡음 분류 처리를 위한 도플러 레이다용 딥뉴럴네트워크)

  • Kwon, Jihoon; Ha, Seoung-Jae; Kwak, Nojun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.29 no.7 / pp.550-559 / 2018
  • The effectiveness of deep neural networks (DNNs) for detecting and classifying micro-Doppler signals generated by human walking and background noise sources is investigated. Previous research included a complex process for extracting meaningful features that directly affect classifier performance, and this feature extraction was based on experience and statistical analysis. However, because a DNN gradually reconstructs and generates features as data passes through the layers of the network, a separate preprocessing step for feature extraction is not required. Binary and multiclass classifiers based on multilayer perceptrons (MLPs) and DNNs were therefore designed and analyzed, and the effectiveness of DNNs for recognizing micro-Doppler signals was demonstrated. Experimental results showed that, in the case of MLPs, the classification accuracies of the binary classifier and the multiclass classifier were 90.3% and 86.1%, respectively, for the test dataset. In the case of DNNs, the classification accuracies of the binary classifier and the multiclass classifier were 97.3% and 96.1%, respectively, for the test dataset.

The identification of Raman spectra by using linear intensity calibration (선형 강도 교정을 이용한 라만 스펙트럼 인식)

  • Park, Jun-Kyu; Baek, Sung-June; Park, Aaron
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.3 / pp.32-39 / 2018
  • Raman spectra exhibit differences in intensity depending on the measuring equipment and environmental conditions, even for the same material. This restricts the pattern recognition approach to Raman spectroscopy and must be solved for practical application, so that Raman databases can be reused and Raman devices can interoperate. To this end, previous studies assumed the existence of a transfer function between the measurement devices to obtain a direct spectral correction. However, this method cannot cope with other conditions that cause various intensity distortions. Therefore, we propose a classification method using linear intensity calibration, which can deal with various measurement conditions more flexibly. To evaluate the performance of the proposed method, a Raman library containing 14,033 chemical substances was used for identification. Raman spectra of ten kinds of chemicals, measured using three different Raman spectroscopes, were used as the experimental data. The experimental results show that the proposed method achieves 100% discrimination performance against the intensity-distorted spectra and shows a high correlation score for the identified material, making it a useful tool for the identification of chemical substances.
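
One way to picture identification with a linear intensity calibration is the sketch below. For each library spectrum it fits a gain and offset by least squares so that the library spectrum matches the measured one, then picks the entry with the smallest remaining error. A single global gain and offset, the RMSE criterion, and all function and substance names are assumptions for illustration; the abstract does not describe the paper's exact calibration or scoring procedure.

```python
import math


def fit_gain_offset(x, y):
    """Least-squares gain and offset such that gain * x + offset approximates y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gain = sxy / sxx if sxx else 0.0
    return gain, mean_y - gain * mean_x


def calibrated_rmse(reference, measured):
    """RMSE between the measured spectrum and the linearly calibrated reference."""
    gain, offset = fit_gain_offset(reference, measured)
    squared = sum((gain * r + offset - m) ** 2 for r, m in zip(reference, measured))
    return math.sqrt(squared / len(measured))


def identify(measured, library):
    """Name of the library spectrum that best explains the measured spectrum."""
    return min(library, key=lambda name: calibrated_rmse(library[name], measured))
```

With a library of two hypothetical spectra, a measurement equal to twice the first spectrum plus a constant baseline is still matched to the first entry, since the fitted gain and offset absorb the distortion. A real matcher would probably also constrain the gain to be positive and fit separate segments if the distortion varies across the wavenumber axis.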

A Scheme for Identifying Malicious Applications Based on API Characteristics (API 특성 정보기반 악성 애플리케이션 식별 기법)

  • Cho, Taejoo; Kim, Hyunki; Lee, Junghwan; Jung, Moongyu; Yi, Jeong Hyun
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.1 / pp.187-196 / 2016
  • Android applications are inherently vulnerable to repackaging attacks, in which malicious code is easily inserted into an application that is then re-signed by the attacker. These days, private or personal information is often leaked in this way. In principle, all Android applications are composed of user-defined methods and APIs. In addition to providing access to platform resources, APIs serve as the practical functional features, while user-defined methods act as features by using APIs. In this paper, we propose a scheme that analyzes the sensitive APIs most often used in malicious applications, in terms of how malicious applications operate and which APIs they use. Based on the characteristics of the target APIs, we accumulate knowledge about such APIs using a machine learning scheme based on the Naive Bayes algorithm. From the learned results, we are able to provide a fine-grained numeric score for the degree of vulnerability of mobile applications. In doing so, we expect the proposed scheme to help mobile application developers identify the security level of applications in advance.
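
A numeric score of this kind can be sketched with a Bernoulli Naive Bayes classifier over the set of APIs an application calls. The feature encoding, the Laplace smoothing, and the tiny training set in the usage note are assumptions for illustration; the abstract does not describe the paper's feature set or how its score is defined.

```python
import math
from collections import Counter


def train(apps):
    """apps: list of (set of API names, label), label 'malicious' or 'benign'."""
    class_counts = Counter(label for _, label in apps)
    api_counts = {label: Counter() for label in class_counts}
    vocabulary = set()
    for apis, label in apps:
        api_counts[label].update(apis)
        vocabulary |= set(apis)
    return class_counts, api_counts, vocabulary


def malicious_score(apis, model):
    """Posterior probability of the 'malicious' class for one application."""
    class_counts, api_counts, vocabulary = model
    total = sum(class_counts.values())
    log_posterior = {}
    for label, n in class_counts.items():
        log_p = math.log(n / total)
        for api in vocabulary:
            # Laplace-smoothed probability that an app of this class calls the API.
            p = (api_counts[label][api] + 1) / (n + 2)
            log_p += math.log(p if api in apis else 1.0 - p)
        log_posterior[label] = log_p
    peak = max(log_posterior.values())
    weights = {label: math.exp(v - peak) for label, v in log_posterior.items()}
    return weights.get("malicious", 0.0) / sum(weights.values())
```

Trained on four toy applications, two malicious ones that call `sendTextMessage` and two benign ones that do not, the model assigns a score above 0.5 to a new application calling `sendTextMessage` and `getDeviceId`, and a score below 0.5 to one calling only `openConnection`. The API names stand in for whatever sensitive APIs the analysis would actually select.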