• Title/Summary/Keyword: machine learning algorithm (기계학습 알고리즘)

Search Results: 781

Statistical Analysis for Risk Factors and Prediction of Hypertension based on Health Behavior Information (건강행위정보기반 고혈압 위험인자 및 예측을 위한 통계분석)

  • Heo, Byeong Mun; Kim, Sang Yeob; Ryu, Keun Ho
    • Journal of Digital Contents Society / v.19 no.4 / pp.685-692 / 2018
  • The purpose of this study is to develop a prediction model of hypertension in middle-aged adults using statistical analysis. Statistical analyses and prediction models were developed using the National Health and Nutrition Survey (2013-2016). Binary logistic regression identified statistically significant risk factors for hypertension, and predictive models were built with logistic regression and the Naive Bayes algorithm combined with a wrapper feature-selection approach. In the statistical analysis, WHtR (p<0.0001, OR = 2.0242) in men and age (p<0.0001, OR = 3.9185) in women were the factors most strongly associated with hypertension. In the performance evaluation, the logistic regression model showed the best predictive power for both men (AUC = 0.782) and women (AUC = 0.858). Our findings provide important information for developing large-scale screening tools for hypertension and can serve as a basis for further hypertension research.
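The wrapper approach mentioned in this abstract can be sketched as greedy forward selection: repeatedly add the feature that most improves the classifier's evaluation score. The evaluation function and the toy AUC table below are hypothetical stand-ins, not the paper's data.

```python
def wrapper_select(features, evaluate):
    """Greedy forward wrapper feature selection: repeatedly add the
    feature that most improves the evaluation score of the subset."""
    selected, best_score = [], float("-inf")
    remaining = list(features)
    while remaining:
        cand, cand_score = None, best_score
        for f in remaining:
            s = evaluate(selected + [f])
            if s > cand_score:
                cand, cand_score = f, s
        if cand is None:          # no remaining feature improves the score
            break
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected

# Toy evaluation: a pretend AUC lookup that rewards two informative features.
toy_auc = {("WHtR",): 0.70, ("AGE",): 0.68,
           ("WHtR", "AGE"): 0.78, ("AGE", "WHtR"): 0.78}
score = lambda fs: toy_auc.get(tuple(fs), 0.5)
print(wrapper_select(["WHtR", "AGE", "BMI"], score))  # ['WHtR', 'AGE']
```

In a real study the `evaluate` callback would train and cross-validate the logistic regression or Naive Bayes model on the candidate feature subset.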

Confidence Value based Large Scale OWL Horst Ontology Reasoning (신뢰 값 기반의 대용량 OWL Horst 온톨로지 추론)

  • Lee, Wan-Gon; Park, Hyun-Kyu; Jagvaral, Batselem; Park, Young-Tack
    • Journal of KIISE / v.43 no.5 / pp.553-561 / 2016
  • Several machine learning techniques can automatically populate ontology data from web sources, and interest in large-scale ontology reasoning is increasing. However, data obtained from the web carry uncertainty, which can lead to speculative reasoning results, so the reliability of such data must be taken into account. Reasoning methods based on confidence values are needed because existing large-scale ontology reasoning does not quantify the reliability of inferred triples. In this study, we propose a confidence-value-based large-scale OWL Horst reasoning method using Spark, a distributed in-memory framework. We describe a method for integrating the confidence values of duplicated data and present a distributed parallel heuristic algorithm that mitigates the performance degradation of the inference process. To evaluate the proposed method, experiments were conducted on LUBM3000. The results show that our approach performs reasoning about twice as fast as existing reasoning systems such as WebPIE.
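Integrating the confidence values of duplicated data, as the abstract describes, can be illustrated with a noisy-or combination rule: repeated evidence for the same triple raises its merged confidence. The rule and the sample facts are illustrative assumptions; the paper's exact formula may differ.

```python
def combine_confidence(values):
    """Noisy-or combination: the merged confidence is at least as high
    as any single source, and duplicates raise it further.
    (Illustrative rule, not necessarily the paper's formula.)"""
    p = 1.0
    for v in values:
        p *= (1.0 - v)
    return 1.0 - p

def integrate(triples):
    """Merge duplicated (subject, predicate, object, confidence) facts
    into one confidence value per unique triple."""
    by_key = {}
    for s, p, o, c in triples:
        by_key.setdefault((s, p, o), []).append(c)
    return {k: combine_confidence(v) for k, v in by_key.items()}

facts = [("a", "type", "Student", 0.8),
         ("a", "type", "Student", 0.5),
         ("b", "type", "Professor", 0.9)]
merged = integrate(facts)
print(merged[("a", "type", "Student")])  # 0.9 = 1 - (0.2 * 0.5)
```

In the paper's setting this merge step would run as a distributed aggregation (e.g. a reduce-by-key over triples) on Spark rather than an in-memory dictionary.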

Development of an Automatic Program to Analyze Sunspot Groups for Solar Flare Forecasting (태양 플레어 폭발 예보를 위한 흑점군 자동분석 프로그램 개발)

  • Park, Jongyeob; Moon, Yong-Jae; Choi, SeongHwan; Park, Young-Deuk
    • The Bulletin of The Korean Astronomical Society / v.38 no.2 / pp.98-98 / 2013
  • 태양의 활동영역에서 관측할 수 있는 흑점은 주로 흑점군으로 관측되며, 태양폭발현상의 발생을 예보하기 위한 중요한 관측 대상 중 하나이다. 현재 태양 폭발을 예보하는 모델들은 McIntosh 흑점군 분류법을 사용하며 통계적 모델과 기계학습 모델로 나누어진다. 컴퓨터는 흑점군의 형태학적 특성을 연속적인 값으로 계산하지만 흑점군의 형태적 다양성으로 인해 McIntosh 분류법과 일치하지 않는 경우가 있다. 이러한 이유로 컴퓨터가 계산한 흑점군의 형태학적인 특성을 예보에 직접 적용하는 것이 필요하다. 우리는 흑점군을 검출하기 위해 최소신장트리(Minimum spanning tree : MST)를 이용한 계층적 군집화 기법을 수행하였다. 그래프(Graph)이론에서 최소신장트리는 정점(Vertex)과 간선(Edge)으로 구성된 간선의 가중치의 합이 최소인 트리이다. 우리는 모든 흑점을 정점, 그들의 연결을 간선으로 적용하여 최소신장트리를 작성하였다. 또한 최소신장트리를 활용한 계층적 군집화기법은 초기값에 따른 군집화 결과의 차이가 없기 때문에 흑점군 검출에 있어서 가장 적합한 알고리즘이다. 이를 통해 흑점군의 기본적인 형태학적인 특성(개수, 면적, 면적비 등)을 계산하고 최소신장트리를 통해 가장 면적이 큰 흑점을 중심으로 트리의 깊이(Depth)와 차수(Degree)를 계산하였다. 이 방법을 2003년 SOHO/MDI의 태양 가시광 영상에 적용하여 구한 흑점군의 내부 흑점수와 면적은 NOAA에서 산출한 값들과 각각 90%, 99%의 좋은 상관관계를 가졌다. 우리는 이 연구를 통해 흑점군의 형태학적인 특성과 더불어 예보에 직접적으로 활용할 수 있는 방법을 논의하고자 한다.
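The MST-based clustering the abstract describes can be sketched with Kruskal's algorithm: accept edges shortest-first with union-find, stop at a cut distance, and read the surviving connected components off as sunspot groups. The 2-D points and the cut value are toy assumptions.

```python
def mst_clusters(points, cut):
    """Single-linkage clustering via a minimum spanning tree: run
    Kruskal's algorithm, keep only edges no longer than `cut`, and
    return the resulting connected components."""
    n = len(points)
    parent = list(range(n))

    def find(i):                      # union-find with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    dist = lambda a, b: ((a[0]-b[0])**2 + (a[1]-b[1])**2) ** 0.5
    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for d, i, j in edges:             # Kruskal: shortest edges first
        if d > cut:
            break                     # longer MST edges would merge clusters
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Two compact "sunspot groups" far apart on the solar disk:
pts = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10)]
print(mst_clusters(pts, cut=2.0))  # [[0, 1, 2], [3, 4]]
```

The depth and degree statistics mentioned in the abstract would then be computed on the retained tree edges of each component, rooted at the largest spot.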


A Decision Support Model for Sustainable Collaboration Level on Supply Chain Management using Support Vector Machines (Support Vector Machines을 이용한 공급사슬관리의 지속적 협업 수준에 대한 의사결정모델)

  • Lim, Se-Hun
    • Journal of Distribution Research / v.10 no.3 / pp.1-14 / 2005
  • Controlling performance and a Sustainable Collaboration (SC) level is important for successful Supply Chain Management (SCM). This research developed a control model that analyzed SCM performance based on a Balanced Scorecard (BSC) and the SC level using Support Vector Machines (SVM). 108 SCM specialists completed the questionnaires, and the experimental data set was analyzed with SVM. The forecasting accuracy of the SCM SC level was compared across four SVM kernels: (1) linear, (2) polynomial, (3) Radial Basis Function (RBF), and (4) sigmoid (linear > RBF > sigmoid > polynomial). The prediction performance of the SVM linear kernel was then compared with an Artificial Neural Network (ANN). The findings show that the SVM linear kernel forecasts the SCM SC level best, so it provides a promising alternative for controlling the SC level. A company pursuing SCM can use the SC information produced by the SVM model.
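The four kernels compared in this study have standard textbook forms, sketched below; the hyperparameter values (`gamma`, `coef0`, `degree`) are assumptions, since the abstract does not state the paper's settings.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear(x, y):
    """Linear kernel: plain inner product."""
    return dot(x, y)

def polynomial(x, y, gamma=1.0, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <x,y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree

def rbf(x, y, gamma=0.5):
    """Radial Basis Function kernel: exp(-gamma * ||x - y||^2)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def sigmoid(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x,y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)

x, y = [1.0, 2.0], [2.0, 0.5]
print(linear(x, y))   # 3.0
print(rbf(x, x))      # 1.0 (identical points)
```

An SVM's decision function is a weighted sum of such kernel evaluations against the support vectors, so swapping the kernel changes the shape of the decision boundary without changing the training machinery.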


An Automated Topic Specific Web Crawler Calculating Degree of Relevance (연관도를 계산하는 자동화된 주제 기반 웹 수집기)

  • Seo Hae-Sung; Choi Young-Soo; Choi Kyung-Hee; Jung Gi-Hyun; Noh Sang-Uk
    • Journal of Internet Computing and Services / v.7 no.3 / pp.155-167 / 2006
  • It is desirable for users surfing the Internet to find Web pages related to their interests as closely as possible. Toward this end, this paper presents a topic-specific Web crawler that computes the degree of relevance, collects a cluster of pages for a given topic, and refines the preliminary set of related Web pages using term frequency/document frequency, entropy, and compiled rules. In the experiments, we tested our topic-specific crawler in terms of classification accuracy, crawling efficiency, and crawling consistency. First, the classification accuracy using the rule set compiled by CN2 was the best among those of the C4.5 and back-propagation learning algorithms. Second, we measured the classification efficiency to determine the best threshold value affecting the degree of relevance. In the third experiment, the consistency of the crawler was measured by the number of resulting URLs that overlapped across different starting URLs. The results imply that our topic-specific crawler is fairly consistent, regardless of the randomly chosen starting URLs.
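A degree-of-relevance score built from term frequency and document frequency, in the spirit of this crawler, can be sketched as a TF-IDF-style sum over topic terms. The scoring formula and the sample data are illustrative assumptions; the paper additionally uses entropy and compiled rules.

```python
import math

def degree_of_relevance(page_tokens, topic_terms, doc_freq, n_docs):
    """Toy relevance score: weight each topic term by its frequency in
    the page (TF) and by its rarity in the collection (IDF from DF)."""
    score = 0.0
    for term in topic_terms:
        tf = page_tokens.count(term) / max(len(page_tokens), 1)
        idf = math.log(n_docs / (1 + doc_freq.get(term, 0)))
        score += tf * idf
    return score

page = "machine learning crawler collects machine learning pages".split()
df = {"machine": 20, "learning": 25, "pizza": 500}
print(degree_of_relevance(page, ["machine", "learning"], df, 1000))
```

A crawler would compare this score against a threshold (the subject of the paper's second experiment) to decide whether to follow a page's outgoing links.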


Design and Implementation of OCR Correction Model for Numeric Digits based on a Context Sensitive and Multiple Streams (제한적 문맥 인식과 다중 스트림을 기반으로 한 숫자 정정 OCR 모델의 설계 및 구현)

  • Shin, Hyun-Kyung
    • The KIPS Transactions: Part D / v.18D no.1 / pp.67-80 / 2011
  • In an automated business document processing system that maintains financial data, errors in query-based retrieval of numbers are critical to the overall performance and usability of the system. Automatic spelling correction methods have emerged and played an important role in the development of information retrieval systems. However, the scope of these methods has been limited to symbols, such as alphabetic strings, that can be stored as trainable templates or in a custom dictionary. Numbers, on the other hand, are sequences of digits that cannot be stored in a dictionary; they form a pure Markov sequence. In this paper we propose a new OCR spelling-correction model for numbers that uses multiple streams and context-based correction on top of a probabilistic information retrieval framework. We implemented the proposed error-correction model as a sub-module and integrated it into an existing automated invoice document processing system. Comparative test results indicate that our model significantly improves the overall precision of the system.
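Treating a digit string as a Markov sequence, as the abstract suggests, means a bigram model over digits can rank competing OCR readings. The sketch below picks, among candidate strings from multiple streams, the one the bigram model finds most likely; the bigram table is a hypothetical stand-in for statistics learned from past documents, and this is only the core idea, not the paper's full model.

```python
import math

def sequence_log_prob(digits, bigram, smoothing=1e-6):
    """Log-probability of a digit string under a first-order Markov
    (bigram) model; unseen transitions get a small smoothing mass."""
    lp = 0.0
    for a, b in zip(digits, digits[1:]):
        lp += math.log(bigram.get((a, b), smoothing))
    return lp

def correct(candidates, bigram):
    """Pick, among candidate readings produced by multiple OCR streams,
    the digit string the Markov model scores highest."""
    return max(candidates, key=lambda d: sequence_log_prob(d, bigram))

# Hypothetical bigram table, e.g. learned from past invoice amounts:
bigram = {("1", "9"): 0.4, ("9", "9"): 0.3, ("9", "8"): 0.2,
          ("1", "0"): 0.3, ("0", "0"): 0.5}
print(correct(["198", "188", "100"], bigram))  # '100'
```

In a real pipeline each stream's own recognition confidence would also enter the score, rather than the language model alone.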

Open Platform for Improvement of e-Health Accessibility (의료정보서비스 접근성 향상을 위한 개방형 플랫폼 구축방안)

  • Lee, Hyun-Jik; Kim, Yoon-Ho
    • Journal of Digital Contents Society / v.18 no.7 / pp.1341-1346 / 2017
  • In this paper, we designed an open service platform that combines individually customized services with intelligent information technology to handle individuals' complex attributes and requests. First, the data collection phase repeats extraction, transformation, and loading quickly and accurately. The data generated by the extraction-transformation-loading (ETL) module is stored in a distributed data system. The data analysis phase generates a variety of patterns by applying field-specific analysis algorithms. The data processing phase uses distributed parallel processing to improve performance. Data provision operates independently on a device-specific management platform and is exposed as an Open API.
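The extraction-transformation-loading cycle described above can be sketched as a three-stage generator pipeline; the record shape, field names, and dictionary-backed store are toy assumptions standing in for a real device feed and distributed storage.

```python
def extract(records):
    """Extraction: pull raw records from a source (a list stands in
    for a device or service feed here)."""
    yield from records

def transform(rows):
    """Transformation: normalize fields and drop malformed rows."""
    for r in rows:
        if "id" in r and "value" in r:
            yield {"id": r["id"], "value": float(r["value"])}

def load(rows, store):
    """Loading: append cleaned rows to the store (a dict stands in
    for the distributed data system)."""
    for r in rows:
        store.setdefault(r["id"], []).append(r["value"])
    return store

raw = [{"id": "hr", "value": "72"}, {"bad": 1}, {"id": "hr", "value": "75"}]
store = load(transform(extract(raw)), {})
print(store)  # {'hr': [72.0, 75.0]}
```

Because each stage is a generator, records stream through without being materialized in full, which mirrors how such a pipeline would be chained in a distributed runtime.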

A Study on Performance of ML Algorithms and Feature Extraction to detect Malware (멀웨어 검출을 위한 기계학습 알고리즘과 특징 추출에 대한 성능연구)

  • Ahn, Tae-Hyun; Park, Jae-Gyun; Kwon, Young-Man
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.18 no.1 / pp.211-216 / 2018
  • In this paper, we study how to classify whether an unknown PE file is malware or not. In malware detection, both feature extraction and the classifier matter, so we investigated which features work well with which classifiers in order to find a good feature-classifier combination. The experiments proceeded in two steps. In step one, we compared the accuracy of three feature sets: Opcode only, Win. API only, and both combined. The combined Opcode and Win. API features performed best. In step two, we compared the AUC values of four classifiers: Bernoulli Naïve Bayes, k-nearest neighbor, Support Vector Machine, and Decision Tree. The Decision Tree performed best.
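The AUC metric used to compare the classifiers in step two has a simple rank interpretation: the probability that a randomly chosen positive (malware) sample scores higher than a randomly chosen negative one. The labels and scores below are toy values, not the paper's results.

```python
def auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy detector scores: higher score = "more likely malware".
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.5, 0.3, 0.6, 0.7]
print(auc(labels, scores))  # 5/6 ≈ 0.833
```

Comparing classifiers by AUC rather than accuracy avoids the class-imbalance problem that is typical of malware corpora, where benign files vastly outnumber malicious ones.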

Inference of Korean Public Sentiment from Online News (온라인 뉴스에 대한 한국 대중의 감정 예측)

  • Matteson, Andrew Stuart; Choi, Soon-Young; Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.9 no.7 / pp.25-31 / 2018
  • Online news has replaced the traditional newspaper and has brought about a profound transformation in the way we access and share information. News websites have long allowed users to post comments, and some have also begun to crowdsource reactions to news articles. The field of sentiment analysis seeks to computationally model the emotions and reactions experienced when reading text. In this work, we analyze more than 100,000 news articles over ten categories, each with five user-generated emotional annotations, to determine whether these reactions correlate mathematically with the news body text, and we propose a simple sentiment analysis algorithm that requires minimal preprocessing and no machine learning. We show that it is effective even for a morphologically complex language like Korean.
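A sentiment scorer with no machine learning, in the spirit of this abstract, can be as simple as summing lexicon weights and taking the sign. The mini-lexicon and the label mapping are hypothetical; the paper's algorithm and its five-way reaction scheme are richer than this sketch.

```python
def react(text, lexicon):
    """Minimal lexicon-based sentiment: sum word weights and map the
    sign of the total to a reaction label."""
    score = sum(lexicon.get(w, 0) for w in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Hypothetical mini-lexicon:
lex = {"great": 2, "win": 1, "scandal": -2, "tragic": -3}
print(react("A great win for the team", lex))    # positive
print(react("Tragic scandal shakes city", lex))  # negative
```

For Korean, the tokenization step would need morphological analysis rather than whitespace splitting, which is exactly the preprocessing burden the paper tries to minimize.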

An Experimental Evaluation of Short Opinion Document Classification Using A Word Pattern Frequency (단어패턴 빈도를 이용한 단문 오피니언 문서 분류기법의 실험적 평가)

  • Chang, Jae-Young; Kim, Ilmin
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.12 no.5 / pp.243-253 / 2012
  • Opinion mining, which developed out of document classification in data mining, has become a common interest of both domestic and international industries. The core of opinion mining is to decide precisely whether an opinion document is positive or negative. Although many approaches have been proposed, their classification accuracy has not been satisfactory enough for practical applications. Opinion documents written in Korean are hard to classify automatically because they often express subjective opinions with varied and ungrammatical words. This paper proposes a new approach to classifying opinion documents that considers only the frequency of word patterns and excludes grammatical factors as much as possible. In the proposed method, we represent a document as a bag of words, apply a learning algorithm over word-pattern frequencies, and finally decide the polarity of the document with a score function. We also present experimental results evaluating the accuracy of the proposed method.
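The bag-of-words pipeline described above (count word-pattern frequencies per class, then decide polarity with a score function) can be sketched as follows. The score function is a bare-bones stand-in chosen for illustration, and the tiny labeled corpus is invented.

```python
from collections import Counter

def train(docs):
    """Count word(-pattern) frequencies per polarity class from
    labeled short opinion documents."""
    counts = {"pos": Counter(), "neg": Counter()}
    for label, text in docs:
        counts[label].update(text.split())
    return counts

def polarity(text, counts):
    """Toy score function: each word votes for the class in which it
    appeared more often; the document takes the sign of the total."""
    score = sum(counts["pos"][w] - counts["neg"][w] for w in text.split())
    return "pos" if score >= 0 else "neg"

docs = [("pos", "배송 빠르고 좋아요"), ("pos", "정말 좋아요"),
        ("neg", "배송 느리고 별로예요"), ("neg", "품질 별로예요")]
c = train(docs)
print(polarity("좋아요 빠르고", c))  # pos
print(polarity("별로예요", c))       # neg
```

Because the method counts surface word patterns rather than parsing, it tolerates the ungrammatical spellings common in short Korean opinion text, which is the motivation the abstract gives.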