• Title/Summary/Keyword: WEKA

Search Result 57, Processing Time 0.027 seconds

Performance Comparison of Algorithm through Classification of Parkinson's Disease According to the Speech Feature (음성 특징에 따른 파킨슨병 분류를 위한 알고리즘 성능 비교)

  • Chung, Jae Woo
    • Journal of Korea Multimedia Society / v.19 no.2 / pp.209-214 / 2016
  • The purpose of this study was to classify healthy persons and Parkinson's disease patients from their vocal characteristics using machine learning algorithms. To that end, we compared two of the most widely used machine learning algorithms, J48 and REPTree. To evaluate the classification performance of the two algorithms, their results on the vocal characteristics were compared. The classification performance on the vocal characteristics was 88.72% and 84.62%. The test results showed that the J48 algorithm was superior to the REPTree algorithm.

Analysis of Feature Variables for Breast Cancer Diagnosis

  • Jung, Yong Gyu;Kim, Jang Il;Sihn, Sung Chul;Heo, Jun
    • International journal of advanced smart convergence / v.2 no.2 / pp.36-39 / 2013
  • Breast cancer diagnosis is becoming more important as health information grows and the number of diagnosed cancer patients gradually increases over time. Among the various types of cancer, we focus on breast cancer diagnosis. The accuracy of breast cancer diagnosis increases when the diagnosis is based on evidence and statistics. To this end, we use the Weka data mining tool and analysis algorithms, in particular decision trees and the rules they use. In addition, data pre-processing and cross-validation are used to increase the reliability of the results. The number and causes of disease cases become important for strengthening evidence-based medicine. In evidence-based medicine, data obtained from past patients are used to calculate probabilities with which to diagnose and predict disease and to plan treatment for future patients. This can play an important role in improving the survival rate.
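
The abstract above relies on cross-validation to make its results more reliable. As a rough illustration of the idea only (not the authors' setup, and not Weka's implementation), the following Python sketch splits a dataset into k folds and averages the accuracy over them. The function names `k_fold_indices` and `cross_validate`, and the `train_fn`/`predict_fn` callbacks, are hypothetical names chosen for this example.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(samples, labels, k, train_fn, predict_fn, seed=0):
    """Return the mean accuracy of a classifier over k folds.

    train_fn(samples, labels) -> model and predict_fn(model, sample) -> label
    are caller-supplied (hypothetical) callbacks.
    """
    accuracies = []
    for train_idx, test_idx in k_fold_indices(len(samples), k, seed):
        model = train_fn([samples[i] for i in train_idx],
                         [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)
```

Because each sample is held out exactly once, the averaged accuracy depends less on one lucky or unlucky train/test split than a single hold-out estimate does.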

A Linguistic Study of Automatic Speech Act Classification for Korean Dialog (한국어 대화문 화행 자동분류를 위한 언어학적 기반연구)

  • Koo, Youngeun;Kim, Jiyoun;Hong, Munpyo;Kim, Young-Kil
    • Annual Conference on Human and Language Technology / 2017.10a / pp.17-22 / 2017
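  • A speech act is the communicative intention a speaker holds during communication. For communication to succeed, it is very important to identify the speaker's speech act accurately. In this paper, with the aim of automatically classifying the speech acts of Korean conversational sentences, we analyze from a linguistic perspective which factors determine a speech act. We analyzed Korean class dialogs to establish our own new speech act classification scheme and, on linguistic grounds, proposed 10 speech act classification features. To validate the proposed features, we also ran accuracy experiments using Weka.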


Implementation of Data Preparation System for Data Mining on Heterogenious Distributed Environment (이기종 분산환경에서 데이터마이닝을 위한 데이터준비 시스템 구현)

  • Lee sang hee;Lee won sup
    • Journal of the Korea Society of Computer and Information / v.9 no.3 / pp.109-113 / 2004
  • This paper investigates the efficiency of the data preparation process in existing data mining tools and presents a design principle for a new, efficient data preparation system. We compare commonly used data mining tools on how they access local and remote databases and on how they exchange information resources between different computers. The tools compared are Answer Tree, Clementine, Enterprise Miner, and Weka. We propose a design principle for an efficient data preparation system for data mining on distributed networks.


A Linguistic Study of Automatic Speech Act Classification for Korean Dialog (한국어 대화문 화행 자동분류를 위한 언어학적 기반연구)

  • Koo, Youngeun;Kim, Jiyoun;Hong, Munpyo;Kim, Young-Kil
    • 한국어정보학회:학술대회논문집 / 2017.10a / pp.17-22 / 2017
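  • A speech act is the communicative intention a speaker holds during communication. For communication to succeed, it is very important to identify the speaker's speech act accurately. In this paper, with the aim of automatically classifying the speech acts of Korean conversational sentences, we analyze from a linguistic perspective which factors determine a speech act. We analyzed Korean class dialogs to establish our own new speech act classification scheme and, on linguistic grounds, proposed 10 speech act classification features. To validate the proposed features, we also ran accuracy experiments using Weka.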


Automatic categorization of chloride migration into concrete modified with CFBC ash

  • Marks, Maria;Jozwiak-Niedzwiedzka, Daria;Glinicki, Michal A.
    • Computers and Concrete / v.9 no.5 / pp.375-387 / 2012
  • The objective of this investigation was to develop rules for automatic categorization of concrete quality using selected artificial intelligence methods based on machine learning. The tested materials included concrete containing a new waste material, the solid residue from coal combustion in fluidized bed boilers (CFBC fly ash), used as an additive. The rapid chloride permeability test (Nordtest Method BUILD 492) was used to determine chloride ion penetration in concrete. Experimental tests of chloride migration provided data for learning and testing the rules discovered by machine learning techniques. Machine learning was found to be a tool that can be applied to determine concrete durability. The rules generated by the computer programs AQ21 and WEKA (using the J48 algorithm) provided a means of adequately categorizing plain concrete and concrete modified with CFBC fly ash as materials of good or acceptable resistance to chloride penetration.

Speed Evaluation of R Commands (R명령어들의 속도 평가)

  • Lee, Jin-A;Heo, Mun-Yeol
    • Proceedings of the Korean Statistical Society Conference / 2003.10a / pp.301-305 / 2003
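  • R has recently come into wide use in many fields, particularly in simulation and statistics-related research. In simulation, the large number of repetitions makes the execution speed of an R program very important. R is also widely used in data mining. As part of data preprocessing for data mining, we ran an experiment that discretizes continuous variables with the Fayyad & Irani method, using R. The program uses a recursive function and, in the process, calls several functions for building frequency tables, computing information, splitting frequency tables, applying the stopping rule, and so on. With the R code we wrote, discretizing the Iono data from the UCI DB (35 attributes, about 1,000 cases) takes a considerable time of 7 seconds or more. In contrast, Weka, which is written in Java, ran the same Fayyad & Irani method on data of this size so quickly that the run time was almost negligible. This difference led us to look for ways to improve the execution speed of R programs. In this presentation, we select several time-consuming pieces of R code, write more efficient code for them, and compare the execution speeds. Some commands are also compared with SAS.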
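
The abstract above describes a recursive discretization procedure made of a frequency count, an information (entropy) computation, a split, and a stopping rule. The Python sketch below illustrates that structure using the commonly cited form of the Fayyad & Irani MDL stopping criterion. It is an unoptimized illustration under that assumption, not the authors' R code or Weka's implementation, and all function names (`entropy`, `best_cut`, `mdlp_accepts`, `mdlp_cut_points`) are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Class entropy in bits, computed from a frequency count of the labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Return the split index with the lowest weighted entropy, or None.

    `pairs` is a list of (value, label) tuples sorted by value. This naive
    scan recomputes the entropies at every boundary, so it is quadratic.
    """
    n = len(pairs)
    best_index, best_cost = None, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut is possible between two equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        cost = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best_cost is None or cost < best_cost:
            best_index, best_cost = i, cost
    return best_index


def mdlp_accepts(labels, left, right):
    """MDL stopping rule: accept the cut only if the gain exceeds the threshold."""
    n = len(labels)
    k, k1, k2 = len(set(labels)), len(set(left)), len(set(right))
    gain = entropy(labels) - (len(left) * entropy(left) + len(right) * entropy(right)) / n
    delta = log2(3 ** k - 2) - (k * entropy(labels) - k1 * entropy(left) - k2 * entropy(right))
    return gain > (log2(n - 1) + delta) / n


def mdlp_cut_points(values, labels):
    """Recursively discretize one continuous attribute; return sorted cut points."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    cuts = []

    def recurse(segment):
        i = best_cut(segment)
        if i is None:
            return
        left = [y for _, y in segment[:i]]
        right = [y for _, y in segment[i:]]
        if not mdlp_accepts([y for _, y in segment], left, right):
            return
        cuts.append((segment[i - 1][0] + segment[i][0]) / 2)  # midpoint cut
        recurse(segment[:i])
        recurse(segment[i:])

    recurse(pairs)
    return sorted(cuts)
```

For instance, `mdlp_cut_points([1, 2, 3, 4, 11, 12, 13, 14], ["a"] * 4 + ["b"] * 4)` yields the single cut point 7.5, because the two pure halves offer no further gain. The repeated entropy recomputation in `best_cut` is the kind of cost that grows quickly with data size.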


Big data Analysis using Python in Agriculture Forestry and Fisheries

  • Kim, So hee;Kang, Min Soo;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.5 no.1 / pp.47-50 / 2016
  • Big data has arrived rapidly in recent times, and vast amounts of data are now being stored and utilized. These data are used in many fields. In the medical field in particular, patient data are used to increase therapeutic effect, to treat recurrence better, and to lower readmission rates, which improves quality of life. This paper reports the basics of analyzing and verifying data using Python. Through an analysis of press coverage of agriculture, forestry and fisheries research, it shows how data can be analyzed with simple formulas, from the reasons for selecting Python to how it is used. In this process, simple formulas serve as expressions for analyzing actual data, taking advantage of functions in practical use.

Performance Comparison of Decision Trees of J48 and Reduced-Error Pruning

  • Jin, Hoon;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.5 no.1 / pp.30-33 / 2016
  • With the advent of big data, data mining is increasingly used in various decision-making fields to extract hidden and meaningful information from large amounts of data. As demand for uncovering the hidden meaning behind data grows exponentially, it becomes more and more important to decide which data mining algorithm to select and how to use it. Several data mining algorithms are mainly used in biology and clinical practice: logistic regression, neural networks, support vector machines, and a variety of statistical techniques. This paper compares the classification performance of two representative machine learning algorithms, J48 and REPTree. The performance comparison confirms which algorithm classifies more accurately, so that more accurate prediction is possible with the algorithm suited to the goal of the experiment. On this basis, detailed classification and distinction that are relatively difficult to make visually are expected to become feasible.
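
As background to the comparison above: J48 is Weka's implementation of the C4.5 decision tree learner, which ranks candidate splits by gain ratio. The Python sketch below computes that criterion for a single nominal attribute. It only illustrates the general criterion. It does not reproduce Weka's code or the paper's experiment, and the function names `entropy` and `gain_ratio` are chosen for this example.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy in bits of the empirical distribution of `items`."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` by the nominal attribute `values`."""
    n = len(labels)
    # Group the class labels by attribute value.
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes attributes with many distinct values.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0
```

An attribute that separates the classes perfectly into two equal halves scores 1.0, while an attribute that carries no class information scores 0.0.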

A Study on Efficient Memory Management Using Machine Learning Algorithm

  • Park, Beom-Joo;Kang, Min-Soo;Lee, Minho;Jung, Yong Gyu
    • International journal of advanced smart convergence / v.6 no.1 / pp.39-43 / 2017
  • As industry grows, the amount of data grows exponentially, and analysis of these data serves as a predictive solution. As data size and processing speed increase, data analysis has begun to be applied to new fields by combining artificial intelligence technology with simple big data analysis. In this paper, we propose a method for quickly applying a machine learning based algorithm through efficient resource allocation. The proposed algorithm allocates memory for each attribute: it learns the distinct values of each attribute and allocates the right amount of memory. To evaluate the performance of the proposed algorithm, we compared it with the existing K-means algorithm. Measurements of execution time showed that speed improved.