• Title/Summary/Keyword: decision tree learning algorithm


Design of the student Career prediction program using the decision tree algorithm (의사결정트리 알고리즘을 이용한 학생진로 예측 프로그램의 설계)

  • Kim, Geun-Ho;Jeong, Chong-In;Kim, Chang-Seok;Kang, Shin-Chun;Kim, Eui-Jeong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.332-335 / 2018
  • In recent years, artificial intelligence built on big data has become a major topic in IT, and various studies are under way on services and technologies for handling big data effectively. In the educational field there is big data about students, but it is merely collected, looked up, and stored. In the future, artificial intelligence, machine learning, and statistical analysis are expected to be used extensively to find meaningful rules, patterns, and relationships in educational big data and to produce intelligent, useful information for students. Accordingly, this study aims to design a program that predicts students' careers with a decision tree algorithm, based on data from classroom observations of students. Such a career prediction program is expected to help present application paths during student counseling and to provide direction on classroom behavior based on the desired courses.

P2P Traffic Classification using Advanced Heuristic Rules and Analysis of Decision Tree Algorithms (개선된 휴리스틱 규칙 및 의사 결정 트리 분석을 이용한 P2P 트래픽 분류 기법)

  • Ye, Wujian;Cho, Kyungsan
    • Journal of the Korea Society of Computer and Information / v.19 no.3 / pp.45-54 / 2014
  • In this paper, an improved two-step P2P traffic classification scheme is proposed to overcome the limitations of existing methods. The first step is a signature-based classifier at the packet level. The second step consists of pattern heuristic rules and a statistics-based classifier at the flow level. The pattern heuristic rules improve accuracy and reduce the amount of traffic that the statistics-based classifier must handle. Based on an analysis of different decision tree algorithms, the statistics-based classifier is implemented with REPTree. In addition, an ensemble algorithm is used to improve the performance of the statistics-based classifier. Verification on real datasets shows that the hybrid scheme provides higher accuracy and lower overhead than other existing schemes.
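The staged structure described in this abstract can be pictured as a cascade in which each stage either labels a flow or passes it on. The sketch below is only an illustration under stated assumptions: the function names, flow fields, rules, and thresholds are hypothetical, and the final stage is a placeholder threshold rather than the REPTree model the paper reports.

```python
from typing import Callable, Optional, Sequence

# A stage returns a label, or None when it cannot decide and defers to the next stage.
Stage = Callable[[dict], Optional[str]]


def classify_flow(flow: dict, stages: Sequence[Stage], fallback: Callable[[dict], str]) -> str:
    """Run the stages in order; the first non-None label wins."""
    for stage in stages:
        label = stage(flow)
        if label is not None:
            return label
    # Only flows that no earlier stage could label reach the last classifier.
    return fallback(flow)


def signature_stage(flow: dict) -> Optional[str]:
    # Hypothetical packet-level signature check on the payload bytes.
    return "P2P" if b"BitTorrent protocol" in flow.get("payload", b"") else None


def heuristic_stage(flow: dict) -> Optional[str]:
    # Hypothetical flow-level pattern rule; the threshold is an assumption.
    return "P2P" if flow.get("distinct_peers", 0) > 20 else None


def statistical_stage(flow: dict) -> str:
    # Placeholder for a trained statistics-based model (not REPTree itself).
    return "P2P" if flow.get("mean_packet_size", 0) > 1000 else "non-P2P"


stages = [signature_stage, heuristic_stage]
print(classify_flow({"payload": b"\x13BitTorrent protocol"}, stages, statistical_stage))
print(classify_flow({"mean_packet_size": 200}, stages, statistical_stage))
```

Ordering the cheap, high-precision stages first is what reduces the load on the final classifier: it is only consulted for flows the earlier stages leave undecided.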

Estimation of the steps of cardiovascular disease by machine learning based on aptamers-based biochip data (기계학습에 의한 압타머칩 데이터 기반 심혈관 질환 단계의 예측)

  • Kim Byoung-Hee;Kim Sung-Chun;Zhang Byoung-Tak
    • Proceedings of the Korean Information Science Society Conference / 2006.06a / pp.85-87 / 2006
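  • The aptamer chip is a new type of biochip developed by (주)제노프라 that uses aptamers to measure relative changes in the amounts of specific protein groups in blood, and it can be applied directly to disease diagnosis. In this paper, we show that the disease progression stage of cardiovascular disease patients can be predicted by analyzing aptamer chip data. Data from the (주)제노프라 3K aptamer chip, produced from blood samples of patients labeled with four stages (normal, stable angina, unstable angina, and myocardial infarction), were classified following the same procedure as standard DNA microarray analysis, and the patient samples of each stage were clearly separated. Feature selection was performed using p-values from analysis of variance (ANOVA), and neural networks, decision trees, SVMs, and Bayesian networks were applied as classification algorithms. Across these algorithms, the stage of cardiovascular disease was identified with 77-100% accuracy on 31 samples from male patients in their fifties.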

A Spam Mail Classification Using Link Structure Analysis (링크구조분석을 이용한 스팸메일 분류)

  • Rhee, Shin-Young;Khil, A-Ra;Kim, Myung-Won
    • Journal of KIISE:Software and Applications / v.34 no.1 / pp.30-39 / 2007
  • Existing content-based spam mail filtering algorithms have difficulty filtering spam when e-mails contain images but little text. In this thesis we propose an efficient spam mail classification algorithm that utilizes the link structure of e-mails. We compute the number of hyperlinks in an e-mail and the in-link frequencies of the web pages hyperlinked in the e-mail. Using these two features, we classify spam and legitimate mail with a decision tree trained for spam mail classification. We also suggest a hybrid system that combines three different algorithms by majority voting: the link structure analysis algorithm; a modified link structure analysis algorithm, in which only the host part of the hyperlinked pages of an e-mail is used for link structure analysis; and a content-based method using SVM (support vector machines). The experimental results show that the link structure analysis algorithm slightly outperforms the existing content-based method, with an accuracy of 94.8%. Moreover, the hybrid system achieves an accuracy of 97.0%, a significant improvement over the existing method.
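The majority-voting step that combines the three classifiers can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name and the label strings are assumptions, and the three predictions stand in for the outputs of the link-structure, host-only link-structure, and SVM classifiers.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label predicted by the most classifiers.

    With an odd number of binary classifiers (three here) a tie cannot occur.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    label, _count = Counter(predictions).most_common(1)[0]
    return label


# Hypothetical outputs of the three classifiers for one e-mail.
votes = ["spam", "legitimate", "spam"]
print(majority_vote(votes))
```

Voting helps when the component classifiers make different kinds of mistakes, for example when the content-based classifier fails on image-only mail that the link-based classifiers still catch.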

Phonetic Question Set Generation Algorithm (음소 질의어 집합 생성 알고리즘)

  • 김성아;육동석;권오일
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.173-179 / 2004
  • Because training data are insufficient in large vocabulary continuous speech recognition, similar context-dependent phones can be clustered by decision trees to share data. When the decision trees are built and used to predict unseen triphones, a phonetic question set is required. The phonetic question set, which contains categories of phones with similar co-articulation effects, is usually generated by phonetic or linguistic experts. This knowledge-based approach to generating the phonetic question set, however, may reduce the homogeneity of the clusters. Moreover, the experts must adjust the question sets whenever the language or the PLU (phone-like unit) of a recognition system is changed. Therefore, we propose a data-driven method to automatically generate the phonetic question set. Since the proposed method generates the phone categories from the distribution of the speech data, it does not depend on the language or the PLU, and may enhance the homogeneity of the clusters. In large vocabulary speech recognition experiments, the proposed algorithm was found to reduce the error rate by 14.3%.

Document Summarization using Topic Phrase Extraction and Query-based Summarization (주제어구 추출과 질의어 기반 요약을 이용한 문서 요약)

  • 한광록;오삼권;임기욱
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.488-497 / 2004
  • This paper describes a hybrid document summarization method that combines indicative summarization and query-based summarization. Learning models are built from learning documents in order to extract topic phrases. We use Naive Bayesian, Decision Tree, and Support Vector Machine as the machine learning algorithms. The system automatically extracts topic phrases from a new document based on these models and outputs a summary of the document using query-based summarization, which treats the extracted topic phrases as queries and calculates the locality-based similarity of each topic phrase. We examine how the topic phrases affect the summarization and how many phrases are appropriate for summarization. We then evaluate the extracted summary by comparing it with a manual summary, and we also compare our summarization system with the summarization method of MS-Word.

Nakdong River Estuary Salinity Prediction Using Machine Learning Methods (머신러닝 기법을 활용한 낙동강 하구 염분농도 예측)

  • Lee, Hojun;Jo, Mingyu;Chun, Sejin;Han, Jungkyu
    • Smart Media Journal / v.11 no.2 / pp.31-38 / 2022
  • Promptly predicting changes in river salinity is important for anticipating the damage to agriculture and ecosystems caused by salinity intrusion and for establishing disaster prevention measures. Because machine learning (ML) methods have a much lower computational cost than physics-based hydraulic models, they can predict river salinity in a relatively short time. Owing to their shorter training time, ML methods have been studied as a complement to physics-based hydraulic models. Salinity prediction based on machine learning is being studied actively around the world, but there are few such studies in South Korea. Using a large number of publicly available datasets, we evaluated the performance of various machine learning techniques for predicting the salinity of the Nakdong River Estuary Basin. As a result, the LightGBM algorithm achieved an average RMSE of 0.37 and trained 2-20 times faster than the other algorithms. This indicates that machine learning techniques can be applied to predict the salinity of rivers in Korea.
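For readers unfamiliar with the metric quoted above, RMSE (root mean squared error) is the square root of the mean of the squared differences between observed and predicted values. The sketch below shows the computation; the function name and the sample salinity values are illustrative assumptions and are not data from the paper.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up salinity readings and model predictions, for illustration only.
observed = [1.2, 0.8, 2.5, 3.1]
predicted = [1.0, 1.1, 2.4, 2.7]
print(rmse(observed, predicted))
```

RMSE is expressed in the same unit as the predicted quantity, so a lower value means predictions sit closer to the measurements on average, with large errors penalized more heavily than small ones.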

Artificial Intelligence Game System "AlGGAGO" (알까기 인공지능 시스템 "알까고")

  • Lee, Keon-Ho;Yoon, Won-Tak;Park, Jin-Soo;Park, Doo-Soon
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.932-935 / 2017
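  • Recently, with advances in AI technologies such as deep learning and machine learning, the commercialization of artificial intelligence is coming into view. Accordingly, the AI field is rapidly emerging as a core technology for other industries, and many global companies are actively investing in it. As AI technology develops, its framing is shifting from the development of AI-based technology to a core technology of other industries, and it is increasingly recognized as a core next-generation ICT technology. In this paper, we therefore implemented the AlGGAGO game system by applying the decision tree algorithm, a supervised learning method, to the game of Alkkagi on an AWS (Amazon Web Services) EMR server.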

Performance Evaluation of HM-Net Speech Recognition System using Korea Large Vocabulary Speech DB (한국어 대어휘 음성DB를 이용한 HM-Net 음성인식 시스템의 성능평가)

  • 오세진;김광동;노덕규;송민규;김범국;황철준;정현열
    • Proceedings of the IEEK Conference / 2003.07e / pp.2443-2446 / 2003
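  • In this paper, we evaluate the performance of the HM-Net (Hidden Markov Network) speech recognition system using a large-vocabulary speech DB provided by the Electronics and Telecommunications Research Institute (ETRI). For acoustic modeling, we adopt HM-Net, an improvement on the HMM (Hidden Markov Model), the statistical modeling method widely used in speech recognition. An HM-Net is generated by the PDT-SSS algorithm, which performs state splitting in the contextual direction and the temporal direction. For contextual-direction splitting, a phonetic decision tree is employed to effectively represent context information that does not appear in the training speech data; for temporal-direction splitting, states are split to effectively represent the duration information of each phoneme in the training speech data. Through this state splitting, parameters are shared and an optimal model network is constructed. After building acoustic models from the large-vocabulary speech data and running recognition experiments, we obtained average recognition rates of 97.5% and 96.7% for 100 words and 60 sentences from 100 speakers, respectively.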

Forest smoke detection using Random Forest (Random Forest를 이용한 산불연기 감지)

  • Kwak, Joon-Young;Kim, Deok-Yeon;Ko, Byoung-Chul;Nam, Jae-Yeal
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.351-353 / 2011
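  • In this paper, we propose an algorithm that uses a Random Forest to detect forest fire smoke in video input from a CCD camera. To compensate for the slow movement of forest fire smoke, frames with large changes are designated as key frames instead of using every frame, and a feature vector is extracted by accumulating each feature value over the 100 frames preceding the designated key frame. A Random Forest with 50 decision trees is then built by training on the feature vectors extracted from the training data. The Random Forest is trained to classify regions into four classes that represent states according to the degree of smoke, and the final decision on whether a region is smoke is made from the Random Forest classification result.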
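The key-frame selection and feature accumulation described in this abstract might look roughly like the sketch below. It rests on several assumptions that the abstract does not confirm: the function names are hypothetical, a frame counts as a key frame when a per-frame change score exceeds a threshold, and "accumulating" is taken to mean summing each feature over the preceding window. The Random Forest training step is not shown.

```python
from typing import List, Sequence


def select_key_frames(change_scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of frames whose change score exceeds the (assumed) threshold."""
    return [i for i, score in enumerate(change_scores) if score > threshold]


def accumulate_features(
    per_frame_features: Sequence[Sequence[float]],
    key_index: int,
    window: int = 100,
) -> List[float]:
    """Sum each feature over up to `window` frames preceding the key frame."""
    start = max(0, key_index - window)
    frames = per_frame_features[start:key_index]
    if not frames:
        return []
    # zip(*frames) groups the values of each feature across the window.
    return [sum(column) for column in zip(*frames)]


# Toy input: four frames with two features each, and a change score per frame.
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
scores = [0.1, 0.9, 0.2, 0.8]
for k in select_key_frames(scores, threshold=0.5):
    print(k, accumulate_features(features, k))
```

Each accumulated vector would then serve as one training or test sample for the classifier, which is how slow-moving smoke can still produce a distinguishable signal over a long window.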