• Title/Summary/Keyword: Reasoning Dataset

An Automatic Pattern Recognition Algorithm for Identifying the Spatio-temporal Congestion Evolution Patterns in Freeway Historic Data (고속도로 이력데이터에 포함된 정체 시공간 전개 패턴 자동인식 알고리즘 개발)

  • Park, Eun Mi;Oh, Hyun Sun
    • Journal of Korean Society of Transportation / v.32 no.5 / pp.522-530 / 2014
  • Spatio-temporal congestion evolution patterns can be reproduced from the historic speed datasets that the VDS (Vehicle Detection System) accumulates at TMCs (Traffic Management Centers). Such a dataset provides a pool of traffic conditions that have actually been experienced in space and time. Traffic flow patterns are known to recur spatio-temporally, and even non-recurrent congestion caused by incidents follows patterns that depend on the incident conditions. This implies that the information should be useful for traffic prediction and traffic management. Traffic flow prediction is generally performed using black-box approaches such as neural networks and genetic algorithms. Black-box approaches are not designed to explain their modeling and reasoning process, nor to estimate the benefits and risks of implementing such a solution. TMCs are therefore reluctant to employ black-box approaches, even though numerous valuable articles exist. This research proposes a more readily understandable and intuitively appealing data-driven approach and develops an algorithm for identifying congestion patterns, in support of recurrent and non-recurrent congestion management and information provision.

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1800 externally non-audited firms, comprising 900 bankrupt and 900 non-bankrupt cases. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test, with each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for guarding against overfitting. The prediction accuracy on this second portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models. The Q-statistic values and average classification accuracies of the base classifiers were also investigated. 
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
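
The random subspace idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy data, the function names, and the random draw of k and of the feature subsets are all assumptions made for illustration. The abstract uses a genetic algorithm to search over the (k, feature subset) pairs; that search is not shown here.

```python
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote of its k nearest training points,
    using squared Euclidean distance on the given feature subset only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def build_random_subspace_ensemble(n_features, n_members, subset_size, k_choices, seed=0):
    """Draw one (k, feature subset) pair per ensemble member.
    Assumption: drawn at random; a genetic algorithm would optimize these pairs instead."""
    rng = random.Random(seed)
    return [
        (rng.choice(k_choices), sorted(rng.sample(range(n_features), subset_size)))
        for _ in range(n_members)
    ]

def ensemble_predict(members, train_X, train_y, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, k, feats) for k, feats in members)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four "financial ratios", label 0 = healthy, 1 = bankrupt.
train_X = [
    [0.0, 0.0, 0.0, 0.0], [0.1, 0.2, 0.1, 0.0], [0.2, 0.1, 0.0, 0.1],
    [1.0, 1.0, 1.0, 1.0], [0.9, 1.1, 1.0, 0.8], [1.1, 0.9, 0.9, 1.0],
]
train_y = [0, 0, 0, 1, 1, 1]

# Five members, each seeing two of the four features, with k drawn from {1, 3}.
members = build_random_subspace_ensemble(
    n_features=4, n_members=5, subset_size=2, k_choices=[1, 3]
)
print(ensemble_predict(members, train_X, train_y, [0.05, 0.05, 0.05, 0.05]))
print(ensemble_predict(members, train_X, train_y, [0.95, 0.95, 0.95, 0.95]))
```

Each member is just a pair of a k value and a list of feature indices, which is also a natural encoding for a search procedure to optimize.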

Multiple Instance Mamdani Fuzzy Inference

  • Khalifa, Amine B.;Frigui, Hichem
    • International Journal of Fuzzy Logic and Intelligent Systems / v.15 no.4 / pp.217-231 / 2015
  • A novel fuzzy learning framework that employs fuzzy inference to solve the problem of Multiple Instance Learning (MIL) is presented. The framework introduces a new class of fuzzy inference systems called Multiple Instance Mamdani Fuzzy Inference Systems (MI-Mamdani). In multiple instance problems, the training data is ambiguously labeled: instances are grouped into bags, and the labels of the bags are known but not those of the individual instances. MIL deals with learning a classifier at the bag level. Over the years, many solutions to this problem have been proposed. However, no MIL formulation employing fuzzy inference exists in the literature. Fuzzy logic is powerful at modeling knowledge uncertainty and measurement imprecision, and it is one of the best frameworks for modeling vagueness. However, in addition to uncertainty and imprecision, there is a third vagueness concept that fuzzy logic does not yet address well. This concept is the ambiguity that arises when the data have multiple forms of expression, as is the case in multiple instance problems. In this paper, we introduce multiple instance fuzzy logic, which enables fuzzy reasoning with bags of instances. Accordingly, an MI-Mamdani system that extends the standard Mamdani inference system to compute with multiple instances is introduced. The proposed framework is tested and validated using a synthetic dataset suitable for MIL problems. Additionally, we apply the proposed multiple instance inference to fuse the output of multiple discrimination algorithms for the purpose of landmine detection using Ground Penetrating Radar.

An Integrated Approach Using Change-Point Detection and Artificial Neural Networks for Interest Rates Forecasting

  • Oh, Kyong-Joo;Han, Ingoo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2000.04a / pp.235-241 / 2000
  • This article suggests integrated neural network models for interest rate forecasting using change-point detection. The basic concept of the proposed model is to obtain intervals divided by change points, to identify them as change-point groups, and to involve them in interest rate forecasting. The proposed models consist of three stages. The first stage is to detect successive change points in the interest rate dataset. The second stage is to forecast the change-point group with data mining classifiers. The final stage is to forecast the desired output with BPN. Based on this structure, we propose three integrated neural network models in terms of the data mining classifier: (1) a multivariate discriminant analysis (MDA)-supported neural network model, (2) a case-based reasoning (CBR)-supported neural network model, and (3) a backpropagation neural network (BPN)-supported neural network model. Subsequently, we compare these models with a neural network model alone and, in addition, determine which of the three classifiers (MDA, CBR and BPN) performs better. This article thus examines the predictability of integrated neural network models for interest rate forecasting using change-point detection.
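
The first stage and the grouping step can be sketched in Python as follows. The abstract does not name the detection method, so this sketch substitutes a simple least-squares mean-shift split for a single change point; the function names and the toy series are hypothetical, and stages two and three (the classifier and the BPN forecaster) are not shown.

```python
def best_split(series):
    """Return the index t (1 <= t < len(series)) that minimizes the summed
    squared deviation from the segment mean on each side of the split.
    Assumption: a stand-in for whatever change-point test the paper uses."""
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)
    return min(range(1, len(series)), key=lambda t: sse(series[:t]) + sse(series[t:]))

def label_groups(n_points, change_points):
    """Assign each observation the index of the interval it falls in,
    where every change point starts a new interval (change-point group)."""
    starts = set(change_points)
    labels, group = [], 0
    for i in range(n_points):
        if i in starts:
            group += 1
        labels.append(group)
    return labels

# Hypothetical interest-rate series with one obvious level shift.
rates = [5.0, 5.1, 4.9, 5.0, 8.0, 8.2, 7.9, 8.1]
change_point = best_split(rates)
groups = label_groups(len(rates), [change_point])
print(change_point, groups)
```

The group labels produced this way are what a second-stage classifier would be trained to predict, so that the final forecaster can be fitted separately per group.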

Audit Quality and Stock Return Co-Movement: Evidence from Vietnam

  • PHAM, Chi Bich Thi;VU, Thu Minh Thi;NGUYEN, Linh Ha;NGUYEN, Dung Duc
    • The Journal of Asian Finance, Economics and Business / v.7 no.7 / pp.139-147 / 2020
  • This paper aims to explore the relationship between audit quality and the level of stock return co-movement in the context of the Vietnamese emerging market. The empirical study is designed based on the quantitative method and deductive approach. The panel dataset includes 256 listed firms from different industries, with 1115 firm-year observations on the Ho Chi Minh City Stock Exchange for the period from 2014 to 2018. We build an econometric regression model with stock return synchronicity as the dependent variable and audit quality as the independent variable. Some control variables are also added to the regression models, as prior research has documented their effect on stock price synchronicity. To improve the accuracy of the regression coefficients, besides Ordinary Least Squares, we employ the Random Effects Model and the Fixed Effects Model for better statistical analysis of the panel dataset. The results show that audit quality is positively correlated with stock price synchronicity. This finding suggests that the stock returns of companies with higher audit quality are more synchronous with the market. Results for the control variables also support our reasoning for the main findings.

KommonGen: A Dataset for Korean Generative Commonsense Reasoning Evaluation (KommonGen: 한국어 생성 모델의 상식 추론 평가 데이터셋)

  • Seo, Jaehyung;Park, Chanjun;Moon, Hyeonseok;Eo, Sugyeong;Kang, Myunghoon;Lee, Seounghoon;Lim, Heuiseok
    • Annual Conference on Human and Language Technology / 2021.10a / pp.55-60 / 2021
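  • Recent natural language processing research on Korean has centered on deep-learning-based natural language understanding models, with active comparative analysis and evaluation of each model's performance. For Korean generative models, however, only the ability to perform subtasks from the natural language understanding domain (e.g., sentiment classification, sentence similarity measurement) is evaluated quantitatively, so the models' ability to compose Korean sentences and their commonsense reasoning process are not sufficiently assessed. In addition, most generative models still have great difficulty producing natural sentences that agree with even simple, general common sense, so research on improvements is needed. To address these problems, this paper proposes the KommonGen dataset, which has Korean generative models generate sentences on the basis of general commonsense reasoning ability. Through KommonGen, we construct evaluation criteria that allow the performance of Korean generative models to be compared and analyzed quantitatively, and we aim to suggest directions for improving Korean natural language generation models.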

Numerical Reasoning Dataset Augmentation Using Large Language Model and In-Context Learning (대규모 언어 모델 및 인컨텍스트 러닝을 활용한 수치 추론 데이터셋 증강)

  • Hwang, Yechan;Lim, Jinsu;Lee, Young-Jun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology / 2023.10a / pp.203-208 / 2023
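  • This paper proposes a methodology that uses the in-context learning and prompting of large language models to effectively augment datasets for numerical reasoning tasks. To assure the quality of the generated data, we also add preprocessing that helps the model understand the numerical reasoning data and a verification step that filters out outputs that fail to meet the requirements. After augmenting data with this procedure and training a reasoning model, we showed that a model trained on the dataset augmented with our method can achieve higher performance than models trained with other augmentation methods. In the experiments, the model trained on our augmented data improved on the model trained on the original data by at least 2%p on every metric, and various cases confirmed that our model can greatly increase the diversity of numerical reasoning training data.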

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques (텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측)

  • Yun, Tae-Uk;Ahn, Hyunchul
    • Journal of Information Technology Applications and Management / v.25 no.1 / pp.19-32 / 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Thus, detecting fake news and preventing its spread has become a very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible for humans to identify it manually. In this context, researchers have tried over the past years to develop automated fake news detection methods using Artificial Intelligence techniques. Unfortunately, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted to quantified values using various text mining techniques (Topic Modeling, TF-IDF, and so on). In the second step, classifiers are trained using the values produced in step 1. As classifiers, machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied. To validate the effectiveness of the proposed method, we collected 200 Korean news articles from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.
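
The two-step pipeline described above (quantify the text, then classify) can be sketched in Python. This is an illustrative sketch only: the toy English tokens and labels are hypothetical and are not the FactCheck data, and the nearest-neighbor lookup by cosine similarity is a simple stand-in for the classifiers named in the abstract, not the authors' configuration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Step 1: turn tokenized documents into sparse TF-IDF dicts.
    tf = term count / document length; idf = log(N / document frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df) for term, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf

def vectorize(doc, idf):
    """Apply the fitted idf weights to a new document, ignoring unseen terms."""
    counts = Counter(t for t in doc if t in idf)
    return {t: (c / len(doc)) * idf[t] for t, c in counts.items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def nearest_label(query_vec, train_vecs, labels):
    """Step 2 (stand-in classifier): return the label of the most similar training case."""
    best = max(range(len(train_vecs)), key=lambda i: cosine(query_vec, train_vecs[i]))
    return labels[best]

# Hypothetical, already-tokenized training articles.
docs = [
    ["miracle", "cure", "shocking", "secret"],
    ["shocking", "secret", "celebrity", "scandal"],
    ["parliament", "passed", "budget", "bill"],
    ["ministry", "announced", "budget", "figures"],
]
labels = ["fake", "fake", "real", "real"]

train_vecs, idf = tfidf_vectors(docs)
print(nearest_label(vectorize(["shocking", "miracle", "secret"], idf), train_vecs, labels))
print(nearest_label(vectorize(["budget", "bill", "passed"], idf), train_vecs, labels))
```

Keeping the idf table from the training step and reusing it for new articles ensures that training and query vectors share the same weighting.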

A Prediction System of User Preferences for Newly Released Items Based on Words (새로 출시되는 품목들을 위한 단어 기반의 사용자 선호도 예측 기법)

  • Choi, Yoon-Seok;Moon, Byung-Ro
    • Journal of the Korean Institute of Intelligent Systems / v.16 no.2 / pp.156-163 / 2006
  • Collaborative filtering (CF) systems are widely used in recommendation because they are easy to implement and perform well. They have several problems, however, such as the sparsity problem, the first-rater problem, and the lack of explanation for recommendations, and many studies have tried to resolve them. The influence of the sparsity problem lessens as users' data accumulate, but the first-rater problem is inherent to CF systems, and a number of studies have tried to overcome the disadvantages of CF systems with content-based methods. CF systems are also black boxes that provide no explanation of how a recommendation was produced. In this paper, we present a content-based prediction system based on preference words, which exposes the reasoning behind a recommendation. Our system predicts a user's rating of a new movie, and we suggest a semiotic network-based method to solve the mismatching problem between items. For experimental comparison, we used the EachMovie and IMDb datasets.

Lightweight Convolution Module based Detection Model for Small Embedded Devices (소형 임베디드 장치를 위한 경량 컨볼루션 모듈 기반의 검출 모델)

  • Park, Chan-Soo;Lee, Sang-Hun;Han, Hyun-Ho
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.28-34 / 2021
  • Object detection using deep learning requires both accuracy and real-time performance. However, it is difficult to use a deep learning model that processes a large amount of data in a resource-limited environment. To solve this problem, this paper proposes an object detection model for small embedded devices. Unlike general detection models, the model size was minimized by using a structure from which the pre-trained feature extractor was removed. The model was designed by repeatedly stacking lightweight convolution blocks. In addition, the number of region proposals was greatly reduced to lower detection overhead. The proposed model was trained and evaluated using the public dataset PASCAL VOC. For quantitative evaluation, detection performance was measured with average precision, the metric commonly used in the detection field, and detection speed was measured on a Raspberry Pi, which is similar to an actual embedded device. In the experiments, the model achieved improved accuracy and faster inference speed compared to the existing detection method.