• Title/Summary/Keyword: 결정 규칙 (decision rules)

Search Results: 942

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen; Kim, Yeji; Cho, Hyungjun; Choi, Sangbum
    • The Korean Journal of Applied Statistics, v.34 no.3, pp.309-327, 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on the correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly-robust weighted least-squares estimation methods for Q-learning in high-dimensional settings, where treatment models for the propensity score and penalization for sparse estimation are also investigated. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be correctly estimated as long as at least one of the outcome model and the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real-data example.
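The abstract above describes doubly-robust weighted least-squares Q-learning only in prose. Below is a minimal Python sketch of one common doubly-robust weighted least-squares formulation (in the spirit of dWOLS) for a single decision stage with a binary treatment; the lasso-penalized propensity model, the variable names, and the scikit-learn estimators are illustrative assumptions, not the estimators examined in the paper.

```python
# A minimal sketch of doubly-robust weighted least-squares Q-learning for a
# single stage, assuming a binary treatment A in {0, 1} and covariates X.
import numpy as np
from sklearn.linear_model import LogisticRegressionCV, LinearRegression

def dr_q_stage(X, A, y):
    # 1) Treatment (propensity) model: P(A = 1 | X), with an L1 penalty to
    #    handle high-dimensional covariates.
    ps_model = LogisticRegressionCV(penalty="l1", solver="saga", max_iter=5000)
    ps = ps_model.fit(X, A).predict_proba(X)[:, 1]

    # 2) Balancing weights w = |A - pi(X)| give double robustness for the
    #    treatment-effect (blip) part of the Q-function.
    w = np.abs(A - ps)

    # 3) Weighted least squares on main effects plus treatment interactions.
    design = np.hstack([X, A[:, None], A[:, None] * X])
    q_model = LinearRegression().fit(design, y, sample_weight=w)

    # 4) Estimated rule: treat when the blip (A-part of the Q-function) is positive.
    p = X.shape[1]
    blip = q_model.coef_[p] + X @ q_model.coef_[p + 1:]
    return (blip > 0).astype(int), q_model

# Example usage with simulated data
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
y = X[:, 0] + A * (1.0 - X[:, 1]) + rng.normal(size=500)
rule, _ = dr_q_stage(X, A, y)
```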

Reinforcement Learning Model for Mass Casualty Triage Taking into Account the Medical Capability (의료능력을 고려한 대량전상자 환자분류 강화학습 모델)

  • Byeongho Park; Namsuk Cho
    • Journal of the Society of Disaster Information, v.19 no.1, pp.44-59, 2023
  • Purpose: In the event of mass casualties, triage must be done promptly and accurately so that as many patients as possible can be recovered and returned to the battlefield. However, medical personnel are given many tasks with limited manpower, and the battlefield environment in which patients must be classified is highly complex and uncertain. Therefore, we studied an artificial intelligence model that can assist and replace medical personnel on the battlefield. Method: The triage model is built using reinforcement learning, a field of artificial intelligence. The model is trained to find a policy that allows as many patients as possible to be treated, taking into account the condition of randomly generated patients and the medical capability of the military hospital. Result: Whether the reinforcement learning model trained well was confirmed through statistics such as the cumulative reward. In addition, whether the triage produced by the learned model was accurate was confirmed through the number of survivors. When its performance was compared with a rule-based model, the reinforcement learning model was able to rescue 10% more patients than the rule-based model. Conclusion: This study shows that the triage model using reinforcement learning can serve as an alternative for assisting and replacing the triage decision-making of medical personnel in the case of mass casualties.
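As a rough illustration of the reinforcement learning setup described above, the following is a minimal tabular Q-learning sketch for a toy triage-style problem. The state encoding (patient severity and remaining treatment capacity), the action set, and the survival-based reward are invented for illustration and are not the paper's environment or model.

```python
# A minimal tabular Q-learning sketch for a triage-style decision problem.
import numpy as np

n_severity, n_capacity, n_actions = 4, 5, 3   # toy actions: treat now / delay / evacuate
Q = np.zeros((n_severity, n_capacity, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(1)

def step(severity, capacity, action):
    """Toy transition: treating consumes capacity; reward 1 if the patient survives."""
    survived = rng.random() < (0.9 if action == 0 and capacity > 0 else 0.4)
    new_capacity = max(capacity - (1 if action == 0 else 0), 0)
    new_severity = rng.integers(n_severity)          # next incoming patient
    return new_severity, new_capacity, float(survived)

for _ in range(2000):                                # training episodes
    s, c = rng.integers(n_severity), n_capacity - 1
    for _ in range(20):                              # 20 patients per episode
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s, c].argmax())
        s2, c2, r = step(s, c, a)
        # Standard Q-learning update toward the bootstrapped target
        Q[s, c, a] += alpha * (r + gamma * Q[s2, c2].max() - Q[s, c, a])
        s, c = s2, c2
```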

An Empirical Study on the Cryptocurrency Investment Methodology Combining Deep Learning and Short-term Trading Strategies (딥러닝과 단기매매전략을 결합한 암호화폐 투자 방법론 실증 연구)

  • Yumin Lee; Minhyuk Lee
    • Journal of Intelligence and Information Systems, v.29 no.1, pp.377-396, 2023
  • As the cryptocurrency market continues to grow, it has developed into a new financial market, and the need for research on investment strategies for the cryptocurrency market is emerging. This study conducts an empirical analysis of a cryptocurrency investment methodology that combines short-term trading strategies and deep learning. Daily price data of Ethereum were collected through the API of Upbit, the Korean cryptocurrency exchange. The investment performance of the experimental models was analyzed by finding the optimal parameters based on past data. The experimental models are a volatility breakout strategy (VBS), a Long Short-Term Memory (LSTM) model, a moving average cross strategy, and a combined model. VBS is a short-term trading strategy that buys when volatility rises significantly on a daily basis and sells at the closing price of the day. LSTM is a deep learning model suited to time series data, and the closing price predicted by the model was applied to a simple trading rule. The moving average cross strategy determines whether to buy or sell when the moving averages cross. The combined model is a trading rule that combines variables derived from the VBS and the LSTM model using AND/OR logic for the buy conditions. The results show that the combined model achieves better investment performance than the single models. This study has academic significance in that it goes beyond simple deep learning-based cryptocurrency price prediction and improves investment performance by combining deep learning and short-term trading strategies, and it has practical significance in that it shows applicability to actual investment.
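To make the trading rules concrete, here is a minimal pandas sketch of a volatility breakout buy condition combined with an LSTM-derived signal using AND logic. The column names, the breakout parameter k, and the form of the LSTM signal are assumptions; the paper's derived variables may differ.

```python
# A minimal sketch of a volatility breakout buy rule ANDed with an LSTM-based
# signal, computed on daily OHLC data.
import pandas as pd

def combined_signals(df: pd.DataFrame, predicted_close: pd.Series, k: float = 0.5) -> pd.Series:
    """df needs columns: open, high, low, close (daily bars)."""
    prev_range = (df["high"] - df["low"]).shift(1)        # previous day's range
    breakout_level = df["open"] + k * prev_range          # volatility breakout threshold
    vbs_buy = df["high"] >= breakout_level                # price broke out during the day
    lstm_buy = predicted_close > df["open"]               # model expects the close above the open
    return vbs_buy & lstm_buy                             # AND combination of the two conditions
```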

Control Method for the Number of Travel Hops for the ACK Packets in Selective Forwarding Detection Scheme (선택적 전달 공격 탐지기법에서의 인증 메시지 전달 홉 수 제어기법)

  • Lee, Sang-Jin; Kim, Jong-Hyun; Cho, Tae-Ho
    • Journal of the Korea Society for Simulation, v.19 no.2, pp.73-80, 2010
  • A wireless sensor network deployed in a hostile environment can be easily compromised by attackers. A selective forwarding attack can jam packets or drop sensitive packets, such as reports on enemy movement, on the data flow path through a compromised node. Xiao, Yu, and Gao proposed the checkpoint-based multi-hop acknowledgement scheme (CHEMAS). In CHEMAS, each node on the path becomes a checkpoint node with a pre-defined probability, and the checkpoint nodes can detect the area where selective forwarding attacks occur. In this scheme, the number of hops is very important because this parameter trades off energy conservation against detection capability. In this paper, we use a fuzzy rule system to determine an adaptive threshold value, which is the number of hops for the ACK packets. In every period, the base station determines the threshold value using fuzzy logic, with the energy level, the number of compromised nodes, and the distance from the base station to each node as inputs.
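The sketch below shows, under invented membership functions and an invented rule table, how a fuzzy rule system of this kind might map the three inputs (energy level, number of compromised nodes, distance from the base station) to an ACK hop-count threshold; it is not the rule base used in the paper.

```python
# A minimal fuzzy-inference sketch for choosing the ACK hop-count threshold.
# All inputs are assumed to be normalized to the 0..1 range.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function with peak at b."""
    return max(min((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0)

def hop_threshold(energy, compromised, distance):
    # Fuzzify each input into LOW / HIGH degrees.
    e_low, e_high = tri(energy, -0.5, 0.0, 1.0), tri(energy, 0.0, 1.0, 1.5)
    c_low, c_high = tri(compromised, -0.5, 0.0, 1.0), tri(compromised, 0.0, 1.0, 1.5)
    d_near, d_far = tri(distance, -0.5, 0.0, 1.0), tri(distance, 0.0, 1.0, 1.5)

    # Rules: more energy, more compromised nodes, or nearer nodes -> allow more hops.
    rules = [
        (min(e_high, c_high), 10),   # plenty of energy and many attacks: large threshold
        (min(e_high, c_low),   6),
        (min(e_low,  c_high),  5),
        (min(e_low,  c_low),   3),   # conserve energy when attacks are rare
        (d_far,                4),   # distant nodes: keep ACK travel short
        (d_near,               7),
    ]
    num = sum(w * out for w, out in rules)
    den = sum(w for w, _ in rules) + 1e-9
    return round(num / den)          # weighted-average (Sugeno-style) defuzzification
```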

Building a Model to Estimate Pedestrians' Critical Lags on Crosswalks (횡단보도에서의 보행자의 임계간격추정 모형 구축)

  • Kim, Kyung Whan; Kim, Daehyon; Lee, Ik Su; Lee, Deok Whan
    • KSCE Journal of Civil and Environmental Engineering Research, v.29 no.1D, pp.33-40, 2009
  • The critical lag of crosswalk pedestrians is an important parameter in analyzing traffic operation at unsignalized crosswalks; however, there has been little research in this field in Korea. The purpose of this study is to develop a model to estimate the critical lag. Among the factors that influence the critical lag, the age of pedestrians and the length of the crosswalk, which have fuzzy characteristics, together with each rejected or accepted lag, are collected at crosswalks whose lengths range from 3.5 m to 10.5 m. The observed values of the critical lag range from 2.56 sec to 5.56 sec. Age and crosswalk length are each divided into three fuzzy variables, and the critical lag of each case is estimated with Raff's technique, so a total of 9 fuzzy rules are established. Based on these rules, an ANFIS (Adaptive Neuro-Fuzzy Inference System) model to estimate the critical lag is built. The predictive performance of the model is evaluated by comparing the observed critical lags with those estimated by the model. The R², MAE, and MSE statistics are 0.96, 0.097, and 0.015, respectively, so the model explains the observations well. During this study, it is also found that the critical lag increases rapidly beyond a pedestrian age of about 40 years.
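For readers unfamiliar with Raff's technique, the following sketch illustrates its basic idea: the critical lag is the gap value at which the cumulative number of accepted lags shorter than t equals the number of rejected lags longer than t. The observed lag values in the example are hypothetical.

```python
# A minimal sketch of Raff's method: find where the cumulative curve of accepted
# lags (<= t) crosses the curve of rejected lags (> t).
import numpy as np

def raff_critical_lag(accepted, rejected, step=0.05):
    accepted, rejected = np.asarray(accepted), np.asarray(rejected)
    ts = np.arange(0, max(accepted.max(), rejected.max()) + step, step)
    n_acc_shorter = np.array([(accepted <= t).sum() for t in ts])   # accepted with lag <= t
    n_rej_longer = np.array([(rejected > t).sum() for t in ts])     # rejected with lag > t
    return ts[np.argmin(np.abs(n_acc_shorter - n_rej_longer))]      # crossing point

# Hypothetical observed lags (seconds)
print(raff_critical_lag(accepted=[4.1, 5.0, 3.8, 6.2, 4.5], rejected=[2.0, 3.1, 2.7, 4.0, 3.5]))
```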

A Hybrid SVM Classifier for Imbalanced Data Sets (불균형 데이터 집합의 분류를 위한 하이브리드 SVM 모델)

  • Lee, Jae Sik; Kwon, Jong Gu
    • Journal of Intelligence and Information Systems, v.19 no.2, pp.125-140, 2013
  • We call a data set in which the number of records belonging to one class far outnumbers the number of records belonging to the other class an 'imbalanced data set'. Most classification techniques perform poorly on imbalanced data sets. When we evaluate the performance of a classification technique, we need to measure not only 'accuracy' but also 'sensitivity' and 'specificity'. In a customer churn prediction problem, 'retention' records account for the majority class and 'churn' records account for the minority class. Sensitivity measures the proportion of actual retentions that are correctly identified as such, and specificity measures the proportion of churns that are correctly identified as such. The poor performance of classification techniques on imbalanced data sets is due to the low value of specificity. Many previous studies on imbalanced data sets employed an 'oversampling' technique, in which members of the minority class are sampled more than those of the majority class in order to make a relatively balanced data set. When a classification model is constructed using this oversampled balanced data set, specificity can be improved but sensitivity is decreased. In this research, we developed a hybrid model of a support vector machine (SVM), an artificial neural network (ANN), and a decision tree that improves specificity while maintaining sensitivity. We named this hybrid model the 'hybrid SVM model'. The process of construction and prediction of our hybrid SVM model is as follows. By oversampling from the original imbalanced data set, a balanced data set is prepared. The SVM_I and ANN_I models are constructed using the imbalanced data set, and the SVM_B model is constructed using the balanced data set. The SVM_I model is superior in sensitivity and the SVM_B model is superior in specificity. For a record on which both the SVM_I and SVM_B models make the same prediction, that prediction becomes the final solution. If they make different predictions, the final solution is determined by discrimination rules obtained from the ANN and decision tree. For records on which the SVM_I and SVM_B models make different predictions, a decision tree model is constructed using the ANN_I output value as input and actual retention or churn as the target. We obtained the following two discrimination rules: 'IF ANN_I output value < 0.285, THEN Final Solution = Retention' and 'IF ANN_I output value ≥ 0.285, THEN Final Solution = Churn.' The threshold 0.285 is the value optimized for the data used in this research; the result we present is the structure or framework of the hybrid SVM model, not a specific threshold value, so the threshold in the above discrimination rules can be changed to any value depending on the data. To evaluate the performance of our hybrid SVM model, we used the 'churn data set' in the UCI Machine Learning Repository, which consists of 85% retention customers and 15% churn customers. The accuracy of the hybrid SVM model is 91.08%, which is better than that of the SVM_I or SVM_B model. The points worth noticing are its sensitivity, 95.02%, and specificity, 69.24%. The sensitivity of the SVM_I model is 94.65%, and the specificity of the SVM_B model is 67.00%. Therefore, the hybrid SVM model developed in this research improves the specificity of the SVM_B model while maintaining the sensitivity of the SVM_I model.
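The hybrid decision logic described above can be written in a few lines. The sketch below assumes per-record predictions from the SVM_I and SVM_B models plus the ANN_I output value, with the data-specific threshold 0.285 kept only as a default parameter.

```python
# A minimal sketch of the hybrid decision rule: if the two SVMs agree, use their
# common prediction; otherwise fall back to the threshold rule on the ANN_I output.
def hybrid_predict(svm_i_pred, svm_b_pred, ann_i_output, threshold=0.285):
    """All inputs are per-record; predictions are 'Retention' or 'Churn'."""
    if svm_i_pred == svm_b_pred:
        return svm_i_pred                       # both SVMs agree
    # Disagreement: apply the discrimination rule learned from the ANN_I output
    return "Retention" if ann_i_output < threshold else "Churn"
```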

Rough Set Analysis for Stock Market Timing (러프집합분석을 이용한 매매시점 결정)

  • Huh, Jin-Nyung; Kim, Kyoung-Jae; Han, In-Goo
    • Journal of Intelligence and Information Systems, v.16 no.3, pp.77-97, 2010
  • Market timing is an investment strategy used to obtain excess returns from the financial market. In general, market timing means determining when to buy and sell in order to gain excess returns from trading. In many market timing systems, trading rules have been used as an engine to generate trading signals. On the other hand, some researchers have proposed rough set analysis as a proper tool for market timing because, by using a control function, it does not generate a trading signal when the market pattern is uncertain. Numeric data for rough set analysis must be discretized because rough sets only accept categorical data. Discretization searches for proper 'cuts' in the numeric data that determine intervals, and all values that lie within an interval are transformed into the same value. In general, there are four methods for data discretization in rough set analysis: equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and naïve and Boolean reasoning-based discretization. Equal frequency scaling fixes the number of intervals, examines the histogram of each variable, and then determines the cuts so that approximately the same number of samples falls into each interval. Expert's knowledge-based discretization determines the cuts according to the knowledge of domain experts, obtained through literature review or interviews with experts. Minimum entropy scaling recursively partitions the value set of each variable so that a local measure of entropy is optimized. Naïve and Boolean reasoning-based discretization first discretizes the data by naïve scaling and then finds optimized discretization thresholds through Boolean reasoning. Although rough set analysis is promising for market timing, there is little research on how the various data discretization methods affect trading performance under rough set analysis. In this study, we compare stock market timing models that use rough set analysis with various data discretization methods. The research data used in this study are the KOSPI 200 from May 1996 to October 1998. The KOSPI 200 is the underlying index of the KOSPI 200 futures, the first derivative instrument in the Korean stock market; it is a market-value-weighted index consisting of 200 stocks selected by criteria on liquidity and their status in the corresponding industries, including manufacturing, construction, communication, electricity and gas, distribution and services, and financing. The total number of samples is 660 trading days. In addition, this study uses popular technical indicators as independent variables. The experimental results show that the most profitable method for the training sample is naïve and Boolean reasoning, but expert's knowledge-based discretization is the most profitable method for the validation sample. In addition, expert's knowledge-based discretization produced robust performance for both the training and validation samples. We also compared rough set analysis and decision trees, experimenting with C4.5 for comparison purposes. The results show that rough set analysis with expert's knowledge-based discretization produced more profitable rules than C4.5.
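As a concrete example of one of the four discretization methods, here is a minimal sketch of equal frequency scaling using pandas' quantile-based binning; the indicator values are made up and qcut is an illustrative shortcut, not the tool used in the study.

```python
# A minimal sketch of equal frequency scaling: cuts are placed at quantiles so that
# roughly the same number of samples falls into each interval.
import pandas as pd

def equal_frequency_discretize(series: pd.Series, n_intervals: int = 4):
    # qcut chooses quantile-based cuts; duplicates="drop" guards against ties.
    return pd.qcut(series, q=n_intervals, labels=False, duplicates="drop")

# Example: discretize a technical indicator into 4 categorical levels
rsi = pd.Series([28.0, 35.5, 47.2, 52.8, 61.0, 66.3, 71.9, 80.4])
print(equal_frequency_discretize(rsi, 4).tolist())   # [0, 0, 1, 1, 2, 2, 3, 3]
```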

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun; Hyun, Yoonjin; Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.24 no.3, pp.21-44, 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have resulted in massive amounts of text data, produced and distributed through various media platforms such as the World Wide Web, Internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, a problem that has raised the interest of many researchers who want to manage this huge amount of information. It also requires professionals capable of classifying relevant information, and hence text classification is introduced. Text classification is a challenging task in modern data analysis in which a text document must be assigned to one or more predefined categories or classes. In the text classification field, different kinds of techniques are available, such as K-Nearest Neighbor, the Naïve Bayes algorithm, Support Vector Machines, Decision Trees, and Artificial Neural Networks. However, when dealing with huge amounts of text data, model performance and accuracy become a challenge. Depending on the type of words used in the corpus and the type of features created for classification, the performance of a text classification model can vary. Most previous attempts have been based on proposing a new algorithm or modifying an existing one, and this line of research can be said to have reached certain limits for further improvement. In this study, rather than proposing a new algorithm or modifying an existing one, we focus on a way to modify how the data are used. It is widely known that classifier performance is influenced by the quality of the training data upon which the classifier is built. Real-world datasets often contain noise, or noisy data, which can affect the decisions made by classifiers built from these data. In this study, we consider that data from different domains, i.e., heterogeneous data, may have noise-like characteristics that can be utilized in the classification process. To build a classifier, a machine learning algorithm is applied under the assumption that the characteristics of the training data and the target data are the same or very similar. However, for unstructured data such as text, the features are determined by the vocabulary included in the documents, and if the viewpoints of the training data and target data differ, the features may also differ between the two. In this study, we attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data coming from various kinds of sources are likely formatted differently, which causes difficulties for traditional machine learning algorithms because they are not designed to recognize different types of data representation at the same time and to combine them in the same generalization. Therefore, in order to utilize heterogeneous data in the learning process of the document classifier, we apply semi-supervised learning in this study. However, unlabeled data may degrade the performance of the document classifier. Therefore, we further propose a method called the Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) to select only the documents that contribute to the accuracy improvement of the classifier. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision making. In this paper, three different types of real-world data sources were used: news, Twitter, and blogs.
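The following is a minimal self-training-style sketch of the confidence-based selection idea behind RSESLA, assuming TF-IDF features and two scikit-learn base classifiers; it is a simplified illustration of selecting confidently and consistently classified unlabeled documents, not the RSESLA algorithm itself.

```python
# A minimal sketch: train several base classifiers on labeled documents and keep
# only unlabeled documents that all models classify confidently and consistently.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB

def confident_pseudo_labels(labeled_texts, labels, unlabeled_texts, threshold=0.9):
    vec = TfidfVectorizer()
    X_lab = vec.fit_transform(labeled_texts)
    X_unl = vec.transform(unlabeled_texts)

    models = [LogisticRegression(max_iter=1000), MultinomialNB()]
    probas = [m.fit(X_lab, labels).predict_proba(X_unl) for m in models]

    preds = [p.argmax(axis=1) for p in probas]
    confident = np.logical_and.reduce(
        [p.max(axis=1) >= threshold for p in probas]      # every model is confident
    ) & (preds[0] == preds[1])                             # and the models agree
    return np.where(confident)[0], preds[0][confident]     # indices + pseudo-labels
```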

Legal Status of Private Transactions Regarding the Geostationary Satellite Orbit (지구정지궤도의 사적 거래의 국제법상 지위에 관한 연구)

  • Shin, Hong Kyun
    • The Korean Journal of Air & Space Law and Policy, v.29 no.2, pp.239-272, 2014
  • The rights and obligations of the Member States of the ITU in the domain of international frequency management of the spectrum/orbit resource are incorporated in the Constitution and Convention of the ITU and in the Radio Regulations that complement them. These instruments contain the main principles and lay down the specific regulations governing the major elements, such as the rights and obligations of member administrations in obtaining access to the spectrum/orbit resource, as well as international recognition of these rights through the recording of frequency assignments and, as appropriate, any associated orbits, including the geostationary-satellite orbits used or intended to be used, in the Master International Frequency Register (MIFR). Coordination is a further step in the process leading up to notification of the frequency assignments for recording in the MIFR. This procedure is a formal regulatory obligation both for an administration seeking to assign a frequency in its network and for an administration whose existing or planned services may be affected by that assignment. The regulatory problem lies in allowing administrations to fulfill their "bringing into use" duty, and thereby preserve their filings, simply by putting any satellite, whatever its nationality or technical specification may be, into the filed orbit. This regulatory gap may result in the emergence of a secondary market for satellite orbits. Within the satellite orbit secondary market, the object of the transaction may be the satellite itself, the regulatory rights in rem, or the orbit registered in the MIFR. The recent sale of the Koreasat is a typical example of an orbit transaction between private companies, the legality of which remains controversial from the perspective of international space law as well as international transaction law. It must be noted, however, that Koreasat 3 and its filed orbit are in fact for sale.

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin; Shun, William Wong Xiu; Kim, Namgyu
    • Journal of Internet Computing and Services, v.16 no.2, pp.57-66, 2015
  • Considerable research effort is being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide successful information services that can identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily due to discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, we need an intermediate logic to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are utilized to establish these methodologies. In the experiment, the keyword enhancement rate of the proposed integration methodology was found to be about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment on the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.
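To illustrate the association rule component, the sketch below mines issue-keyword-to-R&D-keyword rules by support and confidence over per-document keyword sets; the example documents and thresholds are hypothetical.

```python
# A minimal sketch of mining "issue keyword -> R&D keyword" association rules by
# support and confidence over keyword sets attached to each document.
from itertools import product

docs = [  # each document: (issue keywords, R&D keywords); hypothetical examples
    ({"fine dust"}, {"air quality sensor", "filtration"}),
    ({"fine dust"}, {"air quality sensor"}),
    ({"aging society"}, {"care robot"}),
]

def issue_to_rnd_rules(docs, min_support=0.3, min_confidence=0.6):
    n = len(docs)
    rules = []
    issue_kw = {k for issues, _ in docs for k in issues}
    rnd_kw = {k for _, rnd in docs for k in rnd}
    for i_kw, r_kw in product(issue_kw, rnd_kw):
        support_i = sum(i_kw in issues for issues, _ in docs) / n
        support_ir = sum(i_kw in issues and r_kw in rnd for issues, rnd in docs) / n
        if support_i == 0:
            continue
        confidence = support_ir / support_i
        if support_ir >= min_support and confidence >= min_confidence:
            rules.append((i_kw, r_kw, round(support_ir, 2), round(confidence, 2)))
    return rules

print(issue_to_rnd_rules(docs))   # prints the rules that meet both thresholds
```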