• Title/Summary/Keyword: Machine Learning

Search Results: 5,209

Data Mining Tool for Stock Investors' Decision Support (주식 투자자의 의사결정 지원을 위한 데이터마이닝 도구)

  • Kim, Sung-Dong
    • The Journal of the Korea Contents Association / v.12 no.2 / pp.472-482 / 2012
  • There are many investors in the stock market, and more and more people are becoming interested in stock investment. To avoid risk and make a profit, investors must make several decisions based on various information: they have to select profitable stocks and determine appropriate buying/selling prices and a holding period. This paper proposes a data mining tool to support such decisions. The tool lets stock investors apply machine learning techniques to generate stock price prediction models, and it helps them determine buying/selling prices and a holding period, supporting an individual investor's own decision making based on past data. Using the proposed tool, users can manage stock data, generate their own stock price prediction models, and establish a trading policy through investment simulation. Users select the technical indicators they believe affect future stock prices, generate prediction models from those indicators, and test the models. They then run investment simulations with suitable models to find an appropriate trading policy consisting of buying/selling prices and a holding period. With the proposed tool, stock investors can expect higher profit by relying on a prediction model and a trading policy validated on past data rather than on emotional decisions.
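
The following is a minimal, generic sketch of the workflow the abstract describes (choose technical indicators, fit a price-prediction model, check a buy rule on past data), not the authors' tool: it runs scikit-learn on synthetic prices, and the indicator set, the 10-day horizon, and the 2% threshold are purely illustrative assumptions.

```python
# Minimal sketch, assuming synthetic data and hypothetical indicators; not the paper's tool.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 500).cumsum())   # synthetic closing-price series

df = pd.DataFrame({"close": close})
df["ma5"] = close.rolling(5).mean()           # example technical indicators (hypothetical choice)
df["ma20"] = close.rolling(20).mean()
df["momentum"] = close.diff(10)
df["target"] = close.shift(-10)               # price 10 trading days ahead
df = df.dropna()

features = ["ma5", "ma20", "momentum"]
train, test = df.iloc[:350], df.iloc[350:]

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(train[features], train["target"])

# toy "trading policy" check on held-out past data: buy when predicted gain exceeds 2%
pred = model.predict(test[features])
signals = pred > test["close"].to_numpy() * 1.02
print("buy signals in the test window:", int(signals.sum()))
```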

Who Gets Government SME R&D Subsidy? Application of Gradient Boosting Model (Gradient Boosting 모형을 이용한 중소기업 R&D 지원금 결정요인 분석)

  • Kang, Sung Won;Kang, HeeChan
    • The Journal of Society for e-Business Studies / v.25 no.4 / pp.77-109 / 2020
  • In this paper, we build a gradient boosting model to predict government SME R&D subsidies, select features of high importance, and measure the impact of each feature on the predicted subsidy using partial dependence plots (PDP) and SHAP values. Unlike previous empirical research, we focus on the effect of the subsidy distribution pattern on the incentives of firms participating in the subsidy competition. We use firm-level data constructed by KISTEP, which links government R&D subsidy records with financial statements provided by NICE, and apply a gradient boosting model to predict R&D subsidies. We find that firms with higher R&D performance and larger R&D investment tend to receive higher subsidies, whereas firms with higher operating profit or total asset turnover tend to receive lower subsidies. Our results suggest that the current distribution pattern of government R&D subsidies provides an incentive to improve R&D project performance, but not business performance.
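
A minimal sketch of the analysis pipeline described above (gradient boosting, feature importance, partial dependence), assuming synthetic data and hypothetical stand-in names for the KISTEP/NICE variables; the SHAP step is omitted here.

```python
# Minimal sketch: gradient boosting + importance + partial dependence on synthetic data.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance, partial_dependence

rng = np.random.default_rng(1)
X = pd.DataFrame({
    "rnd_investment": rng.gamma(2.0, 1.0, 1000),     # hypothetical stand-in features
    "rnd_performance": rng.normal(0, 1, 1000),
    "operating_profit": rng.normal(0, 1, 1000),
    "asset_turnover": rng.normal(0, 1, 1000),
})
# synthetic subsidy: rises with the R&D variables, falls with profitability (toy relation)
y = 2 * X["rnd_investment"] + X["rnd_performance"] - 0.5 * X["operating_profit"] + rng.normal(0, 0.5, 1000)

gbm = GradientBoostingRegressor(n_estimators=300, max_depth=3, random_state=0).fit(X, y)

# feature importance via permutation
imp = permutation_importance(gbm, X, y, n_repeats=10, random_state=0)
for name, score in sorted(zip(X.columns, imp.importances_mean), key=lambda t: -t[1]):
    print(f"{name:18s} {score:.3f}")

# one-dimensional partial dependence of the predicted subsidy on the first feature
pd_result = partial_dependence(gbm, X, features=[0], kind="average")
print(pd_result["average"][0][:5])
```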

Performance Optimization Strategies for Fully Utilizing Apache Spark (아파치 스파크 활용 극대화를 위한 성능 최적화 기법)

  • Myung, Rohyoung;Yu, Heonchang;Choi, Sukyong
    • KIPS Transactions on Computer and Communication Systems / v.7 no.1 / pp.9-18 / 2018
  • Improving the performance of big data analytics in distributed environments has become an important issue because most big data applications, such as machine learning and streaming services, run on distributed computing frameworks, and optimizing the performance of such applications on Spark has therefore been actively researched. This optimization is challenging because it requires not only optimizing the applications themselves but also tuning the configuration parameters of the distributed system. Although prior research has made great efforts to improve execution performance, most studies focused on only one of three optimization aspects, application design, system tuning, or hardware utilization, and thus could not orchestrate all of them. In this paper, we analyze and model the application processing procedure of Spark in depth. Based on this analysis, we propose performance optimization schemes for each step of the procedure: the inner stage and the outer stage. We also propose an appropriate partitioning mechanism by analyzing the relationship between partitioning parallelism and application performance. We applied these optimization schemes to WordCount, PageRank, and K-means, which are basic big data analytics workloads, and observed nearly 50% performance improvement when all of the schemes were applied.
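
As a rough illustration of the two levers the paper combines, application-side partitioning and system configuration tuning, here is a minimal PySpark WordCount sketch; the memory and parallelism values and the HDFS paths are illustrative assumptions, not the paper's tuned settings.

```python
# Minimal sketch: WordCount with explicit partitioning and example config parameters.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("wordcount-tuning-sketch")
         .config("spark.executor.memory", "4g")           # system-tuning side (example value)
         .config("spark.default.parallelism", "64")       # default parallelism (example value)
         .getOrCreate())
sc = spark.sparkContext

lines = sc.textFile("hdfs:///data/corpus.txt", minPartitions=64)   # hypothetical input path
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b, numPartitions=64))  # application-side partitioning
counts.saveAsTextFile("hdfs:///out/wordcount")                      # hypothetical output path
spark.stop()
```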

Development of Music Classification of Light and Shade using VCM and Beat Tracking (VCM과 Beat Tracking을 이용한 음악의 명암 분류 기법 개발)

  • Park, Seung-Min;Park, Jun-Heong;Lee, Young-Hwan;Ko, Kwang-Eun;Sim, Kwee-Bo
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.6 / pp.884-889 / 2010
  • Music genre classification has been widely studied recently. However, because experts use different classification criteria, such classifications make it difficult to derive accurate results; in addition, whenever a new music genre emerges, the genre taxonomy has to be redefined. Rather than being searched by genre, music should therefore be classified by emotional words. In this paper, we categorize music on the basis of light and shade (brightness and darkness) as perceived by people. The proposed classification system applies VCM (Variance Considered Machines) to classify the light and shade of music. We use three kinds of musical attributes (beat, timbre, note). Based on listener surveys, these attributes were used to train the VCM, and the VCM's classification results were compared with and analyzed against the survey results. Notes were extracted using MATLAB: the music was sampled at regular intervals, each segment was analyzed by FFT, the average over frequency bands was taken as the representative element, and the heights of the extracted notes over the whole distribution were quantified. Timbre was quantified using differences in the cumulative frequency distribution over the entire frequency range. Applying the VCM to these three attributes and comparing the experimental results with the survey results, we confirmed that the light and shade of the music were separated with a probability of 95.4%.
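
VCM itself is not a standard library algorithm, so the sketch below only illustrates the attribute-extraction step, a rough NumPy analogue of the MATLAB FFT analysis described above; the sampling rate, segment length, and the synthetic signal are assumptions.

```python
# Minimal sketch: FFT-based "note"- and "timbre"-like features from a synthetic signal.
import numpy as np

sr = 22050                                                 # assumed sampling rate
signal = np.random.default_rng(2).normal(size=sr * 10)     # stand-in for a decoded audio track

frame = sr // 2                                            # fixed-length segments (assumed 0.5 s)
segments = signal[: len(signal) // frame * frame].reshape(-1, frame)

spectra = np.abs(np.fft.rfft(segments, axis=1))            # FFT per segment
freqs = np.fft.rfftfreq(frame, d=1.0 / sr)

# "note"-like attribute: dominant-frequency height per segment, then its distribution
dominant = freqs[np.argmax(spectra, axis=1)]
note_feature = np.percentile(dominant, [25, 50, 75])

# "timbre"-like attribute: cumulative spectral distribution over the whole frequency range
cumulative = np.cumsum(spectra.mean(axis=0))
timbre_feature = cumulative / cumulative[-1]

print(note_feature, timbre_feature[:5])
```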

Load Fidelity Improvement of Piecewise Integrated Composite Beam by Construction Training Data of k-NN Classification Model (k-NN 분류 모델의 학습 데이터 구성에 따른 PIC 보의 하중 충실도 향상에 관한 연구)

  • Ham, Seok Woo;Cheon, Seong S.
    • Composites Research / v.33 no.3 / pp.108-114 / 2020
  • A Piecewise Integrated Composite (PIC) beam is composed of different stacking sequences, chosen against the loading type, depending on location. The aim of the current study is to assign robust stacking sequences against external loading to every corresponding part of the PIC beam, based on the stress triaxiality at generated reference points, using k-NN (k-Nearest Neighbor) classification, one of the representative machine learning techniques, in order to achieve superior bending characteristics. The stress triaxiality at the reference points is obtained by three-point bending analysis of an aluminum beam, with the training data categorized by the type of external loading, i.e., tension, compression, or shear. Loading types of each plane of the beam were classified using an independent-plane scheme as well as a total-beam scheme, and the loading fidelities were calibrated for each case while varying the hyper-parameters. The most effective stacking sequences were mapped onto the PIC beam based on the k-NN classification model with the highest loading fidelity. FE analysis shows that the PIC beam has superior resistance to external loading and energy absorption compared with a conventional beam.
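
A minimal sketch of the classification step, assuming hypothetical stress-triaxiality values (near +1/3 for tension, -1/3 for compression, 0 for shear) rather than the paper's FE-derived training data; the values of k stand in for the hyper-parameter variation mentioned above.

```python
# Minimal sketch: k-NN mapping of stress triaxiality to a loading type, on toy data.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# hypothetical training data: stress triaxiality at reference points -> loading type
triaxiality = np.array([[0.40], [0.35], [0.33],
                        [-0.35], [-0.40], [-0.33],
                        [0.02], [-0.01], [0.00]])
loading_type = np.array(["tension"] * 3 + ["compression"] * 3 + ["shear"] * 3)

for k in (1, 3, 5):                                  # vary the hyper-parameter k
    knn = KNeighborsClassifier(n_neighbors=k).fit(triaxiality, loading_type)
    print(k, knn.predict([[0.30], [-0.05]]))         # classify two new reference points
```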

Empirical Research on Search model of Web Service Repository (웹서비스 저장소의 검색기법에 관한 실증적 연구)

  • Hwang, You-Sub
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.173-193 / 2010
  • The World Wide Web is transitioning from being a mere collection of documents that contain useful information toward providing a collection of services that perform useful tasks. The emerging Web service technology has been envisioned as the next technological wave and is expected to play an important role in this recent transformation of the Web. By providing interoperable interface standards for application-to-application communication, Web services can be combined with component-based software development to promote application interaction and integration within and across enterprises. To make Web services for service-oriented computing operational, it is important that Web services repositories not only be well-structured but also provide efficient tools for an environment supporting reusable software components for both service providers and consumers. As the potential of Web services for service-oriented computing is becoming widely recognized, the demand for an integrated framework that facilitates service discovery and publishing is concomitantly growing. In our research, we propose a framework that facilitates Web service discovery and publishing by combining clustering techniques and leveraging the semantics of the XML-based service specification in WSDL files. We believe that this is one of the first attempts at applying unsupervised artificial neural network-based machine-learning techniques in the Web service domain. We have developed a Web service discovery tool based on the proposed approach using an unsupervised artificial neural network and empirically evaluated the proposed approach and tool using real Web service descriptions drawn from operational Web services repositories. We believe that both service providers and consumers in a service-oriented computing environment can benefit from our Web service discovery approach.
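
The authors' unsupervised ANN is not specified here, so the following is only a generic stand-in: a tiny self-organizing-map-style competitive learner over TF-IDF vectors of toy service descriptions, to illustrate how WSDL-derived term vectors could be clustered.

```python
# Minimal sketch, not the authors' implementation: competitive learning over TF-IDF vectors.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "stock quote price lookup service",             # hypothetical WSDL-derived text
    "currency exchange rate conversion service",
    "weather forecast temperature service",
    "zip code city weather lookup",
]
X = TfidfVectorizer().fit_transform(docs).toarray()

rng = np.random.default_rng(3)
nodes = rng.normal(scale=0.1, size=(3, X.shape[1]))    # 3 map nodes (clusters)

for epoch in range(50):                                # SOM-style competitive learning
    lr = 0.5 * (1 - epoch / 50)                        # decaying learning rate
    for x in X:
        winner = np.argmin(np.linalg.norm(nodes - x, axis=1))
        nodes[winner] += lr * (x - nodes[winner])      # move the winning node toward the sample

assignments = [int(np.argmin(np.linalg.norm(nodes - x, axis=1))) for x in X]
print(assignments)                                     # cluster index per service description
```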

The Capacity of Multi-Valued Single Layer CoreNet(Neural Network) and Precalculation of its Weight Values (단층 코어넷 다단입력 인공신경망회로의 처리용량과 사전 무게값 계산에 관한 연구)

  • Park, Jong-Joon
    • Journal of IKEEE / v.15 no.4 / pp.354-362 / 2011
  • One of the unsolved problems in artificial neural networks concerns the capacity of a network. This paper presents CoreNet, a two-layered artificial neural network with a multi-leveled input and a multi-leveled output. I suggest an equation for the capacity of a CoreNet with a p-leveled input and a q-leveled output: $a_{p,q}=\frac{1}{2}p(p-1)q^2-\frac{1}{2}(p-2)(3p-1)q+(p-1)(p-2)$. With an odd value of p and an even value of q, $(p-1)(p-2)(q-2)/2$ must additionally be subtracted from the above equation. The simulation model 1(3)-1(6) has a 3-leveled input and a 6-leveled output with no hidden layer. Simulation of this model with the cot(x) input-leveling method yields 80 convergences for the number of implementable functions out of 216 possible functions. I also show from the simulation results that the two diverged functions become implementable by precalculating the weight values. The simulation result together with the precalculated weight values gives the same total number of implementable functions as the above equation.
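
The capacity formula quoted above can be transcribed directly; a short sketch checking the 1(3)-1(6) case (p = 3, q = 6), where the correction term for odd p and even q applies:

```python
# Direct transcription of the quoted capacity formula, checked at p = 3, q = 6.
def corenet_capacity(p: int, q: int) -> float:
    a = 0.5 * p * (p - 1) * q**2 - 0.5 * (p - 2) * (3 * p - 1) * q + (p - 1) * (p - 2)
    if p % 2 == 1 and q % 2 == 0:                # correction for odd p and even q
        a -= (p - 1) * (p - 2) * (q - 2) / 2
    return a

print(corenet_capacity(3, 6))   # 82.0 = the 80 converged functions plus the 2 recovered
                                # by precalculating weights, out of 6**3 = 216 candidates
```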

Class prediction of an independent sample using a set of gene modules consisting of gene-pairs which were condition(Tumor, Normal) specific (조건(암, 정상)에 따라 특이적 관계를 나타내는 유전자 쌍으로 구성된 유전자 모듈을 이용한 독립샘플의 클래스예측)

  • Jeong, Hyeon-Iee;Yoon, Young-Mi
    • Journal of the Korea Society of Computer and Information / v.15 no.12 / pp.197-207 / 2010
  • Using a variety of data mining methods on high-throughput cDNA microarray data, gene expression levels in two different tissues can be compared, and DEGs (differentially expressed genes) between normal and tumor cells can be detected. Diagnoses can be made with these genes, and treatment strategies can be determined according to the cancer stage. Existing machine learning methods for cancer classification select marker genes that are differentially expressed in normal and tumor samples and build a classifier using those marker genes. However, in addition to differences in expression levels, differences in gene-gene correlations between the two conditions can also be good markers for disease diagnosis. In this study, we identify gene pairs with a large correlation difference between the two sets of samples and build gene classification modules from these pairs. This cancer classification method using gene modules achieves higher accuracy than current methods. Because the number of genes in a classification module is small, implementation as a clinical kit can be considered. In future work, the authors plan to identify novel cancer-related genes by functional analysis of the genes in a classification module through GO (Gene Ontology) enrichment validation, and to extend the classification module into gene regulatory networks.
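
A minimal sketch of the core scoring idea (rank gene pairs by the difference in their correlation between tumor and normal samples and keep the top pairs as a candidate module), using synthetic expression matrices and an arbitrary top-10 cut-off:

```python
# Minimal sketch: condition-specific gene-pair scoring on synthetic expression data.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n_genes, n_samples = 50, 30
tumor = rng.normal(size=(n_genes, n_samples))    # synthetic tumor expression matrix
normal = rng.normal(size=(n_genes, n_samples))   # synthetic normal expression matrix

corr_t = np.corrcoef(tumor)                      # gene-gene correlation in tumor samples
corr_n = np.corrcoef(normal)                     # gene-gene correlation in normal samples

scores = [(abs(corr_t[i, j] - corr_n[i, j]), i, j)
          for i, j in combinations(range(n_genes), 2)]
module = sorted(scores, reverse=True)[:10]       # pairs with the largest correlation shift
for score, i, j in module:
    print(f"gene {i:2d} - gene {j:2d}: |dr| = {score:.2f}")
```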

A study on the characteristics of cyanobacteria in the mainstream of Nakdong river using decision trees (의사결정나무를 이용한 낙동강 본류 구간의 남조류 발생특성 연구)

  • Jung, Woo Suk;Jo, Bu Geon;Kim, Young Do;Kim, Sung Eun
    • Journal of Wetlands Research / v.21 no.4 / pp.312-320 / 2019
  • The occurrence of cyanobacteria causes problems such as oxygen depletion and an increase in organic matter in the water body through mass proliferation and subsequent death. Algae bloom warnings are issued every year owing to the effects of summer heat waves and drought. For proactive management of green algae in the main Nakdong River, the conditions under which cyanobacteria occur need to be characterized quantitatively. In this study, we analyzed the major factors influencing cyanobacterial blooms using visualization and correlation analysis, and used a decision tree, a machine learning method, to quantitatively analyze the conditions of cyanobacterial occurrence according to these factors. At all weirs, the meteorological factors, temperature and the SPI drought index, were significantly correlated with cyanobacterial cell counts. Increasing numbers of heat-wave and drought days suppress mixing of the water body and promote stratification, which in turn promotes the growth of cyanobacteria. In the long term, cyanobacteria need to be managed proactively with these meteorological impacts taken into account.
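
A minimal sketch of a decision tree relating meteorological factors to a cyanobacteria occurrence class, in the spirit of the analysis above; the variables, threshold rule, and data are synthetic placeholders, not the Nakdong River measurements:

```python
# Minimal sketch: decision tree on synthetic temperature / SPI drought data.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(5)
temperature = rng.uniform(10, 35, 300)               # temperature (example range)
spi_drought = rng.uniform(-2, 2, 300)                # SPI drought index (example range)
X = np.column_stack([temperature, spi_drought])
# toy rule: hot and dry conditions -> "bloom" class (illustrative, not observed thresholds)
y = ((temperature > 27) & (spi_drought < -0.5)).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["temperature", "SPI_drought"]))
```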

A point-scale gap filling of the flux-tower data using the artificial neural network (인공신경망 기법을 이용한 청미천 유역 Flux tower 결측치 보정)

  • Jeon, Hyunho;Baik, Jongjin;Lee, Seulchan;Choi, Minha
    • Journal of Korea Water Resources Association / v.53 no.11 / pp.929-938 / 2020
  • In this study, we estimated missing evapotranspiration (ET) data at an eddy-covariance flux tower at the Cheongmicheon farmland site using an artificial neural network (ANN). ANNs have shown excellent performance in numerical analysis and are being applied in various fields. To evaluate the performance of ANN-based gap filling, ET was also calculated using the existing gap-filling methods of Mean Diurnal Variation (MDV) and the Food and Agriculture Organization Penman-Monteith equation (FAO-PM), and the results were evaluated by time-series comparison and statistical analysis (coefficient of determination, index of agreement (IOA), root mean squared error (RMSE), and mean absolute error (MAE)). For the validation of each gap-filling model, 30-minute data from 2015 were used. Of the 121 missing values, the MDV, FAO-PM, and ANN methods supplemented 70, 53, and 84 missing values, respectively, so the ANN method showed the best performance. Analysis of the coefficient of determination (0.673, 0.784, and 0.841 for the MDV, FAO-PM, and ANN methods, respectively) and the IOA (0.899, 0.890, and 0.951, respectively) indicated that all three methods were highly correlated and can be fully utilized, and that among them the ANN model showed the highest performance and suitability. Based on this study, machine learning methods such as the ANN could be applied more appropriately to gap filling of flux tower data.
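
A minimal sketch of ANN-based gap filling, assuming hypothetical half-hourly meteorological drivers and a small scikit-learn MLP rather than the network actually used in the paper:

```python
# Minimal sketch: train an MLP on observed records, predict ET for the flagged gaps.
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
n = 2000
met = np.column_stack([                     # hypothetical half-hourly drivers
    rng.uniform(0, 900, n),                 # net radiation
    rng.uniform(5, 35, n),                  # air temperature
    rng.uniform(0.2, 2.5, n),               # vapour pressure deficit
])
et = 0.003 * met[:, 0] + 0.02 * met[:, 1] + rng.normal(0, 0.1, n)   # synthetic ET series

missing = rng.random(n) < 0.06              # flag roughly 6% of records as gaps
ann = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=2000, random_state=0)
ann.fit(met[~missing], et[~missing])        # train only on observed records

filled = et.copy()
filled[missing] = ann.predict(met[missing]) # gap-filled series
print("filled", int(missing.sum()), "half-hourly gaps")
```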