• Title/Summary/Keyword: statistical prediction model (통계예측모델)

Search Result 541, Processing Time 0.111 seconds

A Time Series Graph based Convolutional Neural Network Model for Effective Input Variable Pattern Learning : Application to the Prediction of Stock Market (효과적인 입력변수 패턴 학습을 위한 시계열 그래프 기반 합성곱 신경망 모형: 주식시장 예측에의 응용)

  • Lee, Mo-Se;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.167-181 / 2018
  • Over the past decade, deep learning has been in the spotlight among machine learning algorithms. In particular, the CNN (Convolutional Neural Network), known as an effective solution for recognizing and classifying images or voices, has been widely applied to classification and prediction problems. In this study, we investigate how to apply CNNs to business problem solving. Specifically, this study proposes applying a CNN to stock market prediction, one of the most challenging tasks in machine learning research. As mentioned, CNNs are strong at interpreting images. Thus, the model proposed in this study adopts a CNN as a binary classifier that predicts stock market direction (upward or downward) using time series graphs as its inputs. That is, our proposal is to build a machine learning algorithm that mimics the experts called 'technical analysts', who examine graphs of past price movements and predict future price movements. Our proposed model, named 'CNN-FG (Convolutional Neural Network using Fluctuation Graph)', consists of five steps. In the first step, it divides the dataset into intervals of 5 days. In step 2, it creates time series graphs for the divided dataset. Each graph is drawn in an image of $40{\times}40$ pixels, and the graph of each independent variable is drawn in a different color. In step 3, the model converts the images into matrices. Each image is converted into a combination of three matrices that express the color values on the R (red), G (green), and B (blue) scales. In the next step, it splits the dataset of graph images into training and validation datasets. We used 80% of the total dataset as the training dataset and the remaining 20% as the validation dataset. In the final step, CNN classifiers are trained using the images of the training dataset.
Regarding the parameters of CNN-FG, we adopted two convolution filters ($5{\times}5{\times}6$ and $5{\times}5{\times}9$) in the convolution layer. In the pooling layer, a $2{\times}2$ max pooling filter was used. The numbers of nodes in the two hidden layers were set to 900 and 32, respectively, and the number of nodes in the output layer was set to 2 (one for the prediction of an upward trend and the other for a downward trend). The activation functions for the convolution layer and the hidden layers were set to ReLU (Rectified Linear Unit), and the one for the output layer was set to the Softmax function. To validate CNN-FG, we applied it to the prediction of KOSPI200 for 2,026 days in eight years (from 2009 to 2016). To match the proportions of the two groups in the dependent variable (i.e., tomorrow's stock market movement), we selected 1,950 samples by random sampling. Finally, we built the training dataset using 80% of the total dataset (1,560 samples) and the validation dataset using 20% (390 samples). The independent variables of the experimental dataset included twelve technical indicators popularly used in previous studies. They include Stochastic %K, Stochastic %D, Momentum, ROC (rate of change), LW %R (Larry Williams' %R), A/D oscillator (accumulation/distribution oscillator), OSCP (price oscillator), CCI (commodity channel index), and so on. To confirm the superiority of CNN-FG, we compared its prediction accuracy with that of other classification models. Experimental results showed that CNN-FG outperforms LOGIT (logistic regression), ANN (artificial neural network), and SVM (support vector machine) with statistical significance. These empirical results imply that converting time series business data into graphs and building CNN-based classification models using these graphs can be effective from the perspective of prediction accuracy.
Thus, this paper sheds light on how to apply deep learning techniques to the domain of business problem solving.
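
The first four steps described in this abstract (5-day windows, 40×40 graph images, RGB matrices, 80/20 split) can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: windows are taken as non-overlapping, a single variable is drawn as isolated points rather than a connected line, and the names `rasterize` and `make_dataset` are hypothetical.

```python
import random

WINDOW = 5   # days per graph, as stated in the abstract
SIZE = 40    # each image is 40 x 40 pixels, as stated in the abstract


def rasterize(window, color):
    """Plot one series as coloured points on a SIZE x SIZE RGB grid (assumed style)."""
    lo, hi = min(window), max(window)
    span = (hi - lo) or 1.0  # avoid division by zero for a flat window
    image = [[[0, 0, 0] for _ in range(SIZE)] for _ in range(SIZE)]
    for i, value in enumerate(window):
        x = round(i * (SIZE - 1) / (len(window) - 1))
        y = round((1 - (value - lo) / span) * (SIZE - 1))  # row 0 is the top
        image[y][x] = list(color)
    return image


def make_dataset(series, labels, color=(255, 0, 0), seed=0):
    """Cut non-overlapping 5-day windows, rasterize them, then split 80/20."""
    samples = [
        (rasterize(series[s:s + WINDOW], color), labels[s + WINDOW - 1])
        for s in range(0, len(series) - WINDOW + 1, WINDOW)
    ]
    random.Random(seed).shuffle(samples)
    cut = int(0.8 * len(samples))
    return samples[:cut], samples[cut:]
```

Each image here is a nested list of shape 40 × 40 × 3, which corresponds to the three R, G, and B matrices mentioned in step 3; the CNN training of step 5 is left out.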

Improvement in facies discrimination using multiple seismic attributes for permeability modelling of the Athabasca Oil Sands, Canada (캐나다 Athabasca 오일샌드의 투수도 모델링을 위한 다양한 탄성파 속성들을 이용한 상 구분 향상)

  • Kashihara, Koji;Tsuji, Takashi
    • Geophysics and Geophysical Exploration / v.13 no.1 / pp.80-87 / 2010
  • This study was conducted to develop a reservoir modelling workflow that reproduces the heterogeneous distribution of effective permeability, which affects the performance of SAGD (Steam Assisted Gravity Drainage), the in-situ bitumen recovery technique used in the Athabasca Oil Sands. Lithologic facies distribution is the main cause of heterogeneity in the bitumen reservoirs of the study area. The target formation consists of sand with mudstone facies in a fluvial-to-estuary channel system, where the mudstone interrupts fluid flow and reduces effective permeability. In this study, the lithologic facies are classified into three classes with different effective permeability characteristics, depending on the shapes of the mudstones. The reservoir modelling workflow consists of two main modules: facies modelling and permeability modelling. The facies modelling uses a stochastic approach to identify the three lithologic facies, which mainly control the effective permeability. The permeability modelling first populates the mudstone volume fraction and then transforms it into effective permeability. The transformation functions from mudstone volume fraction to effective permeability are obtained from a series of flow simulations applied to mini-models of the lithologic facies. Seismic data contribute to the facies modelling by providing the prior probability of facies, which is incorporated in the facies models by geostatistical techniques. In particular, this study employs a probabilistic neural network that uses multiple seismic attributes in facies prediction, which improves the prior probability of facies. The result of using the improved prior probability in facies modelling is compared with the conventional method, which uses a single seismic attribute, to demonstrate the improvement in facies discrimination. Using P-wave velocity in combination with density among the multiple seismic attributes is the essence of the improved facies discrimination.
This paper also discusses sand matrix porosity that makes P-wave velocity differ between the different facies in the study area, where the sand matrix porosity is uniquely evaluated using log-derived porosity, P-wave velocity and photographically-predicted mudstone volume.
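
To illustrate how a probabilistic neural network can turn several attributes into facies probabilities, the sketch below implements the textbook form of the method: a Gaussian Parzen-window density per class, normalized into posteriors. It assumes equal class priors and a single smoothing width `sigma`; the function name and the attribute vectors in the usage note are hypothetical and are not taken from the paper.

```python
import math


def pnn_posteriors(x, training, sigma=1.0):
    """Return class probabilities for attribute vector x.

    training maps a facies label to a list of attribute vectors
    (for example standardized P-wave velocity and density).
    """
    scores = {}
    for label, vectors in training.items():
        total = 0.0
        for v in vectors:
            d2 = sum((a - b) ** 2 for a, b in zip(x, v))
            total += math.exp(-d2 / (2 * sigma ** 2))  # Gaussian kernel
        scores[label] = total / len(vectors)           # class-conditional density estimate
    norm = sum(scores.values())  # would be zero only if every kernel underflows
    return {label: s / norm for label, s in scores.items()}
```

For example, `pnn_posteriors((0.1, 0.0), {"sand": [(0.0, 0.0)], "mud": [(3.0, 3.0)]})` assigns most of the probability to `"sand"`, because the query point lies much closer to that class's training vector.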

Assessment of CO2 Geological Storage Capacity for Basalt Flow Structure around PZ-1 Exploration Well in the Southern Continental Shelf of Korea (남해 대륙붕 PZ-1 시추공 주변 현무암 대지 구조의 CO2 지중저장용량 평가)

  • Shin, Seung Yong;Kang, Moohee;Shinn, Young Jae;Cheong, Snons
    • Economic and Environmental Geology / v.53 no.1 / pp.33-43 / 2020
  • CO2 geological storage is currently considered the most stable and effective technology for greenhouse gas reduction. Saline formations for CO2 geological storage are generally located at depths of more than 800 m, where CO2 can be stored in a supercritical state, and an extensive impermeable cap rock that prevents CO2 leakage to the surface should be distributed above them. Through analysis of seismic and well data, we identified a basalt flow structure for potential CO2 storage, where a saline formation is overlain by a basalt cap rock, around the PZ-1 exploration well in the Southern Continental Shelf of Korea. To evaluate the CO2 storage capacity of the saline formation, total porosity and CO2 density were calculated from well logging data of the PZ-1 well. We constructed a 3D geological grid model with a fixed cell size in the x, y and z directions to estimate the volume of the saline formation, and performed property modeling to assign total porosity to the geological grid. The average CO2 geological storage capacity of the saline formation covered by the basalt cap rock, estimated with the U.S. DOE method, is 84.17 Mt of CO2 (ranging from 42.07 to 143.79 Mt of CO2).
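
The volumetric estimate described above can be sketched as a sum over grid cells. The U.S. DOE method for saline formations multiplies pore volume by CO2 density and a storage efficiency factor; the function name, the argument layout, and the numbers in the usage note are hypothetical and are not values from the paper.

```python
def storage_capacity_mt(cell_volumes_m3, porosities, co2_density_kg_m3, efficiency):
    """Volumetric CO2 storage capacity in megatonnes.

    capacity = sum(cell volume * total porosity) * CO2 density * efficiency factor
    """
    pore_volume_m3 = sum(v * p for v, p in zip(cell_volumes_m3, porosities))
    mass_kg = pore_volume_m3 * co2_density_kg_m3 * efficiency
    return mass_kg / 1e9  # 1 Mt = 1e9 kg
```

As an illustration, a single cell of 1e9 m³ with porosity 0.2, a CO2 density of 700 kg/m³, and an efficiency factor of 0.02 gives 2.8 Mt. Evaluating the function with different efficiency factors is one way a range of capacities could arise, although the abstract does not say how its range was obtained.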

Analysis on the Snow Cover Variations at Mt. Kilimanjaro Using Landsat Satellite Images (Landsat 위성영상을 이용한 킬리만자로 만년설 변화 분석)

  • Park, Sung-Hwan;Lee, Moung-Jin;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.28 no.4 / pp.409-420 / 2012
  • Since the Industrial Revolution, CO2 levels have been increasing along with climate change. In this study, we quantitatively analyze time-series changes in snow cover and statistically predict its vanishing point using remote sensing. The study area is Mt. Kilimanjaro, Tanzania. Twenty-three Landsat-5 TM and Landsat-7 ETM+ images, spanning the 27 years from June 1984 to July 2011, were acquired. First, atmospheric correction was performed on each image using the COST atmospheric correction model. Second, the snow cover area was extracted using the NDSI (Normalized Difference Snow Index) algorithm. Third, the minimum height of snow cover was determined using the SRTM DEM. Finally, the vanishing point of snow cover was predicted using a linear trend line. The analysis was carried out separately for all 23 images and for the 17 images acquired during the dry season. Results show that the snow cover area decreased by approximately $6.47km^2$, from $9.01km^2$ to $2.54km^2$, equivalent to a 73% reduction. The minimum height of snow cover increased by approximately 290 m, from 4,603 m to 4,893 m. The trend lines show that the snow cover area decreased each year by approximately $0.342km^2$ in the dry season and $0.421km^2$ overall. In contrast, the annual increase in the minimum height of snow cover was approximately 9.848 m in the dry season and 11.251 m overall. Based on this vanishing point analysis, snow cover is predicted to disappear by 2020 at the 95% confidence level. This study can be used to monitor global climate change by providing the change in snow cover area, and as reference data for future research on this or similar areas.
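
Two of the steps above lend themselves to a short sketch: the NDSI, which is the normalized difference of green and shortwave-infrared reflectance, and the extrapolation of a fitted line to zero area. The function names are hypothetical and the data in the usage note are synthetic; the paper's own regression and its confidence interval are not reproduced here.

```python
def ndsi(green, swir):
    """Normalized Difference Snow Index from green and shortwave-infrared reflectance."""
    return (green - swir) / (green + swir)


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def vanishing_point(xs, ys):
    """x value at which the fitted trend line reaches zero."""
    slope, intercept = fit_line(xs, ys)
    return -intercept / slope
```

With synthetic areas of 8, 6, 4, and 2 at times 0 to 3, the fitted slope is -2 per time step and the line reaches zero at time 4.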

Optimization of Microwave Extraction Conditions for Antioxidant Phenolic Compounds from Ligustrum lucidum Aiton Using Response Surface Methodology (반응표면분석법을 이용한 여정자의 페놀계 항산화 성분에 대한 마이크로웨이브 추출조건 최적화)

  • Yun, Sat-Byul;Lee, Yuri;Lee, Nam Keun;Jeong, Eung-Jeong;Jeong, Yong-Seob
    • Journal of the Korean Society of Food Science and Nutrition / v.43 no.4 / pp.570-576 / 2014
  • Response surface methodology (RSM) was applied to optimize the microwave-assisted extraction (MAE) conditions for electron-donating ability, total phenol content, and total flavonoid content of Ligustrum lucidum Aiton. Ligustrum lucidum Aiton from different regions was tested, and Ligustrum lucidum Aiton from Haenam was chosen due to its higher total phenolic content, total flavonoid content, DPPH radical scavenging activity and ABTS radical scavenging activity compared to the other samples. Central composite design was used to optimize extraction of Ligustrum lucidum Aiton from Haenam as well as determine the effects of extraction temperature ($X_1$) and extraction time ($X_2$) on dependent variables ($Y_n$). Determination coefficients ($R^2$) of the regression equations for dependent variables ranged from 0.8858 to 0.9517. The optimum points were $131.68^{\circ}C$ for extraction temperature and 5.49 min for extraction time. Predicted values of the optimized conditions were acceptable when compared to experimental values.
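
For a two-factor response surface such as the one over $X_1$ and $X_2$ above, the optimum of a fitted second-order model is found by setting its gradient to zero. The sketch below solves that 2×2 linear system; the coefficient names are hypothetical, and the values in the usage note are invented for illustration rather than taken from the paper's regression equations.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Stationary point of Y = b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2.

    Setting the gradient to zero gives
        2*b11*x1 +   b12*x2 = -b1
          b12*x1 + 2*b22*x2 = -b2
    which is solved here with Cramer's rule.
    """
    det = 4 * b11 * b22 - b12 ** 2
    if det == 0:
        raise ValueError("the quadratic surface has no unique stationary point")
    x1 = (b12 * b2 - 2 * b22 * b1) / det
    x2 = (b12 * b1 - 2 * b11 * b2) / det
    return x1, x2
```

For instance, `stationary_point(2, 4, -1, -1, 0)` returns `(1.0, 2.0)`, the maximum of $-(x_1-1)^2-(x_2-2)^2$. Whether a stationary point is a maximum, a minimum, or a saddle still has to be checked from the signs of the quadratic terms.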

The Analysis of Successional Trends by Topographic Positions in the Natural Deciduous Forest of Mt. Chumbong (점봉산(點鳳産) 일대 천연활엽수림(天然闊葉樹林)의 지형적(地形的) 위치(位置)에 따른 천이(遷移) 경향(傾向) 분석(分析))

  • Lee, Won Sup;Kim, Ji Hong;Jin, Guang Ze
    • Journal of Korean Society of Forest Science / v.89 no.5 / pp.655-665 / 2000
  • Taking into account the variation in species composition with topography, successional trends were comparatively analyzed for three topographic positions (valley, mid-slope, and ridge) in the natural deciduous forest of the Mt. Chumbong area. The analysis was based on the process of generation replacement, in which understory saplings and seedlings replace overstory trees that will eventually fall. This study adopted the plot sampling method, establishing twenty $20m{\times}20m$ quadrats and collecting vegetation and site data at each topographic position. A transition matrix model, modified from the mathematical theory of Markov chains, was employed to analyze the successional trends and then to predict the future overstory species composition for each topographic position. In the valley, the simulation indicated a remarkable decrease in the proportions of the present dominants Quercus mongolica and Fraxinus mandshurica, from the current 23% and 21% to around 4% each at the steady state, which is predicted to be reached in less than 200 years. On the other hand, the proportions of species such as Abies holophylla, Acer mono, Tilia amurensis, and Ulmus laciniata will increase at the steady state. In the mid-slope, the results showed a remarkable decrease in the proportions of Juglans mandshurica, Kalopanax pictus, and Tilia amurensis, from the current 15%, 8%, and 15% to 2%, 1%, and 5%, respectively, at the steady state, which is predicted to take more than 250 years. On the ridge, the current dominant Quercus mongolica was predicted to decrease dramatically from 58% to 8% at the steady state, which could be reached in about 200 years. In contrast, the proportions of Acer mono and Tilia amurensis will increase from the current 4% and 3% to more than 20% and 40%, respectively, at the steady state. Overall, the results suggest that the study forest is more likely a seral than a climax community.
Even though a lot of variation is inevitable due to various kinds of site and vegetation development, the study forest is considered to be more than 200 years away from the steady state or climax in terms of overstory species composition.
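
The transition matrix model can be illustrated by repeatedly multiplying a species composition vector by a replacement-probability matrix until it stops changing. This is a generic Markov chain sketch, not the modified model used in the paper; the function names are hypothetical and the two-species matrix in the usage note is invented.

```python
def next_generation(composition, P):
    """One generation: composition (row vector) times transition matrix P.

    P[i][j] is the probability that an overstory tree of species i
    is replaced by a tree of species j.
    """
    n = len(P)
    return [sum(composition[i] * P[i][j] for i in range(n)) for j in range(n)]


def steady_state(composition, P, tol=1e-9, max_generations=10_000):
    """Iterate until the composition changes by less than tol; return it with the generation count."""
    for generation in range(1, max_generations + 1):
        nxt = next_generation(composition, P)
        if max(abs(a - b) for a, b in zip(nxt, composition)) < tol:
            return nxt, generation
        composition = nxt
    raise RuntimeError("composition did not converge")
```

With the invented matrix `[[0.9, 0.1], [0.5, 0.5]]`, any starting composition converges to about 5/6 and 1/6. Converting a generation count into years additionally requires an assumed generation length, which this sketch does not model.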

Optimization of Process Variables for Insulation Coating of Conductive Particles by Response Surface Methodology (반응표면분석법을 이용한 전도성물질의 절연코팅 프로세스의 최적화)

  • Sim, Chol-Ho
    • Korean Chemical Engineering Research / v.54 no.1 / pp.44-51 / 2016
  • The powder core, conventionally fabricated from iron particles coated with insulator, showed large eddy current loss under high frequency, because of small specific resistance. To overcome the eddy current loss, the increase in the specific resistance of powder cores was needed. In this study, copper oxide coating onto electrically conductive iron particles was performed using a planetary ball mill to increase the specific resistance. Coating factors were optimized by the Response surface methodology. The independent variables were the CuO mass fraction, mill revolution number, coating time, ball size, ball mass and sample mass. The response variable was the specific resistance. The optimization of six factors by the fractional factorial design indicated that CuO mass fraction, mill revolution number, and coating time were the key factors. The levels of these three factors were selected by the three-factors full factorial design and steepest ascent method. The steepest ascent method was used to approach the optimum range for maximum specific resistance. The Box-Behnken design was finally used to analyze the response surfaces of the screened factors for further optimization. The results of the Box-Behnken design showed that the CuO mass fraction and mill revolution number were the main factors affecting the efficiency of coating process. As the CuO mass fraction increased, the specific resistance increased. In contrast, the specific resistance increased with decreasing mill revolution number. The process optimization results revealed a high agreement between the experimental and the predicted data ($Adj-R^2=0.944$). The optimized CuO mass fraction, mill revolution number, and coating time were 0.4, 200 rpm, and 15 min, respectively. The measured value of the specific resistance of the coated pellet under the optimized conditions of the maximum specific resistance was $530k{\Omega}{\cdot}cm$.
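
The Box-Behnken design mentioned above has a simple structure for three factors: every pair of factors is run at the four ±1 combinations while the third factor stays at its center level, and center runs are added. The sketch below generates such a design in coded units; the number of center runs and the `decode` helper are assumptions, and the paper's actual factor ranges are not given in the abstract.

```python
from itertools import combinations, product


def box_behnken(n_factors=3, n_center=3):
    """Coded (-1, 0, +1) runs: all +/-1 pairs with the remaining factors at 0, plus center runs.

    This pairwise construction is the usual layout for three factors;
    designs for many factors use other blockings and are not covered here.
    """
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors
            run[i], run[j] = a, b
            runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs


def decode(run, centers, half_ranges):
    """Map coded levels to real settings: center + coded * half-range."""
    return tuple(c + x * h for x, c, h in zip(run, centers, half_ranges))
```

For three factors this gives 12 edge-midpoint runs plus the center runs, so `len(box_behnken(3, 3))` is 15.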

The Development of Scales on Rating College Students' Adaptability and the Analysis of Technical Quality (대학적응력 검사도구 척도 개발과 양호도 검증)

  • Kim, Soo-Yoen
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.16 no.6 / pp.295-303 / 2016
  • The purposes of this study are to describe the construction of an instrument and the development of scales for rating college students' adaptability, and to analyze the technical quality of the test. The primary goal is to inform students and institutions of what is needed for students' adjustment to university and college life. The scales were reviewed by a group of specialists and tested with statistical methods, and the final instrument consists of 142 items measuring 8 scales: academic integration, social integration into college, career identity, emotional stability, stability of learning conditions, relationship with professors, satisfaction with educational services, and satisfaction with college education. This study analyzed the responses of 1,959 students from 4 colleges and universities. The results confirm that the developed scales show high concurrent evidence with the college student's adaptability inventory for Korean university and college students, taking into account various development processes and, in particular, the rapid and major changes that colleges are undergoing. The results of factor analysis provide evidence based on the internal structure of the scales. Cronbach's ${\alpha}$ is .965 overall and ranges from .742 to .937 across the subscales. The prediction model for the possibility of dropout based on 7 scales is statistically significant at the .05 level, the exception being stability of learning conditions. In the CFA model, RMSEA = .08 to .09, and the variance of the dependent factors is explained by the model. In conclusion, this study confirms that the developed scales are a valid and reliable instrument for predicting the adaptability of Korean university and college students to college.
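
Cronbach's α, the reliability coefficient reported above, compares the sum of the item variances with the variance of the total score. A minimal sketch of the standard formula follows; the function name is hypothetical and the responses in the usage note are invented, not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores))
    """
    k = len(responses[0])
    item_variances = [variance([r[i] for r in responses]) for i in range(k)]
    total_variance = variance([sum(r) for r in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```

When every respondent gives the same score on all items, as in `[[1, 1, 1], [2, 2, 2], [3, 3, 3]]`, the items are perfectly consistent and α equals 1.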

Optimization of Glycosyl Aesculin Synthesis by Thermotoga neapolitana β-Glucosidase Using Response-surface Methodology (반응표면분석법을 이용한 Thermotoga neapolitana β-glucosidase의 당전이 활성을 통한 glycosyl aesculin 합성 최적화)

  • Park, Hyunsu;Park, Young-Don;Cha, Jaeho
    • Journal of Life Science / v.27 no.1 / pp.38-43 / 2017
  • Glycosyl aesculin, a potent anti-inflammatory agent, was synthesized by transglycosylation reaction, catalyzed by Thermotoga neapolitana ${\beta}-glucosidase$, with aesculin as an acceptor. The key reaction parameters were optimized using response-surface methodology (RSM) and $2{\mu}g$ of the enzyme. As shown by a statistical analysis, a second-order polynomial model fitted well to the data (p<0.05). The response surface curve for the interaction between aesculin and other parameters revealed that the aesculin concentration and reaction time were the primary factors that affected the yield of glycosyl aesculin. Among the tested factors, the optimum values for glycosyl aesculin production were as follows: aesculin concentration of 9.5 g/l, temperature of $84^{\circ}C$, reaction time of 81 min, and pH of 8.2. Under these conditions, 61.7% of glycosyl aesculin was obtained, with a predicted yield of 5.86 g/l. The maximum amount of glycosyl aesculin was 6.02 g/l. Good agreement between the predicted and experimental results confirmed the validity of the RSM. The optimization of reaction conditions by RSM resulted in a 1.6-fold increase in the production of glycosyl aesculin as compared to the yield before optimization. These results indicate that RSM can be effectively used for process optimization in the synthesis of a variety of biologically active glycosides using bacterial glycosidases.
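
For reference, a second-order polynomial model of the kind mentioned above usually takes the general form below. The symbols are generic RSM notation rather than the paper's own: $Y$ is the response (here the glycosyl aesculin yield), $X_i$ are the $k$ factors (presumably $k=4$ for aesculin concentration, temperature, reaction time, and pH), and the $\beta$ terms are fitted coefficients.

```latex
Y = \beta_0 + \sum_{i=1}^{k} \beta_i X_i + \sum_{i=1}^{k} \beta_{ii} X_i^2
    + \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \beta_{ij} X_i X_j + \varepsilon
```

The cross terms $\beta_{ij} X_i X_j$ are what the response surface curves for interactions between factors visualize.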

Embedded System Reliability Measurement Use Markov Chain Model (마르코프 체인 모델을 이용한 임베디드 시스템 신뢰도 측정)

  • Kawk Dong-Gyu;Cho Yong-Yoon;Park Ho-Byung;Yoo Chea-Woo
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.433-435 / 2005
  • An embedded system achieves its purpose by controlling multiple devices. As the requirements placed on embedded systems have grown, the devices controlled by a single piece of embedded software have become both more varied and more numerous. In an embedded system with many devices, the reliability of the system depends heavily on the reliability of each device. To measure the reliability of an embedded system, this paper proposes a method based on the Markov chain, one of the statistical approaches to reliability measurement. Markov chains are used in many fields to model complex systems in simplified form and to predict the future from past changes. They also allow the probabilities of the whole system to be computed as a matrix, so the effect of one part's probability on the probability of the whole system can be calculated arithmetically. The embedded software Markov chain proposed in this paper analyzes the source code under test and divides it into routines that control devices, routines that handle errors, and general routines, defining each as a state. Transitions between the defined states are expressed as probabilities based on statistically measured device reliability. The system that measures embedded system reliability with the Markov chain is designed in two parts, a source analyzer and a reliability measurer. The source analyzer takes the source code under test and a device driver library table as input and outputs the Markov chain of the software. The Markov chain is represented as a matrix and evaluated to measure the reliability of the system. Because the proposed method can measure arithmetically how the reliability of each part affects overall reliability, it provides both a means and a basis for approaching the reliability that the system requires.
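
One common way to turn such a state model into a reliability figure is to add two absorbing states, one for normal completion and one for failure, and compute the probability of ending in the completion state. The sketch below does this by iterating the state distribution. The abstract does not specify the paper's exact formulation, so the absorbing states, the function name, and every probability in the usage note are assumptions.

```python
def absorption_probability(P, start, target, tol=1e-12, max_steps=10_000):
    """Probability of ending in absorbing state `target` when starting from `start`.

    P is a row-stochastic transition matrix; absorbing states have P[s][s] == 1.
    """
    n = len(P)
    dist = [0.0] * n
    dist[start] = 1.0
    for _ in range(max_steps):
        nxt = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, dist)) < tol:
            return nxt[target]
        dist = nxt
    return dist[target]
```

A hypothetical chain with states 0 (general routine), 1 (device-control routine), 2 (error-handling routine), 3 (completed), and 4 (failed), and an assumed device reliability `r = 0.99`, could be written as:

```python
r = 0.99  # assumed device reliability, for illustration only
P = [
    [0.0, 0.5, 0.0, 0.5, 0.0],    # general: call the device or finish
    [r,   0.0, 1 - r, 0.0, 0.0],  # device control: succeed or raise an error
    [0.9, 0.0, 0.0, 0.0, 0.1],    # error handler: recover or fail
    [0.0, 0.0, 0.0, 1.0, 0.0],    # completed (absorbing)
    [0.0, 0.0, 0.0, 0.0, 1.0],    # failed (absorbing)
]
reliability = absorption_probability(P, start=0, target=3)
```

Changing `r` and recomputing shows how the reliability of a single device propagates to the reliability of the whole system, which is the arithmetic relationship the abstract highlights.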
