• Title/Summary/Keyword: 데이터 논문 (data papers)

Search results: 41,256

Predicting stock movements based on financial news with systematic group identification (시스템적인 군집 확인과 뉴스를 이용한 주가 예측)

  • Seong, NohYoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.1-17
    • /
    • 2019
  • Because stock price forecasting is an important issue both academically and practically, research on stock price prediction has been actively conducted. Stock price forecasting research can be classified into studies using structured data and studies using unstructured data. With structured data such as historical stock prices and financial statements, past studies usually took a technical analysis or fundamental analysis approach. In the big data era, the amount of information has increased rapidly, and artificial intelligence methodologies that can extract meaning by quantifying text, the unstructured data that accounts for a large share of this information, have developed quickly. With these developments, many attempts are being made to predict stock prices from online news by applying text mining. The prediction methodology adopted in many papers is to forecast a stock price using news about the target company alone. According to previous research, however, not only news about a target company affects its stock price; news about companies related to it can also affect the price. Yet finding highly relevant companies is not easy because of market-wide effects and random signals. Thus, existing studies have identified highly relevant companies primarily from pre-determined international industry classification standards. According to recent research, however, the Global Industry Classification Standard has uneven homogeneity within its sectors, so forecasting stock prices by taking all firms in a sector together, without restricting attention to truly relevant companies, can degrade predictive performance. To overcome this limitation, we are the first to combine random matrix theory with text mining for stock prediction. When the dimension of the data is large, the classical limit theorems are no longer suitable because statistical efficiency is reduced; a simple correlation analysis in the financial market therefore does not reflect the true correlation. To solve this issue, we adopt random matrix theory, which is mainly used in econophysics, to remove market-wide effects and random signals and to find the true correlation between companies. With the true correlation, we perform cluster analysis to find relevant companies. Based on the clustering result, we use the multiple kernel learning algorithm, an ensemble of support vector machines, to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel is assigned to predict stock prices from features of the financial news of the target firm or one of its relevant firms. The results of this study are as follows. (1) Following the existing research flow, we confirmed that using news from relevant companies is an effective way to forecast stock prices. (2) Identifying relevant companies in the wrong way can lower AI prediction performance. (3) The proposed approach with random matrix theory shows better performance than previous studies when cluster analysis is performed on the true correlation obtained by removing market-wide effects and random signals. The contributions of this study are as follows. First, this study shows that random matrix theory, which is used mainly in econophysics, can be combined with artificial intelligence to produce a sound methodology. This suggests that it is important not only to develop AI algorithms but also to adopt theory from physics, and it extends existing research that integrated artificial intelligence with complex systems theory through transfer entropy. Second, this study stresses that finding the right companies in the stock market is an important issue, which suggests that it is important not only to study artificial intelligence algorithms but also to choose the input values on theoretical grounds. Third, we confirmed that firms grouped under the Global Industry Classification Standard (GICS) may have low relevance to one another, and suggested that relevance needs to be defined theoretically rather than simply read off the GICS.
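The preprocessing step described above (filtering the correlation matrix with random matrix theory and then clustering firms) can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a matrix of standardized daily returns, uses the Marchenko-Pastur upper bound to discard random eigenvalues, drops the largest eigenvalue as the market-wide mode, and clusters firms hierarchically; the news features and the multiple kernel learning stage are omitted, and the cluster count is illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def rmt_filtered_correlation(returns):
    """returns: T x N array of standardized stock returns (hypothetical input)."""
    T, N = returns.shape
    C = np.corrcoef(returns, rowvar=False)
    lam, V = np.linalg.eigh(C)                    # eigenvalues in ascending order
    lam_max = (1 + np.sqrt(N / T)) ** 2           # Marchenko-Pastur upper bound
    keep = lam > lam_max                          # informative (non-random) modes
    keep[np.argmax(lam)] = False                  # drop the market-wide mode
    C_true = (V[:, keep] * lam[keep]) @ V[:, keep].T
    np.fill_diagonal(C_true, 1.0)
    return C_true

def cluster_firms(C_true, n_clusters=10):
    """Hierarchical clustering of firms on a correlation-based distance."""
    D = np.sqrt(np.clip(2.0 * (1.0 - C_true), 0.0, None))
    Z = linkage(D[np.triu_indices_from(D, k=1)], method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")
```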

A Comparative Analysis of the Illumina Truseq Synthetic Long-read Haplotyping Sequencing Platform versus the 10X Genomics Chromium Genome Sequencing Platform for Haplotype Phasing and the Identification of Single-nucleotide variants (SNVs) in Hanwoo (Korean Native Cattle) (일루미나에서 제작된 TSLRH (Truseq Synthetic Long-Read Haplotyping)와 10X Genomics에서 제작된 The Chromium Genome 시퀀싱 플랫폼을 이용하여 생산된 한우(한국 재래 소)의 반수체형 페이징 및 단일염기서열변이 비교 분석)

  • Park, Woncheoul;Srikanth, Krishnamoorthy;Park, Jong-Eun;Shin, Donghyun;Ko, Haesu;Lim, Dajeong;Cho, In-Cheol
    • Journal of Life Science
    • /
    • v.29 no.1
    • /
    • pp.1-8
    • /
    • 2019
  • In Hanwoo cattle (Korean native cattle), there is a scarcity of comparative analyses using high-depth sequencing and haplotype phasing, particularly comparisons of the Truseq Synthetic Long-Read Haplotyping sequencing platform serviced by Illumina (TSLRH) versus the Chromium Genome sequencing platform serviced by 10X Genomics (10XG). DNA was extracted from the sperm of a Hanwoo breeding bull (ID: TN1505D2184/27214) provided by the Hanwoo research center and used to generate sequence data on both platforms. We then identified SNVs using an analysis pipeline tailored to each platform. The TSLRH and 10XG platforms generated a total of 355,208,304 and 1,632,772,004 reads, respectively, corresponding to Q30 values of 89.04% and 88.60%, of which 351,992,768 (99.09%) and 1,526,641,824 (93.50%) were successfully mapped. For the TSLRH and 10XG platforms, respectively, the mean sequencing depth was 13.04X and 74.3X, the longest phase block was 1,982,706 bp and 1,480,081 bp, the N50 phase block was 57,637 bp and 114,394 bp, the total number of SNVs identified was 4,534,989 and 8,496,813, and the total phased rate was 72.29% and 87.67%. Moreover, for each chromosome, we identified SNVs unique to and shared between the two platforms. The number of SNVs was directly proportional to the length of the chromosome. Based on our results, we recommend the 10XG platform for haplotype phasing and SNV identification, as it produced a longer N50 phase block, in addition to a higher mean depth, total number of reads, total number of SNVs, and phased rate, than the TSLRH platform.
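Among the metrics compared above, the N50 phase block is defined as the block length at which blocks of that length or longer cover at least half of the total phased length. A minimal sketch of that calculation follows; the block lengths in the example are made up, not taken from the study.

```python
def n50(block_lengths):
    """Return the N50 of a list of phase-block lengths (in bp)."""
    lengths = sorted(block_lengths, reverse=True)
    half_total = sum(lengths) / 2.0
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length

print(n50([100, 80, 60, 40, 20]))   # 80: the two longest blocks already cover half the total
```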

Analysis of Skin Color Pigments from Camera RGB Signal Using Skin Pigment Absorption Spectrum (피부색소 흡수 스펙트럼을 이용한 카메라 RGB 신호의 피부색 성분 분석)

  • Kim, Jeong Yeop
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.1
    • /
    • pp.41-50
    • /
    • 2022
  • In this paper, a method to calculate the major components of skin color, such as melanin and hemoglobin, directly from the RGB signal of a camera is proposed. These components are typically obtained by measuring spectral reflectance with dedicated equipment and recombining the values at selected wavelengths of the measured light. Values calculated this way include the melanin index and the erythema index, and they require special equipment such as a spectral reflectance measurement device or a multi-spectral camera. A direct way of calculating such components from a general digital camera is hard to find, and a method of indirectly calculating the concentrations of melanin and hemoglobin using independent component analysis has been proposed. That method targets a region of an RGB image, extracts characteristic vectors of melanin and hemoglobin, and calculates the concentrations in a manner similar to principal component analysis. Its disadvantages are that per-pixel calculation is difficult, because a group of pixels in a certain area is used as the input, and that, since the extracted feature vectors are obtained by an optimization method, they tend to differ each time the method is executed. The final output is an image representing the melanin and hemoglobin components obtained by converting back to the RGB coordinate system, without using the feature vectors themselves. To improve on these disadvantages, the proposed method calculates the melanin and hemoglobin component values in a feature space rather than in the RGB coordinate system using the feature vectors, estimates the spectral reflectance corresponding to the skin color from a general digital camera, and then uses the spectral reflectance to calculate the detailed components constituting skin pigments, such as melanin, oxygenated hemoglobin, deoxygenated hemoglobin, and carotenoid. The proposed method does not require special equipment such as a spectral reflectance measurement device or a multi-spectral camera and, unlike the existing method, allows direct per-pixel calculation and yields the same results on repeated execution. The standard deviation of the estimated melanin and hemoglobin densities for the proposed method was about 15% of that of the conventional method, making it roughly six times more stable.
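The conventional ICA-based approach that the abstract contrasts with can be sketched as follows. This is a rough illustration, not the proposed method or the original implementation: it assumes a skin-only RGB region, converts pixel values to optical density, and extracts two independent components as melanin- and hemoglobin-like density maps. The run-to-run variation mentioned above corresponds to the random initialization of ICA, which is pinned down here only by fixing the random seed.

```python
import numpy as np
from sklearn.decomposition import FastICA

def ica_pigment_maps(skin_rgb):
    """skin_rgb: H x W x 3 array of skin-region pixels in [0, 255] (hypothetical input).
    Returns two H x W maps; which map is melanin and which is hemoglobin, and their
    signs, are not fixed by ICA and must be identified afterwards."""
    od = -np.log(skin_rgb.astype(float) / 255.0 + 1e-6)   # optical density per channel
    X = od.reshape(-1, 3)
    ica = FastICA(n_components=2, random_state=0)
    S = ica.fit_transform(X - X.mean(axis=0))              # two independent components
    h, w = skin_rgb.shape[:2]
    return S[:, 0].reshape(h, w), S[:, 1].reshape(h, w)
```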

Comparison of Ultrasound Image Quality using Edge Enhancement Mask (경계면 강조 마스크를 이용한 초음파 영상 화질 비교)

  • Son, Jung-Min;Lee, Jun-Haeng
    • Journal of the Korean Society of Radiology
    • /
    • v.17 no.1
    • /
    • pp.157-165
    • /
    • 2023
  • Ultrasound imaging uses sound waves that undergo physical interactions such as reflection, absorption, refraction, and transmission at the boundaries between different tissues. Improvement is needed because the data generated by ultrasound equipment are inherently noisy, and the vague boundaries make it difficult to grasp the shape of the tissue actually being observed. Edge enhancement is used when the boundaries look smeared owing to reduced image quality. In this paper, as a way of strengthening the boundaries, we confirmed the quality improvement obtained by enhancing the high-frequency part of each image using unsharpening and high-boost masks. The mask filtering applied to each image was evaluated by measuring PSNR and SNR. Abdominal, head, heart, liver, kidney, breast, and fetal images were obtained from Philips epiq5g and affiniti70g and Alpinion E-cube 15 ultrasound systems. The algorithm was implemented in MATLAB R2022a from MathWorks. The unsharpening and high-boost mask size was set to 3×3, and the Laplacian filter, a spatial filter used to create outline-enhanced images, was applied equally to both masks. The ImageJ program was used for quantitative evaluation of image quality. Applying the mask filters to various ultrasound images, the subjective assessment showed that the overall contour lines of the image became clearly visible when the unsharpening and high-boost masks were applied to the original image. In the quantitative comparison, the images with the unsharpening mask and the high-boost mask applied were rated higher than the original images. In the portal vein, head, gallbladder, and kidney images, the SNR, PSNR, RMSE, and MAE of the image with the high-boost mask applied were measured to be high; conversely, for the heart, breast, and fetal images, those values were higher for the images with the unsharpening mask applied. Using the optimal mask for each image is therefore expected to help improve image quality by providing enhanced contour information.
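A minimal sketch of the two masks and the PSNR measure used above, assuming 8-bit grayscale input; the exact Laplacian kernel and boost factor are not specified in the abstract, so the values below are assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

# A common 3x3 Laplacian kernel (assumed; the abstract only states that a Laplacian was used).
LAPLACIAN = np.array([[ 0, -1,  0],
                      [-1,  4, -1],
                      [ 0, -1,  0]], dtype=float)

def unsharpening(img):
    """Unsharpening mask: original plus its Laplacian edge response."""
    f = img.astype(float)
    return np.clip(f + convolve(f, LAPLACIAN, mode="reflect"), 0, 255)

def high_boost(img, A=1.5):
    """High-boost filtering: weighted original (A > 1) plus the edge response."""
    f = img.astype(float)
    return np.clip(A * f + convolve(f, LAPLACIAN, mode="reflect"), 0, 255)

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a test image."""
    mse = np.mean((reference.astype(float) - test.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```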

A New way of the Measuring of Innovative Growth: Growth Accounting Model vs Schumpeterian Technological Change Model (혁신성장 측정에 관한 연구: 성장회계모형 vs 슘페테리안 기술변화 모형)

  • Myung-Joong Kwon;Sang-Hyuk Cho;Mikyung Yun
    • Journal of Technology Innovation
    • /
    • v.31 no.1
    • /
    • pp.105-148
    • /
    • 2023
  • This paper provides a new method of measuring the degree of technological progress that contributes to real economic growth, based on Schumpeter's trilogy. Using microdata from Statistics Korea, the actual growth contribution of technological progress during 2003-2018 was measured and compared using the total factor productivity growth rate (growth accounting method), the R&D investment contribution rate, and the Schumpeterian innovation growth rate, with the following results. First, measuring the real growth contribution of technological progress by the growth rate of total factor productivity and by the Schumpeterian innovation growth rate gives contradictory results. Second, when the growth rate of production is decreasing, the gap between the growth rate of production and the growth rate of total factor productivity widens compared with when it is increasing; conversely, when production growth is increasing, the gap narrows. Third, the technological opportunity that affects the innovation growth rate, i.e., the contribution of R&D incentives to innovative growth, is only 3.3%. The reason these results differ from the existing perception of the contribution of technological progress to growth is that the measures, although all labeled technological progress, actually capture different things. Therefore, the growth rate of total factor productivity should be used to measure macroeconomic efficiency, R&D investment should be used to measure the effectiveness of new technology supply, and the Schumpeterian innovation rate should be used to measure the economic impact of technological progress. The policy implications of these results are as follows: ① transition from a one-sided technology supply policy to a policy combining technology supply with support for new technology demand, ② mission-oriented R&D policy and R&D policy that links national R&D with private R&D, and ③ reclassification of capital goods reflecting the degree of new knowledge.
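The growth-accounting (total factor productivity) measure that the paper compares against is the standard Solow residual: output growth minus the share-weighted growth of capital and labor. A one-line sketch follows, with an illustrative capital share that is not taken from the paper.

```python
def tfp_growth(g_output, g_capital, g_labor, alpha=0.35):
    """Solow residual: output growth not explained by factor accumulation.
    alpha is the capital income share (0.35 is an illustrative value)."""
    return g_output - alpha * g_capital - (1.0 - alpha) * g_labor

# Example: 4% output growth, 5% capital growth, 1% labor growth
print(tfp_growth(0.04, 0.05, 0.01))   # 0.016, i.e. about 1.6 percentage points of TFP growth
```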

Behavior of Truss Railway Bridge Using Periodic Static and Dynamic Load Tests (주행 열차의 정적 및 동적 재하시험 계측 데이터를 이용한 트러스 철도 교량의 주기적 거동 분석)

  • Jin-Mo Kim;Geonwoo Kim;Si-Hyeong Kim;Dohyeong Kim;Dookie Kim
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.27 no.6
    • /
    • pp.120-129
    • /
    • 2023
  • To evaluate the vertical loads on railway bridges, conventional load tests are typically conducted. However, these tests often entail significant costs and procedural challenges. Railway conditions involve nearly identical load profiles due to standardized rail systems, which may appear straightforward in terms of load conditions. Nevertheless, this study aims to validate load tests conducted under operational train conditions by comparing the results with those obtained from conventional load tests. Additionally, static and dynamic structural behaviors are extracted from the measurement data for evaluation. To ensure the reliability of the load testing, this research demonstrates feasibility through comparisons of existing measurement data with sensor attachment locations, train speeds, responses between different rail lines, tendency analysis, selection of impact coefficients, and analysis of natural frequencies. The proposed method is applied to the Dongho Railway Bridge to verify its applicability. Ten operational trains were used and 44 sensors were installed on the bridge to measure deformations and deflections during the load test intervals, which were then compared with theoretical values. The analysis results indicate good symmetry and overlap of loads, as well as good agreement between static and dynamic load test results. The maximum measured impact coefficient (0.092) was lower than the theoretical impact coefficient (0.327), and the impact influence from live loads was deemed acceptable. The measured natural frequencies approximated the theoretical values, with an average of 2.393 Hz compared with the calculated value of 2.415 Hz. Based on these results, this paper demonstrates that, for evaluating vertical loads, deformations and deflections of truss railway bridges can be measured through load tests under operational train conditions without traffic control, enabling the calculation of response factors for stress adjustments.
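The measured impact coefficient cited above is conventionally the relative increase of the peak dynamic response over the peak static response. A minimal sketch with hypothetical deflection values follows; the theoretical coefficient of 0.327 comes from the design code and is not reproduced here.

```python
def impact_coefficient(static_peak, dynamic_peak):
    """Measured impact (dynamic amplification) coefficient from peak responses."""
    return (dynamic_peak - static_peak) / static_peak

# Hypothetical peak deflections in mm from a static and a dynamic run:
print(round(impact_coefficient(10.00, 10.92), 3))   # 0.092
```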

Venture Capital Investment and the Performance of Newly Listed Firms on KOSDAQ (벤처캐피탈 투자에 따른 코스닥 상장기업의 상장실적 및 경영성과 분석)

  • Shin, Hyeran;Han, Ingoo;Joo, Jihwan
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship
    • /
    • v.17 no.2
    • /
    • pp.33-51
    • /
    • 2022
  • This study analyzes companies newly listed on KOSDAQ from 2011 to 2020, comparing firms that attracted venture capital investment before listing (VI) with those that did not (NVI) with respect to both listing performance and post-listing firm performance (growth). The paper uses descriptive statistics, mean-difference tests, and multiple regression analysis. Independent variables in the regression models include VC investment, firm age at the time of listing, firm type, firm location, firm size, VC age, the level of expertise of the VC, and the level of fit between the VC and the investee company. Overall, the results suggest that listing performance and post-listing growth are better for VI than for NVI firms. VC investment shows a negative effect on the listing period and a positive effect on the sales growth rate. The amount of VC investment also has negative effects on the listing period and positive effects on market capitalization at the time of IPO and on sales growth among the growth indicators. Our evidence further implies a significantly positive effect on post-listing growth for firms belonging to R&D-specialized industries. In addition, firm age has a statistically significant positive effect on the market capitalization growth rate for several years, which suggests that the market places great weight on the long-term stability of management capability. Finally, among the VC characteristics (VC age, level of expertise, and level of fit with the investee company), a higher market capitalization at the time of IPO tends to be observed when the expertise of the anchor VC is high. Our paper differs from prior research in that we re-examine the venture ecosystem after the outbreak of coronavirus disease 2019, which degraded the business environment, and in that we introduce more informative variables such as the VC investment amount when examining the effect of firm type, which allows an indirect evaluation of the validity of the technology exception policy. Although our findings suggest that related policies such as the technology special listing system or the injection of funds into the venture ecosystem are still helpful, those systems should be updated in a more timely fashion to support firms' growth amid rapid technological development. Furthermore, industry specialization is essential for regional development, and growth of the recovery market is also urgent.
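The multiple regression setup described above can be sketched with a standard OLS call. The data file and every column name below are placeholders standing in for the variables listed in the abstract, not the authors' actual dataset or specification.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("kosdaq_ipo_2011_2020.csv")   # hypothetical data file

# One of several possible models: sales growth regressed on VC and firm characteristics.
model = smf.ols(
    "sales_growth ~ vc_invested + vc_amount + firm_age + firm_type + firm_location"
    " + firm_size + vc_age + vc_expertise + vc_fitness",
    data=df,
).fit()
print(model.summary())
```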

Application of MicroPACS Using the Open Source (Open Source를 이용한 MicroPACS의 구성과 활용)

  • You, Yeon-Wook;Kim, Yong-Keun;Kim, Yeong-Seok;Won, Woo-Jae;Kim, Tae-Sung;Kim, Seok-Ki
    • The Korean Journal of Nuclear Medicine Technology
    • /
    • v.13 no.1
    • /
    • pp.51-56
    • /
    • 2009
  • Purpose: Recently, most hospitals have introduced PACS, and use of the system continues to expand. Meanwhile, a small-scale PACS, called a MicroPACS, can be built with open-source programs. The aim of this study is to demonstrate the utility of operating a MicroPACS as a substitute back-up device for conventional storage media such as CDs and DVDs, in addition to the full PACS already in use. This study describes how to set up a MicroPACS with open-source programs and assesses its storage capability, stability, compatibility, and the performance of operations such as "retrieve" and "query". Materials and Methods: 1. To start with, we searched for open-source software meeting the following requirements: (1) it must run on the Windows operating system; (2) it must be freeware; (3) it must be compatible with the PET/CT scanner; (4) it must be easy to use; (5) it must not limit storage capacity; and (6) it must support DICOM. 2. (1) To evaluate data storage performance, we compared the time spent backing up data with the open-source software against optical discs (CDs and DVD-RAMs), and we also compared the time needed to retrieve data from the system and from optical discs. (2) To estimate work efficiency, we measured the time spent finding data on CDs, DVD-RAMs, and the MicroPACS; seven technologists participated in this study. 3. To evaluate the stability of the software, we examined whether any data were lost while the system was maintained for a year, and compared this with the number of errors found in 500 randomly selected CDs. Result: 1. Among 11 open-source packages, we chose the Conquest DICOM Server, which uses MySQL as its database management system. 2. (1) The comparison of back-up and retrieval times (min) showed the following: DVD-RAM (5.13, 2.26) vs. Conquest DICOM Server (1.49, 1.19) for GE DSTE (p<0.001); CD (6.12, 3.61) vs. Conquest (0.82, 2.23) for GE DLS (p<0.001); CD (5.88, 3.25) vs. Conquest (1.05, 2.06) for SIEMENS. (2) The time (sec) needed to find data was as follows: CD (156±46), DVD-RAM (115±21), and Conquest DICOM Server (13±6). 3. There was no data loss (0%) over the year, and 12,741 PET/CT studies were stored in 1.81 TB. For the CDs, on the other hand, errors occurred in 14 of the 500 CDs (2.8%). Conclusions: We found that a MicroPACS could be set up with open-source software and that its performance was excellent. The system built with open source proved more efficient and more robust than the back-up process using CDs or DVD-RAMs. We believe the MicroPACS can serve as an effective data storage device as long as its operators continue to develop and systematize it.
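Connectivity to a Conquest DICOM Server of the kind used above can be checked with a DICOM C-ECHO. The sketch below uses the pynetdicom library, which the authors did not use; the port and AE title are Conquest's common defaults and are assumptions here.

```python
from pynetdicom import AE

ae = AE(ae_title="MICROPACS_TEST")
ae.add_requested_context("1.2.840.10008.1.1")     # Verification SOP Class (C-ECHO)

assoc = ae.associate("127.0.0.1", 5678, ae_title="CONQUESTSRV1")   # assumed Conquest defaults
if assoc.is_established:
    status = assoc.send_c_echo()
    print("C-ECHO status: 0x{0:04X}".format(status.Status))
    assoc.release()
else:
    print("Could not associate with the Conquest DICOM Server")
```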


Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.4
    • /
    • pp.123-132
    • /
    • 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyros, ambient light sensor, proximity sensor, and so on, there have been many research works on making use of these sensors to create valuable applications. Human activity recognition is one such application, motivated by various welfare applications such as support for the elderly, measurement of calorie consumption, analysis of lifestyles, analysis of exercise patterns, and so on. One of the challenges faced when using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors used is restricted, it is difficult to realize a highly accurate activity recognizer, or classifier, because it is hard to distinguish between subtly different activities relying on only limited information. The difficulty becomes especially severe when the number of different activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier can be built that distinguishes ten different activities by using only a single sensor, i.e., the smartphone accelerometer. The approach we take to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree where each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. Depending on how a set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, we can hardly identify the best tree without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we have used another ensemble classifier, the random forest. A random forest is built by repeatedly generating a decision tree, each time with a different random subset of features, using a bootstrap sample. By combining bagging with random feature-subset selection, a random forest enjoys the advantage of having more diverse ensemble members than simple bagging. As an overall result, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that can deal with a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, the minimum, and the standard deviation of the vector magnitude within a time window of the last 2 seconds, etc. For experiments comparing the performance of END with those of other methods, accelerometer data were collected every 0.1 second for 2 minutes for each activity from 5 volunteers. Of the 5,900 (= 5 × (60 × 2 − 2) / 0.1) samples collected for each activity (the data for the first 2 seconds are discarded because they do not have full time-window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with some other similar activities, END was found to classify all ten activities with a fairly high accuracy of 98.4%. In comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
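The feature extraction described above (the acceleration-vector magnitude at each time point plus the maximum, minimum, and standard deviation of the magnitude over the last 2 seconds) can be sketched as follows. The windowing details are assumptions, and the END/random-forest ensemble itself is not reproduced.

```python
import numpy as np

def window_features(ax, ay, az, window=20):
    """ax, ay, az: equal-length accelerometer axis readings sampled every 0.1 s.
    window=20 samples corresponds to the 2-second window in the abstract."""
    mag = np.sqrt(np.asarray(ax) ** 2 + np.asarray(ay) ** 2 + np.asarray(az) ** 2)
    features = []
    for t in range(window, len(mag)):
        w = mag[t - window:t]                       # the last 2 seconds of magnitudes
        features.append([mag[t], w.max(), w.min(), w.std()])
    return np.array(features)
```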

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character based on sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) models and long short-term memory (LSTM) models have been widely used for neural language modeling (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between the objects that are entered sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). In order to train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that comprises Korean texts. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was done with Old Testament texts using the deep learning package Keras with the Theano backend. After pre-processing the texts, the dataset included 74 unique characters, including vowels, consonants, and punctuation marks. We then constructed the input vector from 20 consecutive characters and the output as the following, 21st, character. In total, 1,023,411 input-output pairs were included in the dataset, and we divided them into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, which are clearly superior to those of the stochastic gradient algorithm. The stochastic gradient algorithm also took the longest to train for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, while its validation loss and perplexity were not improved significantly and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, the sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for processing the Korean language in the fields of language processing and speech recognition, which are the basis of artificial intelligence systems.
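The model described above (a 20-character one-hot input over 74 symbols, stacked LSTM layers, and a softmax over the next character) can be sketched in modern Keras. The layer width and other hyperparameters are assumptions, since the abstract does not report them; the original work used Keras on the Theano backend.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, VOCAB = 20, 74        # 20-character context, 74 unique symbols (from the abstract)
HIDDEN = 256                   # hidden size is an assumption

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, VOCAB)),       # one-hot encoded phoneme sequence
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN, return_sequences=True),
    layers.LSTM(HIDDEN),                        # 3-LSTM variant; add a fourth for the deeper model
    layers.Dense(VOCAB, activation="softmax"),  # predicts the 21st character
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```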