• Title/Summary/Keyword: vectors

Search Result 3,860

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.71-88
    • /
    • 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot model the correlation between input units efficiently, since they are probabilistic models based on the frequency of each unit in the training set. Recently, as deep learning algorithms have developed, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependency between the objects that enter the model sequentially (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally includes a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, with highly morphological languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to introduce errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that composes Korean text. We construct the language model using three or four LSTM layers. Each model was trained using stochastic gradient descent and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras with a Theano backend.
After pre-processing the texts, the dataset included 74 unique characters including vowels, consonants, and punctuation marks. We then constructed each input vector from 20 consecutive characters, with the following 21st character as the output. In total, 1,023,411 input-output pairs were included in the dataset, which we divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were conducted on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss function evaluated on the validation set, the perplexity evaluated on the test set, and the training time of each model. As a result, all optimization algorithms except stochastic gradient descent showed similar validation loss and perplexity, clearly superior to those of stochastic gradient descent, which also took the longest to train for both the 3- and 4-LSTM models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity were not improved significantly and even became worse under specific conditions. On the other hand, when comparing the automatically generated sentences, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM model. Although there were slight differences in the completeness of the generated sentences between the models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and the use of postpositions and the conjugation of verbs were almost perfect grammatically. The results of this study are expected to be widely used for the processing of Korean in language processing and speech recognition, which are the basis of artificial intelligence systems.
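The windowed dataset construction described in this abstract (20-character inputs, 21st-character targets, 70:15:15 split) can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code; the toy corpus and helper names are our own.

```python
def build_dataset(text, window=20):
    """Map each run of `window` characters to the character that follows it."""
    vocab = sorted(set(text))                      # the paper found 74 unique characters
    index = {ch: i for i, ch in enumerate(vocab)}
    pairs = []
    for start in range(len(text) - window):
        x = [index[ch] for ch in text[start:start + window]]
        y = index[text[start + window]]            # the 21st character is the target
        pairs.append((x, y))
    return pairs, vocab

def split_70_15_15(pairs):
    """Divide input-output pairs into training, validation, and test sets."""
    n_train = int(len(pairs) * 0.70)
    n_val = int(len(pairs) * 0.15)
    return (pairs[:n_train],
            pairs[n_train:n_train + n_val],
            pairs[n_train + n_val:])

corpus = "the quick brown fox jumps over the lazy dog " * 50  # stand-in corpus
pairs, vocab = build_dataset(corpus)
train, val, test = split_70_15_15(pairs)
```

In the paper each character index would additionally be one-hot encoded before being fed to the 3- or 4-layer LSTM.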

PS-341-Induced Apoptosis is Related to JNK-Dependent Caspase 3 Activation and It is Negatively Regulated by PI3K/Akt-Mediated Inactivation of Glycogen Synthase Kinase-$3{\beta}$ in Lung Cancer Cells (폐암세포주에서 PS-341에 의한 아포프토시스에서 JNK와 GSK-$3{\beta}$의 역할 및 상호관련성)

  • Lee, Kyoung-Hee;Lee, Choon-Taek;Kim, Young Whan;Han, Sung Koo;Shim, Young-Soo;Yoo, Chul-Gyu
    • Tuberculosis and Respiratory Diseases
    • /
    • v.57 no.5
    • /
    • pp.449-460
    • /
    • 2004
  • Background : PS-341 is a novel, highly selective and potent proteasome inhibitor that has shown cytotoxicity against some tumor cells. Its anti-tumor activity has been suggested to be associated with modulation of the expression of apoptosis-associated proteins such as p53, $p21^{WAF/CIP1}$, $p27^{KIP1}$, NF-${\kappa}B$, Bax, and Bcl-2. c-Jun N-terminal kinase (JNK) and glycogen synthase kinase-$3{\beta}$ (GSK-$3{\beta}$) are important modulators of apoptosis, but their role in PS-341-induced apoptosis is unclear. This study was undertaken to elucidate the role of JNK and GSK-$3{\beta}$ in PS-341-induced apoptosis in lung cancer cells. Method : NCI-H157 and A549 cells were used in the experiments. Cell viability was assayed using the MTT assay, and apoptosis was evaluated by proteolysis of PARP. JNK activity was measured by an in vitro immune complex kinase assay and by phosphorylation of endogenous c-Jun. Protein expression was evaluated by Western blot analysis. Dominant negative JNK1 (DN-JNK1) and GSK-$3{\beta}$ were overexpressed using plasmid and adenovirus vectors, respectively. Result : PS-341 reduced cell viability via apoptosis, activated JNK, and increased c-Jun expression. Blocking JNK activation by overexpression of DN-JNK1, or by pretreatment with SP600125, suppressed the apoptosis induced by PS-341. The activation of caspase 3 was mediated by JNK activation, and blocking caspase 3 activation suppressed PS-341-induced apoptosis. PS-341 activated the phosphatidylinositol 3-kinase (PI3K)/Akt pathway, but blockade of this pathway enhanced PS-341-induced cell death via apoptosis. GSK-$3{\beta}$ was inactivated by PS-341 via the PI3K/Akt pathway. Overexpression of constitutively active GSK-$3{\beta}$ enhanced PS-341-induced apoptosis; in contrast, apoptosis was suppressed by dominant negative GSK-$3{\beta}$ (DN-GSK-$3{\beta}$).
Inactivation of GSK-$3{\beta}$ by pretreatment with lithium chloride or the overexpression of DN-GSK-$3{\beta}$ suppressed both the JNK activation and c-Jun up-regulation induced by PS-341. Conclusion : The JNK/caspase pathway is involved in PS-341-induced apoptosis, which is negatively regulated by the PI3K/Akt-mediated inactivation of GSK-$3{\beta}$ in lung cancer cells.

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.47-73
    • /
    • 2020
  • A KTX rolling stock is a system consisting of several machines, electrical devices, and components, and its maintenance requires considerable expertise and experience. In the event of a rolling stock failure, the maintainer's knowledge and experience determine the time and quality of the work needed to solve the problem, and hence the resulting availability of the vehicle. Although problem solving is generally based on fault manuals, experienced and skilled professionals can quickly diagnose and act by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on the KTX rolling stock most commonly used on the main line, or on systems that extract the meaning of text and search for similar cases, is still lacking. Therefore, this study proposes an intelligent support system that provides an action guide for emerging failures by using the know-how of rolling stock maintenance experts as problem-solving examples. For this purpose, a case base was constructed from the rolling stock failure data generated from 2015 to 2017, and an integrated dictionary was built from the case base to cover the essential terminology and failure codes specific to the railway rolling stock sector. Given the deployed case base, a new failure is matched against past cases and the top three most similar failure cases are extracted, so that the actual actions taken in those cases can be proposed as a diagnostic guide.
In this study, to compensate for the limitations of keyword-matching case retrieval in previous case-based expert system studies on rolling stock failures, various dimensionality reduction techniques were applied to calculate similarity while taking into account the semantic relationships among failure descriptions, and their usefulness was verified through experiments. Three algorithms were applied to extract the characteristics of each failure and retrieve similar cases by measuring the cosine distance between the resulting vectors: Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec. Precision, recall, and F-measure were used to assess the quality of the proposed actions. To compare the dimensionality reduction techniques, these three algorithms were tested against two baselines, one that randomly extracts failure cases with identical failure codes and one that applies cosine similarity directly to the word vectors; analysis of variance confirmed that the performance differences among the five algorithms were statistically significant. In addition, differences in performance depending on the number of reduced dimensions were verified to derive the optimal technique for practical application. The analysis showed that direct word-based cosine similarity outperformed the NMF and LSA reductions, and that the algorithm using Doc2Vec performed best. Furthermore, for the dimensionality reduction techniques, performance improved as the number of dimensions grew toward an appropriate level.
Through this study, we confirmed the usefulness of effective methods for extracting the characteristics of data and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but such studies are still lacking in environments like ours, with numerous specialized terms and limited access to data. In this regard, it is significant that this study is the first to present an intelligent diagnostic system that suggests actions by retrieving cases using text mining techniques to extract failure characteristics, complementing keyword-based case search. We expect this to serve as a basic study for developing diagnostic systems that can be used immediately in the field.
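The retrieval pipeline this abstract describes, reduce failure descriptions to low-dimensional vectors and then rank past cases by cosine similarity to return the top three, can be sketched with a toy NMF. This is a minimal illustration on an invented term-count matrix, not the study's implementation.

```python
import numpy as np

def nmf(X, k, iters=300, eps=1e-9):
    """Factor a nonnegative doc-term matrix X ~ W @ H via multiplicative updates."""
    rng = np.random.default_rng(0)
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def top3_similar(query_idx, W):
    """Rank the other cases by cosine similarity of their reduced vectors."""
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    sims = Wn @ Wn[query_idx]
    return [int(i) for i in np.argsort(-sims) if i != query_idx][:3]

# Toy doc-term counts: cases 0-2 share one vocabulary, cases 3-5 another.
X = np.array([[2, 1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0, 0],
              [1, 1, 2, 0, 0, 0],
              [0, 0, 0, 2, 1, 1],
              [0, 0, 0, 1, 2, 1],
              [0, 0, 0, 1, 1, 2]], dtype=float)
W, H = nmf(X, k=2)
neighbours = top3_similar(0, W)   # cases most similar to case 0
```

The study's actual dictionary, failure codes, and Doc2Vec variant would replace this toy term-count matrix.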

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have become available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents do not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task, extremely tedious and time-consuming and requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to achieving this aim: keyword assignment and keyword extraction. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former, there is a given vocabulary, and the aim is to match its terms to the texts; in other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and not easy to transfer or expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. Here automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques: keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords, so keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to Turney's experimental results, about 64% to 90% of author-assigned keywords can be found in the full text of an article. Inversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and thus cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we have adopted the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword's weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems implementing IVSM were built: one for a Web-based community service and one stand-alone. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone system is dedicated to generating keywords for academic papers and has indeed been tested on a number of papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiments, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems performed much better than baseline systems that generate keywords based on simple probability, and IVSM showed performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
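The five-step assignment process above can be sketched directly. The keyword sets, weights, and sample document below are invented for illustration; the real system's weighting scheme may differ.

```python
import math

def vector_length(vec):
    # steps (1) and (3): vector length from keyword weights / term frequencies
    return math.sqrt(sum(v * v for v in vec.values()))

def cosine(a, b):
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    na, nb = vector_length(a), vector_length(b)
    return dot / (na * nb) if na and nb else 0.0

# (1) keyword sets with per-keyword weights (hypothetical values)
keyword_sets = {
    "logistics": {"port": 2.0, "shipping": 3.0, "cargo": 1.0},
    "retail":    {"distribution": 2.0, "store": 1.0, "price": 1.0},
}

def assign_keywords(document, keyword_sets, k=1):
    # (2)-(3) parse the target document into a term-frequency vector
    tf = {}
    for term in document.lower().split():
        tf[term] = tf.get(term, 0.0) + 1.0
    # (4) cosine similarity between each keyword set and the document
    scores = {name: cosine(vec, tf) for name, vec in keyword_sets.items()}
    # (5) keep the highest-scoring keyword sets
    return sorted(scores, key=scores.get, reverse=True)[:k]

best = assign_keywords("shipping cargo arrived at the port", keyword_sets)
```

Because the document shares weighted terms only with the "logistics" set, that set scores highest and is assigned.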

Protoplast Fusion of Nicotiana glauca and Solanum tuberosum Using Selectable Marker Genes (표식유전자를 이용한 담배와 감자의 원형질체 융합)

  • Park, Tae-Eun;Chung, Hae-Joun
    • The Journal of Natural Sciences
    • /
    • v.4
    • /
    • pp.103-142
    • /
    • 1991
  • These studies were carried out to select somatic hybrids using selectable marker genes of Nicotiana glauca transformed by the NPTII gene and Solanum tuberosum transformed by T-DNA, and to study the characteristics of the transformants. The results are summarized as follows. 1. Crown gall tumors and hairy roots formed on potato tuber discs infected by A. tumefaciens Ach5 and A. rhizogenes ATCC15834; these tumors and roots could be grown on phytohormone-free media. 2. Callus formation from hairy roots was promoted on medium containing 2,4-D 2 mg/l with casein hydrolysate 1 g/l. 3. The survival ratio of crown gall tumor callus derived from potato increased on medium containing activated charcoal 0.5-2.0 mg/l; hairy roots, on the other hand, became necrotic on the same medium. 4. Callus derived from hairy roots grew rapidly in suspension culture on liquid medium containing 2,4-D 2 mg/l and casein hydrolysate 1 g/l. 5. The binary vector pGA643 was mobilized from E. coli MC1000 into wild-type Agrobacterium tumefaciens Ach5, A. tumefaciens $A_4T$, and disarmed A. tumefaciens LBA4404 using a triparental mating method with E. coli HB101/pRK2013. Transconjugants were obtained on minimal media containing tetracycline and kanamycin, and the pGA643 vectors were confirmed by electrophoresis on 0.7% agarose gel. 6. Kanamycin-resistant calli were selected on media supplemented with 2,4-D 0.5 mg/l and kanamycin $100{\mu}g$/ml after co-cultivating tobacco stem explants with A. tumefaciens LBA4404/pGA643, and the selected calli were propagated on the same medium. 7. Multiple shoots were regenerated from kanamycin-resistant calli on MS medium containing BA 2 mg/l. 8. Leaf segments of transformed shoots were able to grow vigorously on medium supplemented with a high concentration of kanamycin ($1000{\mu}g$/ml). 9.
Kanamycin-resistant shoots rooted and elongated on medium containing kanamycin $100{\mu}g$/ml, but normal shoots did not. 10. To produce protoplasts from potato calli transformed by T-DNA and mesophyll tissue transformed by the NPTII gene, the former were isolated in an enzyme mixture of 2.0% cellulase Onozuka R-10, 1.0% driselase, 1.0% macerozyme, and 0.5 M mannitol, and the latter in an enzyme mixture of 1.0% cellulase Onozuka R-10, 0.3% macerozyme, and 0.7 M mannitol. 11. The optimal concentration of mannitol in the enzyme mixture for high protoplast yield was 0.8 M for both transformed tobacco mesophyll and potato callus, and protoplast viability was above 90% in both cases. 12. Tobacco mesophyll and potato callus protoplasts were fused using PEG solution; cell walls regenerated on hormone-free media supplemented with kanamycin after 5 days, and colonies were observed after 4 weeks of culture.


Triptolide-induced Transrepression of IL-8 NF-${\kappa}B$ in Lung Epithelial Cells (폐상피세포에서 Triptolide에 의한 NF-${\kappa}B$ 의존성 IL-8 유전자 전사활성 억제기전)

  • Jee, Young-Koo;Kim, Yoon-Seup;Yun, Se-Young;Kim, Yong-Ho;Choi, Eun-Kyoung;Park, Jae-Seuk;Kim, Keu-Youl;Chea, Gi-Nam;Kwak, Sahng-June;Lee, Kye-Young
    • Tuberculosis and Respiratory Diseases
    • /
    • v.50 no.1
    • /
    • pp.52-66
    • /
    • 2001
  • Background : NF-${\kappa}B$ is the most important transcription factor in IL-8 gene expression. Triptolide is a new compound that has recently been shown to inhibit NF-${\kappa}B$ activation. The purpose of this study was to investigate how triptolide inhibits NF-${\kappa}B$-dependent IL-8 gene transcription in lung epithelial cells and to explore the potential for the clinical application of triptolide in inflammatory lung diseases. Methods : A549 cells were used, and triptolide was provided by Pharmagenesis Company (Palo Alto, CA). In order to examine NF-${\kappa}B$-dependent IL-8 transcriptional activity, we established stable A549 IL-8-NF-${\kappa}B$-luc. cells and performed luciferase assays. IL-8 gene expression was measured by RT-PCR and ELISA. A Western blot was done to study $I{\kappa}B{\alpha}$ degradation, and an electromobility shift assay was done to analyze NF-${\kappa}B$ DNA binding. p65-specific transactivation was analyzed by a cotransfection study using a Gal4-p65 fusion protein expression system. To investigate the involvement of transcriptional coactivators, we performed a transfection study with CBP and SRC-1 expression vectors. Results : We observed that triptolide significantly suppresses NF-${\kappa}B$-dependent IL-8 transcriptional activity induced by IL-$1{\beta}$ and PMA. RT-PCR showed that triptolide represses both IL-$1{\beta}$- and PMA-induced IL-8 mRNA expression, and ELISA confirmed this triptolide-mediated IL-8 suppression at the protein level. However, triptolide did not affect $I{\kappa}B{\alpha}$ degradation or NF-${\kappa}B$ DNA binding. In a p65-specific transactivation study, triptolide significantly suppressed Gal4-p65TA1 and Gal4-p65TA2 activity, suggesting that triptolide inhibits NF-${\kappa}B$ activation by inhibiting p65 transactivation.
However, this triptolide-mediated inhibition of p65 transactivation was not rescued by the overexpression of CBP or SRC-1, thereby excluding the role of transcriptional coactivators. Conclusions : Triptolide is a new compound that inhibits NF-${\kappa}B$-dependent IL-8 transcriptional activation by inhibiting p65 transactivation, but not by an $I{\kappa}B{\alpha}$-dependent mechanism. This suggests that triptolide may have a therapeutic potential for inflammatory lung diseases.


Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. The statistical techniques traditionally used in bond rating include multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis. One major drawback, however, is that they rely on strict assumptions: linearity, normality, independence among predictor variables, and a pre-existing functional form relating the criterion variables to the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories; it is simple enough to analyze mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. Numerous experimental studies have shown that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance.
First, SVM was originally proposed for solving binary classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well on multi-class problems as SVM does on binary ones. Second, approximation algorithms (e.g., decomposition methods or the sequential minimal optimization algorithm) can reduce the computation time of multi-class problems, but they may deteriorate classification performance. Third, multi-class prediction suffers from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another; such data sets often yield a default classifier with a skewed boundary, reducing classification accuracy. SVM ensemble learning is one machine learning method for coping with these drawbacks. Ensemble learning improves the performance of classification and prediction algorithms, and AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations incorrectly predicted by previous classifiers are chosen more often than correctly predicted ones, so boosting attempts to produce new classifiers that better predict the examples on which the current ensemble performs poorly. In this way, it can reinforce the training of misclassified observations from the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
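AdaBoost's reweighting step described above can be shown concretely. This is a textbook sketch of one boosting round in its binary form, not MGM-Boost itself; the function name and toy inputs are our own.

```python
import math

def adaboost_round(weights, mistakes):
    """One AdaBoost reweighting step.

    `mistakes[i]` is True when the current weak classifier misclassifies
    observation i; such observations receive larger weight next round.
    """
    total = sum(weights)
    err = sum(w for w, m in zip(weights, mistakes) if m) / total
    alpha = 0.5 * math.log((1 - err) / err)        # vote weight of this classifier
    new = [w * math.exp(alpha if m else -alpha)
           for w, m in zip(weights, mistakes)]
    z = sum(new)                                   # renormalize to a distribution
    return [w / z for w in new], alpha

weights, alpha = adaboost_round([0.25, 0.25, 0.25, 0.25],
                                [True, False, False, False])
```

After one round the single misclassified observation carries half of the total weight (0.5), so the next weak classifier concentrates on it.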
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can account for the geometric mean-based accuracy and errors across the classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation was performed three times with different random seeds to ensure that the comparison among the three classifiers did not happen by chance. For each ten-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine; that is, the cross-validated folds were tested independently for each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperforms both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher geometric mean-based prediction accuracy than AdaBoost (24.65%) and SVM (15.42%). A t-test was used to examine whether the performance of the classifiers over the 30 folds differs significantly; the results indicate that the performance of MGM-Boost differs from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
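The geometric mean-based accuracy reported above, unlike the arithmetic mean, collapses to zero whenever any single class is entirely missed, which is what makes it sensitive to minority-class performance on imbalanced ratings. A minimal sketch (helper names are our own, computed here as the geometric mean of per-class recalls):

```python
import math
from collections import defaultdict

def per_class_recall(y_true, y_pred):
    """Recall (hit rate) for each class that occurs in y_true."""
    hits, totals = defaultdict(int), defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    return {c: hits[c] / totals[c] for c in totals}

def geometric_mean_accuracy(y_true, y_pred):
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))

# A classifier that ignores the minority class: decent arithmetic accuracy,
# but zero geometric mean accuracy.
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "A", "A", "A"]
arith = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
gmean = geometric_mean_accuracy(y_true, y_pred)
```

This contrast mirrors why MGM-Boost's geometric mean scores (28.12% vs. 15.42% for SVM) separate the classifiers more sharply than the arithmetic ones.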

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.25-38
    • /
    • 2019
  • As ever more content is generated, selecting high-quality information that meets the interests and needs of users from the overflowing content is becoming more important. In this flood of information, attempts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft likewise focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is a field where text data analysis is expected to be useful, because it constantly generates new information, and the earlier the information is, the more valuable it is. Automatic knowledge extraction can be effective in such areas, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is difficult to build corpora from different fields with the same algorithm and to extract good-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of the knowledge increase and its patterns are constantly updated. Third, performance evaluation is difficult owing to the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome the limits described above and improve the semantic performance of searching stock-related information, this study attempts to extract knowledge entities using a neural tensor network and to evaluate the performance of the extraction. Unlike other references, the purpose of this study is to extract knowledge entities related to individual stock items.
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the model's effectiveness. From these processes, this study has three significances. First, it presents a practical and simple automatic knowledge extraction method that can be applied directly. Second, it demonstrates the possibility of performance evaluation through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study confirming the usefulness of the presented model, experts' reports on 30 individual stocks, the top 30 items by frequency of publication from May 30, 2017 to May 21, 2018, are used. The total number of reports is 5,600; 3,074 reports, about 55% of the total, are designated as the training set, and the remaining 45% as the test set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained. Thus, when a new entity from the test set appears, we can calculate its score with every score function, and the stock whose function gives the highest score is predicted as the item related to that entity. To evaluate the presented model, we confirm its predictive power and determine whether the score functions are well constructed by calculating the hit ratio over all reports in the test set.
In the empirical study, the presented model achieves a 69.3% hit ratio on the testing set of 2,526 reports, which is meaningfully high given the constraints under which the research was conducted. Looking at the prediction performance for each stock, only three stocks, LG ELECTRONICS, KiaMtr, and Mando, perform far below average; this may be due to interference from other similar items and the generation of new knowledge. In this paper, we propose a methodology to find the key entities, or combinations of them, needed to search for related information in accordance with the user's investment intention. Graph data is generated using only the named entity recognition tool and fed to the neural tensor network, without learning a field-specific corpus or word vectors. The empirical test confirms the effectiveness of the presented model as described above. Some limitations remain, however; in particular, the markedly poor performance on a few stocks calls for further research. Finally, the empirical study confirmed that the learning method presented here can be used to match new text information semantically with the related stocks.
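The evaluation described above reduces to an argmax over the per-stock score functions followed by a hit-ratio count. A minimal sketch, with hypothetical toy score functions standing in for the trained neural tensor networks:

```python
def predict_stock(entity_vec, score_fns):
    """Return the stock whose score function assigns the highest score."""
    return max(score_fns, key=lambda stock: score_fns[stock](entity_vec))

def hit_ratio(test_pairs, score_fns):
    """Fraction of (entity, true stock) pairs whose predicted stock is correct."""
    hits = sum(1 for entity_vec, true_stock in test_pairs
               if predict_stock(entity_vec, score_fns) == true_stock)
    return hits / len(test_pairs)

# toy linear score functions for two stocks (illustrative only)
score_fns = {
    'A': lambda v: v[0],
    'B': lambda v: v[1],
}
test_pairs = [([1.0, 0.0], 'A'), ([0.0, 1.0], 'B'), ([0.2, 0.8], 'A')]
print(hit_ratio(test_pairs, score_fns))  # 2 of 3 correct
```

With the paper's 2,526-report testing set, the same computation over the trained score functions gives the reported 69.3% hit ratio.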

Effect of Physical Training on Electrocardiographic Amplitudes and the QRS Vector (체력단련(體力鍛練)이 심전도파고(心電圖波高)와 QRS벡타에 미치는 효과(效果))

  • Yu, Wan-Sik;Hwang, Soo-Kwan;Kim, Hyeong-Jin;Choo, Young-Eun
    • The Korean Journal of Physiology
    • /
    • v.18 no.1
    • /
    • pp.51-65
    • /
    • 1984
  • In an effort to elucidate the effect of physical training on electrocardiographic amplitudes and on the axis and amplitude of the QRS vector, electrocardiograms were recorded before and 1, 5 and 10 minutes after a 3-minute rebounder exercise in 23 healthy male students aged between 18 and 21 years, divided into two groups of athletes and non-athletes. ECG amplitudes were measured from leads I, $V_1$ and $V_5$, and the axis and amplitude of the QRS vector were measured from leads I and III in the frontal plane and from leads $V_2$ and $V_6$ in the horizontal plane. The results obtained are summarized as follows. ECG amplitudes: The R wave amplitude was $23.38{\pm}1.14\;mm$ in athletes, higher than the $17.91{\pm}2.00\;mm$ in non-athletes. After exercise, the difference between the two groups remained significant throughout the recovery period. The S wave amplitude was increased significantly, and the T wave amplitude was decreased, in both groups after exercise. The P wave amplitude was increased in both groups after exercise, and it was lower in athletes than in non-athletes. After exercise, the PQ segment amplitude was zero in athletes but negative in non-athletes. The J point amplitude was positive in the resting state and negative after exercise in both groups. The J+0.08 sec point amplitude was also lowered after exercise, and it was higher in athletes than in non-athletes. Therefore the whole ST segment proved to be decreased after exercise. The summated amplitude of R in $V_5$ plus S in $V_1$ was $38.74{\pm}2.71\;mm$ in athletes, higher than the $32.82{\pm}2.90\;mm$ in non-athletes. After exercise, it was also significantly higher in athletes than in non-athletes. Axis of the QRS vector: In the frontal plane, the axis of the QRS vector was $62.7{\pm}7.36^{\circ}$ in athletes and showed no significant difference between the two groups. In the horizontal plane, the axis of the QRS vector was $-23.5{\pm}7.2^{\circ}$ in athletes, significantly higher than the $-38.8{\pm}8.2^{\circ}$ in non-athletes. 
After exercise, it was significantly higher than in the resting state in both groups. Amplitude of the QRS vector: In the frontal plane, the amplitude of the QRS vector was $13.86{\pm}1.44\;mm$ in athletes, significantly higher than the $9.62{\pm}0.97\;mm$ in non-athletes. After exercise, it was also significantly higher in athletes than in non-athletes. In the horizontal plane, the amplitude of the QRS vector was $19.82{\pm}2.10\;mm$ in athletes, significantly higher than the $16.90{\pm}1.39\;mm$ in non-athletes. After exercise, it was also significantly higher in athletes than in non-athletes. These results indicate that the R wave amplitude in athletes was significantly higher than in non-athletes before and after exercise, and that the summated amplitude of R in $V_5$ plus S in $V_1$ in athletes was $38.74{\pm}2.71\;mm$, suggesting left ventricular hypertrophy. We should note that the PQ segment and ST segment amplitudes were higher in athletes than in non-athletes, and that they decreased with exercise in both groups. In particular, the fact that the amplitude of the QRS vector in the frontal and horizontal planes was significantly greater in athletes than in non-athletes may serve as an index in evaluating athletes.
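The frontal-plane QRS vector reported above can be derived from the net QRS amplitudes in leads I and III, since each lead's net deflection is the projection of the mean vector onto that lead's axis (lead I at 0°, lead III at +120° on the hexaxial reference system). A minimal sketch under that standard convention; the input amplitudes below are hypothetical, not the paper's raw data:

```python
import math

def frontal_qrs_vector(lead_I, lead_III):
    """Mean frontal-plane QRS vector from net QRS amplitudes (mm) in leads I and III.

    Solves the two projection equations:
        lead_I   = x                           (lead I axis at 0 degrees)
        lead_III = -0.5*x + (sqrt(3)/2)*y      (lead III axis at +120 degrees)
    and returns (axis in degrees, vector amplitude in mm).
    """
    x = lead_I
    y = (lead_III + 0.5 * lead_I) * 2 / math.sqrt(3)
    axis_deg = math.degrees(math.atan2(y, x))
    amplitude = math.hypot(x, y)
    return axis_deg, amplitude

# hypothetical net amplitudes (mm)
axis, amp = frontal_qrs_vector(lead_I=6.4, lead_III=9.3)
```

A quick sanity check of the convention: when lead III's projection is exactly consistent with a purely leftward vector (lead_III = -0.5 * lead_I), the computed axis is 0°, as expected for a vector along lead I.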

  • PDF

Studies on the Occurrence, Host Range, Transmission, and Control of Rice Stripe Disease in Korea (한국에서의 벼 줄무늬잎마름병의 발생, 피해, 기주범위, 전염 및 방제에 관한 연구)

  • Chung Bong Jo
    • Korean journal of applied entomology
    • /
    • v.13 no.4 s.21
    • /
    • pp.181-204
    • /
    • 1974
  • The study has been carried out since 1965 to investigate the occurrence, damage, host range, transmission and control of rice stripe virus in Korea. 1. Disease occurrence and damage: Virus infection during the seedling stage ranged from 1.3 to $8\%$. More symptom expression was found in regrowth of clipped rice than in infected intact plants, and greater infection took place in early seasonal culture than in ordinary seasonal culture. A higher incidence of the disease was found in the rows close to the bank and gradually decreased toward the centre of the rice paddy. Disease occurrence and plant maturity were highly correlated: most japonica rice types became diseased when inoculated within the 3 to 7 leaf stage, while $50\%$, $20\%$ and no disease were found when they were inoculated at the 9, 11 and 13 leaf stages, respectively. Symptom expression required 7-15 days when plants were inoculated during the 3-7 leaf stages, but 15-30 days when inoculated during the 9-15 leaf stages. On the Tongil variety the per cent disease was relatively higher when plants were infected within the 1.5-5 leaf stages than at the 9 leaf stage, and no disease was found on plants infected after the 15 leaf stage. The disease resulted in lowered growth rate and maturity and in sterility of the Tongil variety, although the variety is known to be tolerant to the virus. 2. Host range: Thirty-five species of crops, pasture grasses and weeds were tested for susceptibility to the virus. Twenty-one of the 35 species tested were found to be susceptible, and 3 of them, Cyperus amuricus Maximowics var. laxus, Purcereus sanguinolentus Nees and Eriocaulon robustius Makino, were found to be new hosts of the virus. 3. Transmission: The vector of the virus, Laodelphax striatellus, produces 5 generations a year. The peak of second-generation adults occurred around June 20th and that of the third around July 30th in the Suweon area. 
In the Jinju area the peak of second-generation adults preceded the peak at Suweon by 5-7 days. The peak of third-generation adults was higher than that of the second at Jinju, but at Suweon the reverse was true. The occurrence of viruliferous Laodelphax striatellus was 10-15, 9, 17, 8 and about $10\%$ in overwintered nymphs, 1st-generation nymphs, 2nd-generation adults, 2nd-generation nymphs and the remaining generations, respectively. More viruliferous L. striatellus were found in the southern area than in the central area of Korea. The occurrence of viruliferous L. striatellus depended on the circumstances of the year; the per cent of viruliferous vectors in 2nd- and 3rd-generation adults, however, was consistently higher than in the other generations. Matings of viruliferous L. striatellus resulted in $90\%$ viruliferous progenies, and the 3rd, 4th and 5th instars of the vector had higher infectivity than the other vector stages. The virus acquisition rate of non-viruliferous L. striatellus was $7-9\%$; these viruliferous L. striatellus, however, could not transmit the virus for more than 3 serial transmissions. The optimum temperature for transmission of the virus was $25-30^{\circ}C$, while transmission rarely occurred below $15^{\circ}C$. The per cent of L. striatellus parasitized by Haplogonatopus atratus was $5-48\%$ during the period from June to the end of August, and the maximum parasitization was $32-48\%$ around July 10. 4. Control: 1) Cultural practices: The deeper the transplanting depth, the more disease occurrence was found. A higher infection rate, $1.5-3.5\%$, was observed during the late stages of seedling beds, and the rate became lower, $1.0-2.0\%$, in the early period in paddy fields in the southern area. Early transplanting resulted in more infection than early seasonal culture, and ordinary seasonal culture showed the lowest infection. 
The disease was also favored by earlier transplanting even under ordinary seasonal culture. The higher the nitrogen fertilizer level, the more disease occurrence was found in the paddy field. 2) Resistant varieties: Tongil varieties showed a resistant reaction to the virus in greenhouse tests. In tests for resistance on 955 varieties, most japonica types showed susceptible reactions, while resistant varieties were found mostly among introduced varietal groups. 3) Chemical control: Earlier applications of the chemicals Disyston and Diazinon showed better results when the test was made 4 days after inoculation in the greenhouse, even though none of the insecticides achieved complete control of the disease. Three serial applications of chemicals on June 14, June 20 and June 28 showed better results than one or two applications on any other dates under field conditions.

  • PDF