• Title/Summary/Keyword: word dictionary

Korean Sentence Generation Using Phoneme-Level LSTM Language Model (한국어 음소 단위 LSTM 언어모델을 이용한 문장 생성)

  • Ahn, SungMahn;Chung, Yeojin;Lee, Jaejoon;Yang, Jiheon
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.71-88 / 2017
  • Language models were originally developed for speech recognition and language processing. Using a set of example sentences, a language model predicts the next word or character from sequential input data. N-gram models have been widely used, but they cannot efficiently model the correlation between input units because they are probabilistic models based on the frequency of each unit in the training set. Recently, with the development of deep learning algorithms, recurrent neural network (RNN) and long short-term memory (LSTM) models have been widely used as neural language models (Ahn, 2016; Kim et al., 2016; Lee et al., 2016). These models can reflect dependencies between objects that are fed sequentially into the model (Gers and Schmidhuber, 2001; Mikolov et al., 2010; Sundermeyer et al., 2012). To train a neural language model, texts need to be decomposed into words or morphemes. However, since a training set of sentences generally contains a huge number of words or morphemes, the dictionary becomes very large, which increases model complexity. In addition, word-level or morpheme-level models can generate only the vocabulary contained in the training set. Furthermore, for morphologically rich languages such as Turkish, Hungarian, Russian, Finnish, or Korean, morpheme analyzers are more likely to make errors in the decomposition process (Lankinen et al., 2016). Therefore, this paper proposes a phoneme-level language model for Korean based on LSTM models. A phoneme, such as a vowel or a consonant, is the smallest unit that makes up Korean text. We construct the language model using three or four LSTM layers. Each model was trained using the stochastic gradient algorithm and more advanced optimization algorithms such as Adagrad, RMSprop, Adadelta, Adam, Adamax, and Nadam. A simulation study was conducted on Old Testament texts using the deep learning package Keras with the Theano backend.
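The abstract does not say exactly how syllables were split into phonemes. As a minimal sketch, assuming the standard Unicode Hangul syllable arithmetic (each precomposed syllable in U+AC00 to U+D7A3 encodes an initial consonant, a vowel, and an optional final consonant), the decomposition step could look like this; the function name `to_phonemes` is hypothetical.

```python
import unicodedata

SBASE = 0xAC00                                 # first precomposed Hangul syllable
LBASE, VBASE, TBASE = 0x1100, 0x1161, 0x11A7   # conjoining jamo base code points
VCOUNT, TCOUNT = 21, 28                        # vowels, finals (index 0 = no final)
NCOUNT = VCOUNT * TCOUNT                       # 588 syllables per initial consonant
SCOUNT = 19 * NCOUNT                           # 11,172 precomposed syllables


def to_phonemes(text: str) -> list[str]:
    """Split Hangul syllables into initial/vowel/final jamo; keep other characters as-is."""
    units = []
    for ch in text:
        s = ord(ch) - SBASE
        if 0 <= s < SCOUNT:
            l, v, t = s // NCOUNT, (s % NCOUNT) // TCOUNT, s % TCOUNT
            units.append(chr(LBASE + l))       # initial consonant
            units.append(chr(VBASE + v))       # vowel
            if t:                              # t == 0 means no final consonant
                units.append(chr(TBASE + t))
        else:
            units.append(ch)                   # spaces, punctuation, other scripts
    return units


if __name__ == "__main__":
    sample = "한국어 문장."
    units = to_phonemes(sample)
    # For Hangul syllables this agrees with Unicode canonical decomposition (NFD).
    assert "".join(units) == unicodedata.normalize("NFD", sample)
    print(len(sample), "characters ->", len(units), "phoneme units")
```

Note that this scheme treats an initial consonant and the same consonant in final position as different symbols; the paper's 74-symbol vocabulary may have been built differently.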
After pre-processing, the dataset contained 74 unique characters, including vowels, consonants, and punctuation marks. Each input vector consisted of 20 consecutive characters, and the corresponding output was the following (21st) character. In total, the dataset included 1,023,411 input-output pairs, which were divided into training, validation, and test sets in a 70:15:15 ratio. All simulations were run on a system equipped with an Intel Xeon CPU (16 cores) and an NVIDIA GeForce GTX 1080 GPU. We compared the loss on the validation set, the perplexity on the test set, and the training time of each model. All optimization algorithms except the stochastic gradient algorithm showed similar validation loss and perplexity, clearly superior to those of the stochastic gradient algorithm, which also took the longest to train for both the 3- and 4-LSTM-layer models. On average, the 4-LSTM-layer model took 69% longer to train than the 3-LSTM-layer model, yet its validation loss and perplexity did not improve significantly and even became worse under some conditions. On the other hand, when the automatically generated sentences were compared, the 4-LSTM-layer model tended to generate sentences closer to natural language than the 3-LSTM-layer model. Although the completeness of the generated sentences differed slightly between models, sentence generation performance was quite satisfactory under all simulation conditions: the models generated only legitimate Korean letters, and their use of postpositions and verb conjugation was almost perfect grammatically. The results of this study are expected to be widely used for Korean language processing and speech recognition, which underpin artificial intelligence systems.
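The data preparation and evaluation described above can be sketched as follows: a sliding window turns a phoneme sequence into (20 inputs, 21st target) pairs, the pairs are split 70:15:15, and perplexity is the exponential of the mean negative log-probability of the true next symbol. This is a sketch under assumptions: the split here is contiguous (the abstract does not say whether the pairs were shuffled first), and the function names are hypothetical.

```python
import math
from typing import Sequence

WINDOW = 20  # 20 consecutive symbols as input, the 21st as the target


def make_windows(seq: Sequence[str], window: int = WINDOW):
    """Return (input, target) pairs: seq[i:i+window] -> seq[i+window]."""
    return [(list(seq[i:i + window]), seq[i + window])
            for i in range(len(seq) - window)]


def split_70_15_15(pairs):
    """Contiguous split into training, validation, and test sets (70:15:15)."""
    n = len(pairs)
    n_train = n * 70 // 100   # integer arithmetic avoids float rounding surprises
    n_val = n * 15 // 100
    return pairs[:n_train], pairs[n_train:n_train + n_val], pairs[n_train + n_val:]


def perplexity(target_probs: Sequence[float]) -> float:
    """exp of the mean negative log-probability assigned to each true next symbol."""
    nll = -sum(math.log(p) for p in target_probs) / len(target_probs)
    return math.exp(nll)
```

As a sanity baseline, a model that assigns uniform probability 1/74 to every symbol of the 74-symbol vocabulary has perplexity 74, so trained models should score well below that.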

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.143-156 / 2012
  • People readily believe that news and the stock index are closely related. They think that obtaining news before anyone else can help them forecast stock prices and earn large profits, or at least capture investment opportunities. However, it is no easy feat to determine to what extent the two are related, to make investment decisions based on news, or to determine whether such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract information that can assist investment decisions. In reality, however, the world is inundated with a massive wave of real-time news, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining, which systematically collects, categorizes, and analyzes news to create investment information. To verify the validity of the model, the relationship between the results of news opinion mining and the stock index was analyzed empirically using statistics. The mining steps that convert news into information for investment decision making are as follows. First, news is obtained from a news provider that collects news in real time, and its information is indexed. Not only the content of the news but also information such as the media outlet, time, and news type is collected, classified, and reworked into variables from which investment decisions can be inferred. Second, the news text is separated into morphemes to derive words that indicate polarity, and each word is tagged as positive or negative by comparing it with a sentiment dictionary. Third, the positive/negative polarity of each news item is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
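The second and third steps can be sketched as dictionary lookup followed by daily aggregation. The abstract does not give the sentiment dictionary or the scoring rule, so this sketch makes several assumptions: morphemes are already extracted (English stand-in tokens replace the output of a Korean morpheme analyzer), the tiny `SENTIMENT_DICT` is hypothetical, and the daily score is the positive ratio pos / (pos + neg).

```python
from collections import defaultdict

# Hypothetical sentiment dictionary: morpheme -> +1 (positive) or -1 (negative).
# English tokens stand in for Korean morphemes here.
SENTIMENT_DICT = {"rise": 1, "gain": 1, "rebound": 1,
                  "fall": -1, "loss": -1, "plunge": -1}


def tag_polarity(morphemes):
    """Count the positive and negative morphemes found in the dictionary."""
    pos = sum(1 for m in morphemes if SENTIMENT_DICT.get(m) == 1)
    neg = sum(1 for m in morphemes if SENTIMENT_DICT.get(m) == -1)
    return pos, neg


def daily_positive_ratio(articles):
    """articles: iterable of (date, morphemes). Returns {date: pos / (pos + neg)}.

    Days with no dictionary hits get a neutral 0.5 (an assumption, not from the study).
    """
    totals = defaultdict(lambda: [0, 0])
    for date, morphemes in articles:
        pos, neg = tag_polarity(morphemes)
        totals[date][0] += pos
        totals[date][1] += neg
    return {d: (p / (p + n) if p + n else 0.5) for d, (p, n) in totals.items()}
```

Returning a ratio rather than a binary label follows the abstract's finding that a positive/negative ratio explained index changes better than a simple positive-or-negative judgment.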
For this study, the KOSPI index and its fluctuation range were collected from the Korea Exchange for the 63 days on which the stock market was open during the three months from July to September 2011, and news data were collected by parsing 766 articles from economic news outlet M, among the articles carried under Stock Information > News > Main News on the portal site Naver.com. Over the three months, the stock price index rose on 33 days and fell on 30 days, and the news articles comprised 197 published before the market opened, 385 during the trading session, and 184 after the market closed. Mining the collected news and comparing the results with stock prices showed that the positive/negative opinion of news content was significantly related to stock prices, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple positive-or-negative judgment. To check whether news affected stock price fluctuations, or at least preceded them, changes in stock prices were compared only with news published before the market opened, and this relationship was also verified to be statistically significant. In addition, because the news covered various types of information, such as social, economic, and overseas news, corporate earnings, the current state of industries, market outlook, and current market conditions, the influence on the stock market and the significance of the relationship were expected to differ by news type. Each type of news was therefore compared with stock price fluctuations, and the results showed that market condition, outlook, and overseas news were the most useful for explaining stock price fluctuations.
In contrast, news about individual companies was not statistically significant, and its opinion mining values tended to move opposite to stock prices; a possible reason is the appearance of promotional, planned news intended to prevent stock prices from falling. Finally, multiple regression and logistic regression analyses were carried out to derive an investment decision function from the relationship between the positive/negative opinion of news and stock prices. The regression equation using the market condition, outlook, and overseas news variables from before the market opened was statistically significant, and the logistic regression achieved a classification accuracy of 70.0% for rising stock prices, 78.8% for falling stock prices, and 74.6% on average. This study first analyzed the relationship between news and stock prices by quantifying the sentiment of unstructured news content using opinion mining, a big data analysis technique, and then proposed and verified a smart investment decision model that systematically performs opinion mining and derives investment information to support decisions. This shows that news can be used as a variable to predict the stock price index, and the model is expected to serve as a practical investment support system if it is implemented and further verified in the future.
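The per-class accuracies reported above (rise days, fall days, and an overall figure) can be illustrated with a one-feature logistic regression. This is a sketch, not the study's model: the single feature is a hypothetical net opinion score (pos - neg) / (pos + neg) in [-1, 1], the fit uses plain batch gradient descent on the log-loss, and "overall" here means accuracy across all days, which may differ from how the study defined its average.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(rise) = sigmoid(w * x + b) by batch gradient descent on the log-loss."""
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def classwise_accuracy(ys, preds):
    """Accuracy on rise days (y=1), on fall days (y=0), and over all days."""
    rise = [p == y for y, p in zip(ys, preds) if y == 1]
    fall = [p == y for y, p in zip(ys, preds) if y == 0]
    both = rise + fall
    return sum(rise) / len(rise), sum(fall) / len(fall), sum(both) / len(both)
```

A day is predicted as a rise when the fitted probability is at least 0.5; reporting the two class-wise rates separately shows whether the model favors one direction, which a single overall accuracy would hide.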

Playing with Rauschenberg: Re-reading Rebus (라우센버그와 게임하기-<리버스> 다시읽기)

  • Rhee, Ji-Eun
    • The Journal of Art Theory & Practice / no.2 / pp.27-48 / 2004
  • Robert Rauschenberg's artistic career has often been regarded as having reached its culmination when the artist won the first prize at the 1964 Venice Biennale. With this victory, Rauschenberg triumphantly entered the pantheon of all-American artists and firmly secured his position in the history of American art. On the other hand, despite the artist's ongoing experiments, the seemingly precocious ripeness of his career has confined critical discourse on Rauschenberg's art to his early works, most of which were made in the mid-1950s and the 1960s. The crux of Rauschenberg criticism lies not only in its focus on the artist's works of the 1950s and 1960s but also in its largely dismissing the significance of the imagery he employed. As the art historians Roger Cranshaw and Adrian Lewis point out, critical discourse on Rauschenberg either focuses on formalist concerns with the picture plane or relies on a "culturalist" interpretation of his imagery that emphasizes the artist's "Americanness." Recently, a group of art historians centered around the journal October has applied Charles Sanders Peirce's semiotics as an art historical methodology and illuminated the indexical aspects of Rauschenberg's work. Some art historians have also undertaken semantic inquiries into Rauschenberg's imagery, seeking clues in the artist's personal context. The first half of this essay examines previous criticism of Rauschenberg's art, and the second half discusses the artist's 1955 work Rebus, which, I think, intersects various critical concerns about his work yet resists being closed off by any single line of discourse.
The categories of signs in the semiotics of Charles Sanders Peirce and the discourse of Jean-Francois Lyotard will be used in discussing the meanings of Rebus, not to search for semantic readings of the work but to draw an analogy between the paradoxical structures of the work and of the theory. The definition of rebus is as follows: "Rebus: 1. a representation of words or syllables by pictures of objects or by symbols whose names resemble the intended words or syllables in sound; also: a riddle made up wholly or in part of such pictures or symbols. 2. a badge that suggests the name of the person to whom it belongs." (Webster's Third New International Dictionary of the English Language, Unabridged). Since its creation in 1955, Robert Rauschenberg's Rebus has been one of the most intriguing works in the artist's oeuvre. This monumental 'combine' painting (6 feet × 10 feet 10.5 inches) consists of three panels covered with fabric, paper, newspaper, and printed reproductions. On top of these, oil paint and pencil and crayon drawings connect the sections into a whole. The layout of the images is horizontal overall. Starting from a torn election poster on the far left of the painting, partially legible as "THAT REPRE," Rebus leads us to proceed from left to right, the typical direction of reading in a Western context. Together with its seemingly proper title, Rebus, the painting has prompted many art historians to seek semantic readings of it. These art historians painstakingly reconstruct its iconography from the artist's interviews, (auto)biography, and the artistic context of his works. Interpretations of Rebus range from an 'image-by-image' collation with words to more general commentary on Rauschenberg's work as a whole, such as a work that "bridges between art and life." Despite the title's allusion to the painting's legitimate purpose, the decoding of imagery into sound, Rebus, I argue, actually hinders a reading of it.
By reading through Peirce to Rauschenberg, I will delve into the subtle anxiety between words and images in their works. On this basis, I suggest that Rauschenberg's strategy in playing Rebus is to hide the meaning of the imagery rather than to disclose it.

The Bibliographical investigation of the mallow, hollyhock, darkpull, sunflower (아욱(葵菜), 접시꽃(蜀葵), 닥풀(黃蜀葵), 해바라기(向日葵)에 대한 문헌고찰)

  • Kim, Jong-dug;Koh, Byung-hee
    • Journal of Sasang Constitutional Medicine / v.11 no.1 / pp.221-240 / 1999
  • 1. Purpose of study: In Sasang constitutional medicine, constitutional examination (diagnosis) and medical treatment are important, but dietary therapy is also considered very important for prevention and treatment. However, scholars' differing views on constitutional foods have caused confusion. It is therefore necessary to establish a theoretical basis for Sasang constitutional dietary therapy through a bibliographical study of the history, characteristics, and efficacy of major foods. Mallow, called "Baekchejiju," has been used as an ingredient in boiled soups only in Korea and not in other countries. We also studied hollyhock, 'Darkpull,' and sunflower together with mallow, because these plants share similar characteristics and the same Chinese character 'Gue' (葵) in their names. In this study, we aim to provide a basis for correcting the misinterpretation of 'Gue' as 'sunflower,' which should help current scholarship, where the introduction of the sunflower has rarely been studied.
2. Method of study: We conducted a comparative study based not only on 'Bonchoseo' (original plant books) but also on agricultural books, books of similar kinds, and private writings.
3. Results of study:
1) The recorded name of mallow changed from 'Abushil' to 'A-uk', then to 'A-ok', and back to 'A-uk'. Winter mallow is called 'Dol-a-uk', meaning a plant that has gone through a year.
2) The heliotropism of mallow has been used as a symbol of loyalty and wisdom, and this meaning was extended to the heliotropism of hollyhock, Darkpull, and sunflower.
3) 'Darkpull' was once known as the 'one-day flower', but after the sunflower was introduced, people confused 'Darkpull' with the sunflower and misread its name accordingly.
4) The first record of the sunflower in the extant bibliographical documents is "Chung-jang-gam-chun-seo" (1795), and the sunflower is presumed to have been introduced to Korea in the early to mid-eighteenth century.
5) The interpretation of mallow has been confused by several documentary and dictionary records, which should be corrected.

A Social Economic Comparative Study on Appearance Background of Design -for Native Settlement of Design in Korea - (우리 나라 디자인 도입에 관한 사회경제사적 고찰 - 디자인의 한국적 개념의 정착을 위한 시론 -)

  • 이인자
    • Archives of design research / v.11 / pp.130-139 / 1995
  • The dictionary defines the word 'design' as planning and designing. Though this meaning is confined to a decorative function, the concept of modern design in a capitalist society of mass production and mass consumption can be said to have reached a new stage where industry and the arts meet. This implies two sides of design: beauty and usefulness. Beauty should be understood in terms of the sense of beauty, and usefulness should be considered from the viewpoint of consumers' tastes and preferences. These are regarded as the inherent problems of design. The origin of design can be understood against the background of capitalism, which can be regarded as a mode of thought and action developed from Western thinking. Capitalism is an economic system that developed from commercial capitalism into the society of industrial capitalism, and this economic thinking resulted from a mature social system of democracy and civil society. Civil society and democracy derive from the polis of ancient Greece and Rome, and ancient Greek and Roman society developed from a social system of nobles and slaves. The polis continued to develop through active territorial expansion around the Mediterranean on the basis of Hellenism, and on this basis European countries achieved the integration of religion, society, and politics, thus giving rise to the spirit of capitalism. Our design is believed to have derived from the direct import of Western capitalism. Accordingly, as the original form of Western capitalism became our economic system, our design copied that of the West. However, our traditional culture and sensibility, which differ from the West in origin and in the roots of our national disposition, seem to breed discord with it. It is therefore very important and meaningful for us to make every effort to seek the roots of our disposition and tradition and to explore an appropriate philosophy and style of design.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable the representation of unstructured documents in a variety of applications. Sentiment analysis is an important technology that can distinguish poor from high-quality content through the text data of products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied in various directions to improve accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because online reviews are openly available to others, they are easy to collect, and they also affect businesses. In marketing, real-world information from customers is gathered from websites rather than surveys. Whether the posts on a website are positive or negative reflects customer response and shows up in sales, so businesses try to identify this information. However, reviews on a website are not always favorable, and their sentiment is often difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, whereas recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change with the subject, paragraph, polarity direction of the sentiment lexicon, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the accuracy of polarity prediction using the IMDB review dataset.
First, popular machine learning algorithms for text classification in sentiment analysis, namely NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boosting, are adopted as comparative models. Second, deep learning has been shown to extract complex, discriminative features from data; representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). When processing a sentence in vector form, a CNN can be used much like a bag of words (BoW), but it does not consider the sequential nature of the data. An RNN handles order well because it takes the temporal information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. For comparison, CNN and LSTM were chosen as simple deep learning models, and the classical machine learning algorithms, CNN, LSTM, and the integrated models were all analyzed. Because the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also examined how well the models perform for sentiment analysis and how they work. This study proposes an integrated CNN-LSTM model to extract positive and negative features from text. The reasons for combining the two algorithms are as follows. A CNN can automatically extract features for classification by applying convolutional layers and supports massively parallel processing, whereas an LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at the desired time. These gates have the advantage of placing memory blocks in the hidden nodes. The LSTM memory block may not store all the data, but it can solve the CNN's long-term dependency problem.
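The "faucet" description of the gates corresponds to the standard LSTM cell update: each gate is a sigmoid output between 0 and 1 that scales how much information flows. A minimal sketch of one time step, written in plain Python so the input (i), forget (f), and output (o) gates are explicit; the class name, sizes, and random initialization are illustrative assumptions, not the paper's configuration.

```python
import math
import random


def matvec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class LSTMCell:
    """One LSTM cell with input, forget, and output gates plus a candidate (g)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # Per gate: input weights W, recurrent weights U, bias b.
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for gate in "ifog"
        }

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]

    def step(self, x, h, c):
        i = [sigmoid(z) for z in self._pre("i", x, h)]    # how much new content to write
        f = [sigmoid(z) for z in self._pre("f", x, h)]    # how much old memory to keep
        o = [sigmoid(z) for z in self._pre("o", x, h)]    # how much memory to expose
        g = [math.tanh(z) for z in self._pre("g", x, h)]  # candidate memory content
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new

    def run(self, xs):
        """Feed a sequence of input vectors; return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in xs:
            h, c = self.step(x, h, c)
        return h
```

Because the forget gate multiplies the previous memory `c` rather than overwriting it, information can persist across many steps, which is how the LSTM mitigates the long-term dependency problem.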
Furthermore, when an LSTM is applied to the output of the CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be modeled simultaneously. The combined CNN-LSTM model achieved an accuracy of 90.33%. It was slower than the CNN but faster than the LSTM, and it was more accurate than the other models. In addition, the word embedding layer can be improved as the kernels are trained step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of the LSTM has the advantage of improving layer-by-layer learning. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
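In the arrangement described above, convolution and pooling turn an embedded review into a shorter sequence of local feature vectors, and the LSTM then reads that sequence in order. The sketch below shows only the CNN half, in plain Python; the ReLU activation, non-overlapping pooling, filter values, and function names are illustrative assumptions, since the abstract does not give the layer configuration. In an end-to-end model the pooled output would be the input sequence to an LSTM layer.

```python
from typing import List

Vector = List[float]


def conv1d(seq: List[Vector], kernels: List[List[Vector]]) -> List[Vector]:
    """Valid 1-D convolution over time followed by ReLU.

    seq: T x d embedding vectors; kernels: F filters, each k x d.
    Returns (T - k + 1) feature vectors of length F.
    """
    k = len(kernels[0])
    out = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]
        feats = []
        for kern in kernels:
            s = sum(w * x for krow, xrow in zip(kern, window) for w, x in zip(krow, xrow))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def max_pool1d(seq: List[Vector], size: int) -> List[Vector]:
    """Non-overlapping max pooling over time; keeps temporal order for the LSTM.

    A trailing remainder shorter than `size` is dropped.
    """
    return [
        [max(col) for col in zip(*seq[t:t + size])]
        for t in range(0, len(seq) - size + 1, size)
    ]
```

Unlike global max pooling, which would collapse the whole review into one vector, pooling over short windows preserves the order of local features, which is what lets a following LSTM model their temporal relationships.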