• Title/Summary/Keyword: Log Analysis System

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.95-110 / 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in a growing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature on unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneering researchers and business practitioners. Opinion mining, or sentiment analysis, refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not otherwise have been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published on various media is a typical example of unstructured text data. Every day, a large volume of new content is created, digitized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. 
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches share a common limitation: they do not consider the flexibility of sentiment polarity. That is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we argue that sentiment polarity should be assigned not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we present an intelligent investment decision-support model based on opinion mining that performs the scraping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies the sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. 
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
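The dictionary-based classification described in this abstract can be illustrated with a minimal sketch. It assumes a tiny, invented word-to-value dictionary, a simple average of matched values per article, and a majority vote across articles; none of these specifics come from the paper, and the names `DOMAIN_DICT`, `article_score`, `classify_article`, and `predict_direction` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical domain-specific dictionary: word -> sentiment value in [-1, 1].
# The words and values are invented for illustration only.
DOMAIN_DICT = {"surge": 0.8, "rally": 0.7, "rebound": 0.5,
               "plunge": -0.9, "default": -0.8}

def article_score(text, dictionary):
    """Average sentiment value of the dictionary words found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = [dictionary[t] for t in tokens if t in dictionary]
    return sum(hits) / len(hits) if hits else 0.0

def classify_article(text, dictionary, threshold=0.0):
    """Label an article positive if its score exceeds the threshold."""
    return "pos" if article_score(text, dictionary) > threshold else "neg"

def predict_direction(articles, dictionary):
    """Majority vote over article labels as a stand-in for next-day direction."""
    counts = Counter(classify_article(a, dictionary) for a in articles)
    return "up" if counts["pos"] > counts["neg"] else "down"

news = ["Stocks surge as exporters rally",
        "Bond default fears trigger plunge",
        "Markets rebound"]
print(predict_direction(news, DOMAIN_DICT))
```

Swapping `DOMAIN_DICT` for a general-purpose dictionary changes only the lookup table, which is the point of comparison the abstract draws; an article with no dictionary words scores 0.0 and falls on the negative side of the default threshold in this sketch.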

Present status and prospect for development of mushrooms in Korea

  • Jang, Kab-Yeul;Oh, Youn-Lee;Oh, Minji;Im, Ji-Hoon;Lee, Seul-Ki;Kong, Won-Sik
    • Korean Society of Mycology Newsletter: Conference Proceedings (한국균학회소식:학술대회논문집) / 2018.05a / pp.27-27 / 2018
  • The production scale of mushroom cultivation in Korea is approximately 600 billion won, which is 1.6% of the Korean gross agricultural output. Annually, ca. 190,000 tons of mushrooms are harvested in Korea. Although the numbers of mushroom farms and cultivators are constantly decreasing, total mushroom yields are increasing owing to large-scale cultivation facilities and automation. The recent expansion of the well-being trend has increased mushroom consumption in Korea: annual per capita consumption of mushrooms was 3.9 kg ('13), which is a little higher than the European average. Exports of mushrooms, mainly Flammulina velutipes and Pleurotus ostreatus, increased from the mid-2000s but have recently declined slightly. Exports to Vietnam, Hong Kong, the United States, and the Netherlands have continued, and exports to Australia, Canada, Southeast Asia, and other regions have recently increased. Canned Agaricus bisporus was the first export of the Korean mushroom industry. This business reached its sales peak in 1977-1978. As Korea initiated trade with China in 1980, international mushroom prices fell sharply, which shrank the domestic market. In response to the high demand for new items to substitute for A. bisporus, oyster mushroom (Pleurotus ostreatus) received attention because it seemed to suit the taste of Korean consumers. Although a log cultivation technique for oyster mushroom was developed in the early 1970s, this method requires a great deal of labor. We therefore developed a shelf cultivation technique, which is easier to manage and allows mass production. In this technique, the growing shelf is mainly made from fermented rice straw, a P. ostreatus medium unique in the world that was used only in South Korea. Later, the use of cotton waste as an additional medium material increased productivity. 
Standard cultivation techniques and an environmental control system that can stably produce mushrooms throughout the year are currently being developed. The increase in oyster mushroom production may stimulate the domestic market and contribute to industrial development. In addition, oyster mushroom production technology has formed the basis for the development of bottle cultivation. The bottle cultivation technology that was developed made mass production possible. In particular, the bottle cultivation method using liquid spawn can be an opportunity to export F. velutipes and P. eryngii. In addition, Korea was the second country in the world, after Japan, to develop white varieties of F. velutipes. We also developed the new A. bisporus cultivar "Sae-ah", which is easy to grow in Korea. To lead the mushroom industry, we will continue to develop internationally competitive cultivars and to improve cultivation techniques. Mushroom research in Korea nowadays focuses on the analysis of mushroom genetics in combination with the development of new mushroom varieties, mushroom physiology, and cultivation. Also studied are environmental factors for cultivation, disease control, the development and utilization of mushroom substrate resources, post-harvest management, and the improvement of marketable traits. Finally, the RDA manages the collection, classification, identification, and preservation of mushroom resources. To keep up with the increasing application of biotechnology in agricultural research, the genome project of various mushrooms and the draft of the genetic map have just been completed. A broad range of future studies based on this project is anticipated. The mushroom industry in Korea continues to grow, and its productivity is increasing rapidly through the development of new mushroom cultivars and automated plastic bottle cultivation. Consumption of medicinal mushrooms such as Ganoderma lucidum and Phellinus linteus is also increasing strongly. 
Recently, the edible and medicinal mushroom business has suffered from overproduction and distribution problems. Fortunately, the expansion of mushroom exports has helped ease the negative effects on the mushroom industry.

Analysis of Meteorological Elements in the Cultivated Area of Hadong Green Tea (하동녹차 재배지역의 기상요소별 분석)

  • Hwang, Jung-Gyu;Kim, Jong-Cheol;Cho, Kyoung-Hwan;Han, Jae-Yoon;Kim, Ru-Mi;Kim, Yeon-Su;Cheong, Gang-Won;Kim, Yong-Duck
    • Korean Journal of Agricultural and Forest Meteorology / v.12 no.2 / pp.132-142 / 2010
  • Characteristics of meteorological elements were analyzed at Hwagae and Agyang, which are the representative areas of Hadong green tea cultivation in Korea. An automatic weather monitoring system (AWS) and a simple data logger were employed to measure meteorological data such as temperature, relative humidity, precipitation, and wind direction and speed for 2009. The annual average air temperatures of Hwagae and Agyang were $14.5^{\circ}C$ and $14.2^{\circ}C$, respectively, with the warmest month in August ($25.4^{\circ}C$ for Hwagae and $24.9^{\circ}C$ for Agyang) and the coldest month in January ($0.3^{\circ}C$ for Hwagae and $0.2^{\circ}C$ for Agyang). The annual average of the daily temperature difference (= daily maximum temperature - daily minimum temperature) was $11.3^{\circ}C$ for Hwagae and $11.1^{\circ}C$ for Agyang. The annual average relative humidity was 62.7% at Hwagae and 65.3% at Agyang. Annual precipitation was 1387 mm for Hwagae and 1793 mm for Agyang, which was 605 mm and 835 mm higher, respectively, than in 2008. The majority of precipitation occurred between May and August, accounting for 77.6% of the annual precipitation at Hwagae and 76.6% at Agyang. The annual total sunshine duration was 2054.3 hrs in Hwagae, with the longest monthly sunshine duration in May (235.1 hrs) and the shortest in July (102.5 hrs). The dominant wind direction changed seasonally from northwesterly in fall and winter to southeasterly in spring and summer. The annual average wind speed was 1.5 m $s^{-1}$, with the highest monthly wind speed of 2.0 m $s^{-1}$ in December and the lowest of 1.1 m $s^{-1}$ in February. It is expected that continuous observation and assessment of meteorological data will improve our understanding of optimal environmental conditions for green tea cultivation and be used for developing models of green tea cultivation in the Hadong area.

Open Digital Textbook for Smart Education (스마트교육을 위한 오픈 디지털교과서)

  • Koo, Young-Il;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.177-189 / 2013
  • In Smart Education, the role of the digital textbook is very important as the face-to-face medium for learners. The standardization of digital textbooks will promote their industrialization for content providers and distributors as well as for learners and instructors. This study seeks ways to standardize digital textbooks around the following three objectives. (1) Digital textbooks should serve as the medium for blended learning that supports online and offline classes, should run on a common EPUB viewer without a special dedicated viewer, and should utilize the existing framework of e-learning content and learning management. The reason to consider EPUB as the standard for digital textbooks is that digital textbooks then do not need to specify another standard for the form of books, and they can take advantage of the industrial base of EPUB's standards-rich content and distribution structure. (2) Digital textbooks should provide a low-cost open market service using currently available standard open software. (3) To provide appropriate learning feedback information to students, digital textbooks should provide a foundation that accumulates and manages all learning activity information according to a standard infrastructure for educational Big Data processing. In this study, the digital textbook in a smart education environment is referred to as the open digital textbook. 
The components of the open digital textbook service framework are (1) digital textbook terminals such as smart pads, smart TVs, smartphones, and PCs; (2) a digital textbook platform to show and run digital content on those terminals; (3) a learning content repository, which exists in the cloud and maintains accredited learning content; (4) an App Store through which learning content development companies provide and distribute secondary learning content and learning tools; and (5) an LMS as a learning support/management tool that on-site class teachers use to create classroom instruction materials. In addition, locating all of the hardware and software that implement a smart education service within the cloud takes advantage of cloud computing for efficient management and reduced expense. The open digital textbook of smart education is considered to provide an e-book-style interface to the LMS for learners. In open digital textbooks, the representation of text, images, audio, video, equations, etc. is a basic function. But painting, writing, problem solving, etc. are beyond the capabilities of a simple e-book. Teacher-to-student, learner-to-learner, and team-to-team communication through the open digital textbook is required. To represent student demographics, portfolio information, and class information, the standard used in e-learning is desirable. To process learner tracking information about the learner's activities for the LMS (Learning Management System), the open digital textbook must have a recording function and a function for communicating with the LMS. DRM is a function for protecting various copyrights. Currently, the DRM of e-books is controlled by the corresponding book viewer. If the open digital textbook accepts the DRM used in the different DRM standards of the various e-book viewers, redundant feature implementation can be avoided. 
Security/privacy functions are required to protect information about study or instruction from third parties. UDL (Universal Design for Learning) is a learning support function for those whose disabilities make learning courses difficult. The open digital textbook, which is based on the e-book standard EPUB 3.0, must (1) record learning activity log information and (2) communicate with the server to support the learning activity. While the recording and communication functions, which are not defined in current standards, can be implemented in JavaScript and used in current EPUB 3.0 viewers, a strategy of proposing such functions as part of the next-generation e-book standard, or as a special standard (EPUB 3.0 for education), is needed. Future research will implement an open-source program based on the proposed open digital textbook standard and present new educational services including Big Data analysis.

Clickstream Big Data Mining for Demographics based Digital Marketing (인구통계특성 기반 디지털 마케팅을 위한 클릭스트림 빅데이터 마이닝)

  • Park, Jiae;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.143-163 / 2016
  • The demographics of Internet users are the most basic and important sources for target marketing or personalized advertisements on digital marketing channels, which include email, mobile, and social media. However, it has gradually become difficult to collect the demographics of Internet users because their activities are anonymous in many cases. Although a marketing department can obtain demographics using online or offline surveys, these approaches are very expensive, take a long time, and are likely to include false statements. Clickstream data is the record an Internet user leaves behind while visiting websites. As the user clicks anywhere on a webpage, the activity is logged in semi-structured website log files. Such data allows us to see what pages users visited, how long they stayed there, how often they visited, when they usually visited, which sites they prefer, what keywords they used to find the site, whether they purchased anything, and so forth. For this reason, some researchers have tried to infer the demographics of Internet users from their clickstream data. They derived various independent variables likely to be correlated with the demographics. The variables include search keywords, frequency and intensity by time, day, and month, the variety of websites visited, text information from the web pages visited, etc. The demographic attributes to predict also vary from paper to paper and cover gender, age, job, location, income, education, marital status, and presence of children. A variety of data mining methods, such as LSA, SVM, decision tree, neural network, logistic regression, and k-nearest neighbors, were used to build prediction models. However, this research has not yet identified which data mining method is appropriate for predicting each demographic variable. Moreover, the independent variables studied so far need to be reviewed, combined as needed, and evaluated in order to build the best prediction model. 
The objective of this study is to choose the clickstream attributes most likely to be correlated with the demographics from the results of previous research, and then to identify which data mining method is best suited to predicting each demographic attribute. Among the demographic attributes, this paper focuses on predicting gender, age, marital status, residence, and job. From the results of previous research, 64 clickstream attributes are applied to predict the demographic attributes. The overall process of predictive model building is composed of four steps. In the first step, we create user profiles that include 64 clickstream attributes and 5 demographic attributes. The second step performs dimension reduction of the clickstream variables to address the curse of dimensionality and the overfitting problem. We utilize three approaches, based on decision tree, PCA, and cluster analysis. In the third step, we build alternative predictive models for each demographic variable. SVM, neural network, and logistic regression are used for modeling. The last step evaluates the alternative models in terms of model accuracy and selects the best model. For the experiments, we used clickstream data representing 5 demographics and 16,962,705 online activities for 5,000 Internet users. IBM SPSS Modeler 17.0 was used for our prediction process, and 5-fold cross-validation was conducted to enhance the reliability of our experiments. The experimental results verify that there is a specific data mining method well suited to each demographic variable. For example, age prediction performs best when using decision-tree-based dimension reduction and a neural network, whereas the prediction of gender and marital status is most accurate when applying SVM without dimension reduction. 
We conclude that the online behaviors of Internet users, captured through clickstream data analysis, can be used effectively to predict their demographics and thereby be utilized in digital marketing.
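The first step described above, turning raw clickstream log rows into per-user profiles, might look like the following sketch. The log layout, the four derived attributes, and the 18:00 cutoff for "evening" visits are all assumptions made for illustration; the paper's 64 attributes are not listed in the abstract, and `LOG` and `build_profiles` are hypothetical names.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log rows: (user_id, ISO timestamp, site, dwell time in seconds).
LOG = [
    ("u1", "2016-03-01T09:15:00", "news.example", 120),
    ("u1", "2016-03-01T21:40:00", "shop.example", 300),
    ("u1", "2016-03-02T22:05:00", "shop.example", 60),
    ("u2", "2016-03-01T13:00:00", "news.example", 30),
]

def build_profiles(rows):
    """Aggregate log rows into one dictionary of clickstream attributes per user."""
    grouped = defaultdict(list)
    for user, ts, site, dwell in rows:
        grouped[user].append((datetime.fromisoformat(ts), site, dwell))
    profiles = {}
    for user, visits in grouped.items():
        n = len(visits)
        profiles[user] = {
            "visits": n,                                          # how often
            "distinct_sites": len({s for _, s, _ in visits}),     # variety of sites
            "mean_dwell": sum(d for _, _, d in visits) / n,       # how long
            "evening_share": sum(1 for t, _, _ in visits if t.hour >= 18) / n,  # when
        }
    return profiles

profiles = build_profiles(LOG)
print(profiles["u1"])
```

Profiles of this shape, joined with known demographic labels, would be the input to the dimension reduction and model-building steps; those steps were run in IBM SPSS Modeler in the study and are not reproduced here.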

Spatio-temporal Fluctuations with Influences of Inflowing Tributary Streams on Water Quality in Daecheong Reservoir (대청호의 시공간적 수질 변화 특성 및 호수내 유입지천의 영향)

  • Kim, Gyung-Hyun;Lee, Jae-Hoon;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.45 no.2 / pp.158-173 / 2012
  • The objectives of this study were to analyze the longitudinal gradient and temporal variations of water quality in Daecheong Reservoir in relation to the major inflowing streams from the watershed during 2001~2010. For the study, we selected 7 main-stream sites along the main axis of the reservoir, from the headwater to the dam, and 8 tributary streams. In-reservoir nutrients (TN and TP) showed longitudinal declines from the headwater to the dam, which resulted in a distinct zonation of water quality into the riverine ($R_z$, M1~M3), transition ($T_z$, M4~M6), and lacustrine ($L_z$, M7) zones, as observed in reservoirs elsewhere. Chlorophyll-a (CHL) and BOD, as an indicator of organic matter, were at their maximum in the $T_z$. The concentration of total phosphorus (TP) was the highest (8.52 $mg\;L^{-1}$) in March in the $R_z$, and was the highest (165 ${\mu}g\;L^{-1}$) in the $L_z$ in July. TN was at its maximum (377 ${\mu}g\;L^{-1}$) in August in the $R_z$, and was the highest (3.76 $mg\;L^{-1}$) in the $L_z$ in August. Ionic dilution was evident during September~October, after the monsoon rain. The mean TN : TP ratio, as an indicator of the limiting factor, was 88, which indicates that nitrogen is in surplus for phytoplankton growth in this system. Nutrient analysis of the inflowing streams showed that the major nutrient sources were the headwater streams T1~T2 and Ockcheon-Stream (T5), and the most influential inflowing stream was T5, which is located in the mid-reservoir and is directly influenced by waste-water treatment plants. The key parameters influenced by the monsoon rain were TP and suspended solids (SS). Empirical models of trophic variables indicated that variations of CHL in the $R_z$ ($R^2$=0.044, p=0.264) and $T_z$ ($R^2$=0.126, p=0.054) were not accounted for by TN, but the relationship was significant (p=0.032) in the $L_z$. 
The variation of the log-transformed $I_r$-CHL was not accounted for ($R^2$=0.258, p=0.110) by $I_w$-TN of the inflowing streams, but was determined ($R^2$=0.567, p=0.005) by $I_w$-TP of the inflowing streams. In other words, TP inputs from the inflowing streams were the major determinants of in-reservoir phytoplankton growth. Regression analysis of TN : TP suggested that the ratio was determined by P rather than N. Overall, our data suggest that TP and suspended solids during the summer flood period should be reduced for eutrophication control, and that P input from Ockcheon-Stream should be controlled for water quality improvement.
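The two quantities this abstract relies on, the TN : TP ratio and the $R^2$ of a regression on log-transformed variables, can be computed as in the sketch below. It assumes a mass ratio with TN in mg/L and TP in µg/L and an ordinary least-squares fit on base-10 logarithms; the input numbers are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
import math

def tn_tp_ratio(tn_mg_per_l, tp_ug_per_l):
    """Mass ratio TN:TP after converting TN from mg/L to ug/L."""
    return (tn_mg_per_l * 1000.0) / tp_ug_per_l

def r_squared_loglog(x, y):
    """R^2 of a simple linear regression of log10(y) on log10(x).

    Assumes all values are positive and that neither variable is constant.
    """
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    syy = sum((b - my) ** 2 for b in ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy ** 2 / (sxx * syy)

# Illustrative values only (not from the study).
print(tn_tp_ratio(2.2, 25.0))
tp_in = [10.0, 20.0, 40.0, 80.0]
chl = [2.0 * v ** 1.5 for v in tp_in]   # exact power law, so R^2 is ~1
print(r_squared_loglog(tp_in, chl))
```

For a single predictor, $R^2$ equals the squared correlation of the log-transformed series, which is what the function returns; the p-values reported in the abstract would need a separate significance test.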

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.139-156 / 2021
  • The biggest reason for using a deep learning model in image classification is that it can consider the relationships between regions by extracting each region's features from the overall information of the image. However, the CNN model may not be suitable for emotional image data that lacks such regional features. To address the difficulty of classifying emotion images, researchers propose CNN-based architectures suited to emotion images every year. Studies on the relationship between color and human emotion have also been conducted, and they found that different emotions are induced by different colors. Among studies using deep learning, some have applied color information to image emotion classification. Using the image's color information in addition to the image itself improves the accuracy of classifying image emotions compared with training the classification model on the image alone. This study proposes two ways to increase accuracy by adjusting the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics about the colors of the picture. The two-color combinations most frequent across all training data were found first; at test time, the most frequent two-color combination in each test image was found. The result values were then corrected according to the distribution of color combinations. This method weights the result value obtained after the model classifies an image's emotion by means of an expression based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto, classified into eight categories, were used as the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and performance was compared before and after applying the two-stage learning to the CNN model. 
Inspired by color psychology, which deals with the relationship between colors and emotions, we studied how to improve accuracy when building a model that classifies an image's sentiment by modifying the result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. Each color carries its own meaning. Using Scikit-learn's clustering, the seven colors that are primarily distributed in the image are identified. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors listed above. That is, each extracted color is converted to the closest of the 16 colors. If combinations of three or more colors are selected, too many combinations occur and the distribution becomes scattered, so each combination has less influence on the result value. Therefore, to solve this problem, two-color combinations were found and used to weight the model output. Before training, the most frequent color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary to be used during testing. During the test, the two-color combination that is most frequent in each test data image is found. After that, we checked how that color combination was distributed in the training data and corrected the result accordingly. We devised several equations to weight the result value from the model based on the extracted color as described above. The data set was randomly divided 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five divisions to perform 5-fold cross-validation, the model was trained five times using different validation datasets. Finally, the performance was checked using the test dataset that was previously separated. 
Adam was used as the optimizer, and the learning rate was set to 0.01. Training ran for up to 20 epochs, and if the validation loss did not decrease during five epochs of learning, training was stopped. Early stopping was set to load the model with the best validation loss value. The classification accuracy was better when the information extracted from color properties was used together with the CNN architecture than when only the CNN architecture was used.
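The color-mapping step described above, converting each dominant cluster color to the closest reference color and keeping the two most frequent named colors, can be sketched as follows. The sketch assumes the cluster centers and pixel counts have already been obtained from a clustering step (the study used Scikit-learn for that), uses squared Euclidean distance in RGB space, and lists only six of the sixteen colors with assumed RGB coordinates; `REFERENCE`, `nearest_color`, and `top_two_combination` are hypothetical names, and the study's weighting equations are not reproduced.

```python
from collections import Counter

# Assumed RGB coordinates for a subset of the 16 reference colors.
REFERENCE = {
    "red": (255, 0, 0), "yellow": (255, 255, 0), "green": (0, 128, 0),
    "blue": (0, 0, 255), "white": (255, 255, 255), "black": (0, 0, 0),
}

def nearest_color(rgb):
    """Name of the reference color with the smallest squared RGB distance."""
    return min(
        REFERENCE,
        key=lambda name: sum((a - b) ** 2 for a, b in zip(rgb, REFERENCE[name])),
    )

def top_two_combination(cluster_centers, cluster_sizes):
    """Map cluster centers to named colors and return the two most frequent.

    cluster_sizes holds the pixel count of each cluster; the result is a
    sorted tuple so that it can serve as a dictionary key.
    """
    weight = Counter()
    for center, size in zip(cluster_centers, cluster_sizes):
        weight[nearest_color(center)] += size
    return tuple(sorted(name for name, _ in weight.most_common(2)))

# Hypothetical cluster output for one image.
centers = [(250, 10, 10), (10, 10, 240), (240, 240, 240), (200, 30, 30)]
sizes = [40, 30, 20, 10]
print(top_two_combination(centers, sizes))
```

A per-class dictionary keyed by such tuples, filled from the training images, would then give the distribution used to correct the model's output at test time.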