The Variation of the Dissolved Inorganic Nutrients in the Costal Area of Gunsan, Yellow Sea from 2001 to 2010 (서해 군산 연안의 2001년부터 2010년까지의 용존성무기영양염류의 변동)

  • Heo, Seung;Kweon, Jung-Ro;Park, Jong-Soo
    • Journal of the Korean Society of Marine Environment & Safety / v.17 no.4 / pp.357-365 / 2011
  • Variation in dissolved inorganic nutrients was investigated four times per year in the coastal area of Gunsan, Yellow Sea, from 2001 to 2010. Water samples were collected at 10 stations, and physico-chemical parameters were analyzed, including water temperature, salinity, suspended solids, dissolved oxygen, chemical oxygen demand, chlorophyll a, and dissolved inorganic nutrients. The ten-year average concentration of dissolved inorganic nitrogen (DIN) in the Gunsan area was similar in the surface and bottom layers: 0.421 mg/L (0.198~0.846 mg/L) at the surface and 0.344 mg/L (0.148~0.717 mg/L) at the bottom. The highest annual average surface DIN was 0.846 mg/L in 2002 and the lowest was 0.198 mg/L in 2010. Ammonia, nitrite, and nitrate accounted for 27%, 3%, and 70% of the ten-year average DIN, so most DIN was nitrate. Dissolved inorganic phosphate (DIP) was also similar in the surface and bottom layers and decreased from 2003 to 2010. The ten-year average DIP was 0.024 mg/L; the annual averages were 0.021 mg/L in 2008, 0.007 mg/L in 2009, and 0.008 mg/L in 2010, a decreasing pattern from 2007 to 2010. The average DIN/DIP ratio from 2002 to 2010 was 6.0 (3.2~10.1) at the surface and 4.6 (2.6~7.0) at the bottom. Dissolved inorganic silicate averaged 0.372 mg/L in the surface layer and 0.352 mg/L in the bottom layer from 2004 to 2010, and decreased from 2006 to 2010. Spearman's correlation analysis was carried out to determine the relationship between salinity and dissolved inorganic nutrients in the surface and bottom layers. The correlation coefficients with salinity were -0.72 for DIN, -0.46 for DIP, and -0.63 for dissolved inorganic silicate in the surface layer, and -0.70, -0.44, and -0.57, respectively, in the bottom layer. Dissolved inorganic nutrients in the nearshore waters of Gunsan were affected by freshwater discharge from the Geum River; in particular, a large amount of DIN flowed into the nearshore waters from the Geum River. Concentrations were high at stations 1, 2, and 3, which lie nearshore of Gunsan city, indicating that these stations are mainly affected by the Geum River and Gunsan city; concentrations differed only slightly between cruises. The annual average of dissolved inorganic nutrients decreased gradually from 2003 to 2010, and more research on these conditions is needed.
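
The Spearman analysis described in the abstract above can be illustrated with a minimal sketch. It computes the rank correlation as the Pearson correlation of average ranks. The function names and the sample values are assumptions made for illustration; they are not the paper's code or data.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical salinity and DIN (mg/L) readings, NOT the paper's measurements.
salinity = [31.5, 30.2, 28.4, 25.1, 22.8, 29.0]
din = [0.21, 0.30, 0.41, 0.62, 0.85, 0.45]
rho = spearman_rho(salinity, din)
print(f"Spearman rho (salinity vs DIN): {rho:.2f}")
```

A strongly negative value, as reported in the abstract, is what one would expect if low-salinity river water carries most of the nutrient load.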

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
  • Since the stock market is driven by traders' expectations, many studies have tried to predict stock price movements by analyzing various sources of text data. Research has addressed not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. As in other text mining approaches, studies that predict stock price movements apply classification algorithms to a term-document matrix. Because documents contain many words, it is better to build the term-document matrix from the words that contribute most. Words whose frequency or importance is too low are removed, and words can also be selected by measuring how much each contributes to classifying a document correctly. The conventional approach is thus to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The underlying idea is that the presence of a neutral word itself is only weakly related to stock movement, whereas the words surrounding it are more likely to affect stock price movements. The generated term-document matrix is then fed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, and then excluded, from the selected words, those that also appear in news articles about other stocks. Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of news articles as training data and applied the remaining month to the model to predict the next day's stock price movements. We used SVM, Boosting, and Random Forest to build the models. The stock market was open for a total of 80 days during the four months (2016/02/01 ~ 2016/05/31); the initial 60 days were used as the training set and the remaining 20 days as the test set. The proposed word-based algorithm showed better classification performance than word selection based on sparsity. In summary, this study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine which words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that appeared commonly in the news for other stocks. When prediction accuracy was compared, the suggested method was more accurate. A limitation of this study is that prediction was set up only as a classification of rise and fall, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, it is difficult to show investment performance because stock price fluctuation and profit rate may differ. Further research is therefore needed using more stocks and predicting yield through trading simulation.
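
One possible reading of the neutral-word procedure in the abstract above is sketched below. It assumes that a neutral word is one occurring in both rising and falling documents, that neutral words serve as anchors, and that words found in other stocks' news are dropped from the extracted terms. The function names, the window size, and the toy documents are all hypothetical; the paper's exact selection rules are not given in the abstract.

```python
from collections import Counter


def neutral_words(docs, labels):
    """Words that occur in documents of every class (here 'up' and 'down')."""
    vocab_by_label = {}
    for tokens, label in zip(docs, labels):
        vocab_by_label.setdefault(label, set()).update(tokens)
    return set.intersection(*vocab_by_label.values())


def context_terms(tokens, neutral, excluded=frozenset(), window=2):
    """Words within `window` positions of any neutral word, minus removed words."""
    keep = set()
    for i, tok in enumerate(tokens):
        if tok in neutral:
            keep.update(range(max(0, i - window), min(len(tokens), i + window + 1)))
    # Each position is counted once, even if it is near several neutral words.
    return [tokens[i] for i in sorted(keep)
            if tokens[i] not in neutral and tokens[i] not in excluded]


def term_document_matrix(docs, neutral, excluded=frozenset(), window=2):
    """Return (vocabulary, rows) where rows[d][t] counts term t in document d."""
    rows = [Counter(context_terms(d, neutral, excluded, window)) for d in docs]
    vocab = sorted(set().union(*rows))
    return vocab, [[row[t] for t in vocab] for row in rows]


# Toy, stop-word-free news tokens for one stock (illustrative only).
docs = [
    ["samsung", "shares", "surge", "strong", "earnings"],
    ["samsung", "shares", "drop", "weak", "demand"],
]
labels = ["up", "down"]
other_stock_vocab = {"earnings"}  # hypothetical words seen in other stocks' news

neutral = neutral_words(docs, labels)
vocab, matrix = term_document_matrix(docs, neutral, other_stock_vocab, window=2)
print(vocab)
print(matrix)
```

The resulting matrix could then be passed to any classifier; the abstract mentions SVM, Boosting, and Random Forest, trained on the first 60 trading days and tested on the remaining 20.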

A Coexistence Model in a Dynamic Platform with ICT-based Multi-Value Chains: focusing on Healthcare Service (ICT 기반 다중 가치사슬의 동적 플랫폼에서의 공존 모형: 의료서비스를 중심으로)

  • Lee, Hyun Jung;Chang, Yong Sik
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.69-93 / 2017
  • The development of ICT has led to the diversification of, and changes in, supply and demand in markets. It has also created a variety of values that differ from those in existing markets. As a result, a new type of market has emerged that can contain multiple value chains, from ICT-created markets as well as from existing markets. We define this new type of market as the platform. In the platform, multi-value chains can coexist with multiple values. In a real market, when a new type of value chain enters an existing market, it generally conflicts with the existing value chain. Conflict among multi-value chains in a market arises because the value chains share limited market resources such as suppliers, consumers, services, or products. In other words, if there are multi-value chains in the platform, values may conflict, overlap, be created, or be lost among the value chains. To solve this problem, we introduce coexistence factors that reduce the conflicts so that the platform can reach market equilibrium. On the other hand, it is also possible to create values differentiated from those of the existing market and to increase the total market value in the platform. In the early era of ICT development, ICT was introduced to improve the efficiency and effectiveness of value chains in the existing market. However, as the role of ICT changed from supporter to promoter of the market, ICT came to drive variation in value chains and the creation of various values in markets. For instance, Uber Taxi created a new value chain with a new type of ICT-based service or product and new resources such as new suppliers and consumers. When Uber and traditional taxi services operate at the same time in the taxi service platform, values may be created, or may conflict, between the new and old value chains. When multi-value chains conflict, as with Uber and traditional taxi services, the conflicts must be minimized so that the multi-value chains, which can create added value in the platform, can coexist. It is therefore important to predict and discuss possible conflicts between new and old value chains, and the conflicts should be resolved to reach market equilibrium with multi-value chains in the platform. That is, we discuss the possibility of coexistence of multi-value chains in a platform comprising a variety of suppliers and customers. To do this, we focus on healthcare markets. Healthcare markets are now widespread globally as well as domestically, and there are many different healthcare services, such as traditional, tele-, and intelligent healthcare services. This shows that there are multiple suppliers, consumers, and services as components of different value chains in the same platform. The platform can be shared by different values that are created or overlapped through conflict and loss of values in the value chains. We examine whether a platform can be shared by different value chains such as traditional, tele-, and intelligent healthcare services and products, and whether the value of each value chain, as well as the total value of the platform, can be increased. The results show that it is possible to increase the value of each value chain as well as the total value of the platform. Finally, we propose a coexistence model to overcome such problems and show, through experimentation, the possibility of coexistence between the value chains.

Soil amendment for turfgrass vegetation of the Incheon International Airport runway side on the Yeongjong reclaimed land (인천국제공항 착륙대 잔디 식재 지반 조성을 위한 영종도 매립 토양 개량)

  • Yoo, Sun-Ho;Jeong, Yeong-Sang;Joo, Young-Kyu;Choi, Byung-Kwon;Wu, Heun-Young;Lee, Tae-Young
    • Korean Journal of Soil Science and Fertilizer / v.35 no.2 / pp.93-104 / 2002
  • A field survey and experiment were conducted from 1996 to 1998 to develop rational technology for establishing turfgrass vegetation on the runway side of Incheon International Airport, on reclaimed tidal land on Yeongjong Island. Backfill of the experimental site was finished in August 1995. The 8 ha experimental site was located in the middle of the construction area for the main parking lot in front of the terminal building. The experimental field was drained by a main open ditch and divided into three main plots: no subsurface tile drain, subsurface tile drains spaced at 22.5 m, and subsurface tile drains spaced at 45 m. Seventeen subplots were designed to test the effects of covering the soil with red earth loam to a depth of 5 cm or 20 cm, application of chemical compound fertilizers and livestock manures, and dressing with artificial soils and hydrophilic soil conditioners. The tested turfgrasses were three transplanted indigenous turfgrasses, Zoysia koreana, Zoysia sinica, and Zoysia japonica, and two hydroseeded mixtures of exotic turfgrasses: cool type I (tall fescue 30%, Kentucky bluegrass 40%, perennial ryegrass 30%) and cool type II (tall fescue 40%, perennial ryegrass 20%, fine fescue 20%, alkaligrass 20%). The soil, backfilled with dredged sea sand, was sand-textured with high salt concentration and low fertility. It showed high pH, low organic matter, and low available phosphate content. Percolation was fast, with high hydraulic conductivity, and desalinization was fast after installation of the main open drainage system. Subsurface tile drainage had no observable effect, with little difference in turfgrass growth. Coverage and visual growth were best with the 20 cm soil covering plus compound fertilizer, and were satisfactory with the 5 cm soil covering plus compound fertilizer and with the livestock manure treatments. The hydrophilic soil conditioner treatments were effective but are expensive at present. Coverage and visual quality were good for Zoysia koreana and Zoysia japonica. Coverage by hydroseeding with the mixed exotic turfgrasses was lower than by transplanting native turfgrasses. In conclusion, for runway-side vegetation, subsurface tile drainage might not be necessary, because main open ditch drainage is sufficient given the fast percolation rate of the backfilled dredged sea sand. A 5 cm covering of red earth might be sufficient for the runway side, but a 20 cm covering might be necessary where a high density of turfgrass coverage is needed to protect against the air blast from airplanes.

A Basic Study on the Establishment of Preservation and Management for Natural Monument(No.374) Pyeongdae-ri Torreya nucifera forest of Jeju (천연기념물 제374호 제주 평대리 비자나무 숲의 보존·관리방향 설정을 위한 기초연구)

  • Lee, Won-Ho;Kim, Dong-Hyun;Kim, Jae-Ung;Oh, Hae-Sung;Choi, Byung-Ki;Lee, Jong-Sung
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.32 no.1 / pp.93-106 / 2014
  • This study analyzed the site environment, investigated vegetation resources, surveyed management status, and classified management areas for Natural Monument No. 374, the Pyeongdae-ri Torreya nucifera forest. The results are as follows. First, there is concern that the Torreya nucifera forest will be affected by development as land use changes to agriculture. A preservation and management plan should therefore be established to preserve the forest's original form, and development activity that changes the terrain should be excluded, because the Gotjawal within the forest is the foundation of its species diversity. Second, the flora of the forest was summarized as 402 taxa: 91 families, 263 genera, 353 species, 41 varieties, and 8 forms. Plants designated by the Ministry of Environment as first- and second-grade endangered plants occur there, but they are critically endangered in the forest because of habitat change, disease, and illegal overharvesting. Forest management plans should therefore give priority to their protection. Third, Torreya nucifera forms the upper layer of the vegetation structure, but a management and conservation strategy oriented toward old trees has resulted in a poor age structure. Furthermore, desiccation of the forest due to artificial management and the decline of Torreya nucifera habitat through ecological succession indicate problems in the forest. Plans such as regulation of population density and propagation of saplings should therefore be established to sustain the character of the Torreya nucifera forest. Fourth, trails have been damaged by use. In particular, the scoria path shows a great deal of damage, and its share ratio is high in each section. A plan to reduce the share ratio by developing additional tourist routes should therefore be considered, rather than replacing the scoria. Fifth, the tourist elements of the forest with the highest visitor preference were confirmed to be plant elements. These are sensitive to use pressure and, being non-permanent, require continuous monitoring. Additional plans are also needed, such as developing further tourism elements and actively using highly preferred elements. Sixth, the strength of protection should differ according to importance. First-grade areas should maintain plant populations and natural habitats, and their management direction should be set accordingly. Second-grade areas should focus on annual regeneration of the forest. Third-grade areas should be used as demonstration forest or set aside for sapling propagation. Fourth-grade areas, where disturbance is often found in otherwise suitable vegetation, require the introduction of a partial rest system. Fifth-grade areas are appropriate as service areas for promoting tourism using the natural resources of the forest. In addition, buffer zones should be installed in relatively low-rated areas, and periodic monitoring is needed to mitigate edge effects where areas of different grades adjoin.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.221-241 / 2018
  • Deep learning has recently attracted attention. The deep learning technique applied in the competitions of the International Conference on Image Recognition Technology (ILSVR) and in AlphaGo is the Convolutional Neural Network (CNN). A CNN divides the input image into small sections, recognizes partial features, and combines them to recognize the whole. Deep learning technologies are expected to bring many changes to our lives, but so far their applications have been limited to image recognition and natural language processing. The use of deep learning for business problems is still at an early research stage. If its performance is proven, it can be applied to traditional business problems such as marketing response prediction, fraudulent transaction detection, and bankruptcy prediction. It is therefore a meaningful experiment to assess whether deep learning can solve business problems, using the case of online shopping companies, which have big data, make customer behavior relatively easy to identify, and offer high utilization value. In online shopping in particular, the competitive environment is changing rapidly and becoming more intense, so analyzing customer behavior to maximize profit is increasingly important. In this study, we propose a 'CNN model of Heterogeneous Information Integration' as a way to improve the prediction of customer behavior in online shopping enterprises. The proposed model combines structured and unstructured information and learns through a convolutional neural network built on a multi-layer perceptron structure. To optimize its performance, we evaluate three architectural components, 'heterogeneous information integration', 'unstructured information vector conversion', and 'multi-layer perceptron design', and confirm the proposed model based on the results. The target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, and high discount shopper. To verify the usefulness of the proposed model, we conducted experiments using actual transaction, customer, and VOC data from a specific online shopping company in Korea. The data cover 47,947 customers who registered at least one VOC in January 2011 (one month). We used the profiles of these customers, a total of 19 months of transaction data from September 2010 to March 2012, and the VOCs posted during that month. The experiment is divided into two stages. In the first stage, we evaluate the three architectural components that affect the performance of the proposed model and select optimal parameters. We then evaluate the performance of the proposed model. Experimental results show that the proposed model, which combines structured and unstructured information, outperforms NBC (Naïve Bayes classification), SVM (support vector machine), and ANN (artificial neural network). It is therefore significant that unstructured information contributes to predicting customer behavior and that CNN can be applied to business problems as well as to image recognition and natural language processing. The experiments also confirm that CNN is more effective at understanding and interpreting the meaning of context in text VOC data. It is also significant that this empirical research, based on actual data from an e-commerce company, shows that meaningful information for predicting customer behavior can be extracted from VOC data written directly by customers in text form. Finally, the various experiments provide useful information for future research on parameter selection and its performance.
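
The general idea of combining structured features with convolutional features from text can be sketched as follows. This is a generic, untrained illustration: a one-dimensional convolution with ReLU and max-over-time pooling produces text features, which are concatenated with structured features and passed to a logistic output for one binary target. It is not the paper's architecture; all names, weights, and inputs are hypothetical.

```python
import math


def conv1d_max_pool(token_vectors, kernel):
    """Slide `kernel` (one weight vector per position) over the token sequence,
    apply ReLU, and return the maximum response (max-over-time pooling)."""
    width = len(kernel)
    best = 0.0  # ReLU floor; also the result when the text is shorter than the kernel
    for start in range(len(token_vectors) - width + 1):
        window = token_vectors[start:start + width]
        response = sum(w * x
                       for k_vec, t_vec in zip(kernel, window)
                       for w, x in zip(k_vec, t_vec))
        best = max(best, response)
    return best


def predict(structured, token_vectors, kernels, weights, bias):
    """Concatenate structured features with pooled text features and apply a
    logistic output, giving a probability for one binary target (e.g. churn)."""
    text_features = [conv1d_max_pool(token_vectors, k) for k in kernels]
    features = list(structured) + text_features
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1 / (1 + math.exp(-z))


# Hypothetical inputs: one structured feature and a three-token text,
# each token represented by a two-dimensional vector.
structured = [0.5]
token_vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
kernels = [[[1.0, 0.0], [0.0, 1.0]]]  # a single filter spanning two tokens
weights = [1.0, -0.25]                # untrained, illustrative weights
probability = predict(structured, token_vectors, kernels, weights, bias=0.0)
print(probability)
```

In practice the weights would be learned, and one such output would be trained for each of the six binary targets listed in the abstract.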

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. Here, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, the morpheme plays an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme is the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. In this study, we therefore use 'morpheme vectors' as the input to a deep learning model, rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize them as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can a satisfactory level of classification accuracy be achieved when applying deep learning to Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors reflecting the research questions and compare classification accuracy using a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in three respects. First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing: sentence splitting only, or sentence splitting followed by spelling and spacing corrections. Third, they differ in the form of input fed into the word vector model: the morphemes alone, or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that classification accuracy improves when using same-domain text even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of any POS tag, including the incomprehensible category. The POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not appear to have any definite influence on classification accuracy.
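
Two steps from the abstract above lend themselves to a small sketch: attaching POS tags to morphemes so that homonyms with different tags become distinct tokens, and forming the (context, target) pairs that a CBOW model trains on. The function names and the tag labels `VA` and `EC` are assumptions for illustration, and the morpheme analysis is supplied by hand rather than by a real analyzer. Training the 300-dimensional vectors themselves would be left to a word vector library.

```python
def attach_pos(morphemes):
    """Turn (morpheme, POS) pairs into 'morpheme/POS' tokens, so that homonyms
    with different POS tags are treated as different vocabulary items."""
    return [f"{morph}/{pos}" for morph, pos in morphemes]


def cbow_pairs(tokens, window=5):
    """Build CBOW training examples: the surrounding tokens (up to `window`
    on each side) are the context used to predict the centre token."""
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # a one-token sentence has no context to learn from
            pairs.append((context, target))
    return pairs


# Hand-written analysis of '예쁘고'; the tag names are illustrative assumptions.
analysed = [("예쁘", "VA"), ("고", "EC")]
tagged = attach_pos(analysed)
print(tagged)

# The abstract's setting uses a context window of 5; a window of 1 keeps
# this toy example readable.
print(cbow_pairs(["a", "b", "c"], window=1))
```

Feeding either the bare morphemes or the tagged tokens into the same pair-building step corresponds to the third criterion by which the abstract's morpheme vector sets differ.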

A Study on the Differences of Information Diffusion Based on the Type of Media and Information (매체와 정보유형에 따른 정보확산 차이에 대한 연구)

  • Lee, Sang-Gun;Kim, Jin-Hwa;Baek, Heon;Lee, Eui-Bang
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.133-146 / 2013
  • While the use of internet is routine nowadays, users receive and share information through a variety of media. Through the use of internet, information delivery media is diversifying from traditional media of one-way communication, such as newspaper, TV, and radio, into media of two-way communication. In contrast of traditional media, blogs enable individuals to directly upload and share news, which can be considered to have a differential speed of information diffusion than news media that convey information unilaterally. Therefore this Study focused on the difference between online news and social media blogs. Moreover, there are variations in the speed of information diffusion because that information closely related to one person boosts communications between individuals. We believe that users' standard of evaluation would change based on the types of information. As well, the speed of information diffusion would change based on the level of proximity. Therefore, the purpose of this study is to examine the differences in information diffusion based on the types of media. And then information is segmentalized and an examination is done to see how information diffusion differentiates based on the types of information. This study used the Bass diffusion model, which has been frequently used because this model has higher explanatory power than other models by explaining diffusion of market through innovation effect and imitation effect. Also this model has been applied a lot in other information diffusion related studies. The Bass diffusion model includes an innovation effect and an imitation effect. Innovation effect measures the early-stage impact, while the imitation effect measures the impact of word of mouth at the later stage. According to Mahajan et al. (2000), Innovation effect is emphasized by usefulness and ease-of-use, as well Imitation effect is emphasized by subjective norm and word-of-mouth. Also, according to Lee et al. 
(2011), Innovation effect is emphasized by mass communication. According to Moore and Benbasat (1996), Innovation effect is emphasized by relative advantage. Because Imitation effect is adopted by within-group influences and Innovation effects is adopted by product's or service's innovation. Therefore, ours study compared online news and social media blogs to examine the differences between media. We also choose different types of information including entertainment related information "Psy Gentelman", Current affair news "Earthquake in Sichuan, China", and product related information "Galaxy S4" in order to examine the variations on information diffusion. We considered that users' information proximity alters based on the types of information. Hence, we chose the three types of information mentioned above, which have different level of proximity from users' standpoint, in order to examine the flow of information diffusion. The first conclusion of this study is that different media has similar effect on information diffusion, even the types of media of information provider are different. Information diffusion has only been distinguished by a disparity between proximity of information. Second, information diffusions differ based on types of information. From the standpoint of users, product and entertainment related information has high imitation effect because of word of mouth. On the other hand, imitation effect dominates innovation effect on Current affair news. From the results of this study, the flow changes of information diffusion is examined and be applied to practical use. This study has some limitations, and those limitations would be able to provide opportunities and suggestions for future research. Presenting the difference of Information diffusion according to media and proximity has difficulties for generalization of theory due to small sample size. 
Therefore, if further studies increase the sample size and media diversity, the differences in information diffusion by media type and information proximity could be understood in more detail.
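
As a rough illustration of the Bass diffusion model named in the abstract above, the sketch below evaluates the standard closed-form cumulative adoption curve, with p as the innovation coefficient and q as the imitation coefficient. The function names and the values of M, P, and Q are assumptions chosen for illustration; they are not estimates reported by the study.

```python
import math


def bass_cumulative(t: float, p: float, q: float) -> float:
    """Cumulative adoption fraction F(t) under the Bass model."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_new_adopters(periods: int, m: float, p: float, q: float) -> list[float]:
    """Adopters gained in each period 1..periods, as m * (F(t) - F(t - 1))."""
    return [
        m * (bass_cumulative(t, p, q) - bass_cumulative(t - 1, p, q))
        for t in range(1, periods + 1)
    ]


def peak_time(p: float, q: float) -> float:
    """Time of the peak adoption rate, ln(q / p) / (p + q); meaningful when q > p."""
    return math.log(q / p) / (p + q)


# Illustrative values only (hypothetical, not taken from the study).
M, P, Q = 1000.0, 0.03, 0.38
weekly = bass_new_adopters(20, M, P, Q)
```

A larger q relative to p produces a slower start and a later, sharper peak, which is the pattern one would expect when word of mouth, rather than mass communication, drives diffusion.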

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.29-44
    • /
    • 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to check the massive number of reviews individually. Moreover, carelessly written reviews actually inconvenience consumers. Thus many online vendors provide mechanisms to identify the reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review, and use this feedback to rank and re-order reviews. However, many reviews have little or no feedback, which makes it hard to identify their helpfulness. Feedback also takes time to accumulate, so newly authored reviews do not yet have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (McAuley and Leskovec, 2013) have more than 5 feedback votes (Yan et al., 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted the various linguistic, psychological, and perceptual elements included in product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, because the characteristics of product reviews and the determinants of usefulness can differ between apparel products (which are experience goods) and electronic products (which are search goods), the characteristics of the product reviews were compared between the two product groups and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com. 
To understand the review texts, we first extracted linguistic and psychological characteristics, such as word count and the levels of emotional tone and analytical thinking, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explored the descriptive statistics of the review texts for each category and compared their differences statistically using a t-test. Lastly, we performed regression analysis using the data mining software RapidMiner to find the determinant factors. Comparing the review characteristics of electronic products and apparel products, we found that reviewers used more words as well as longer sentences when writing reviews for electronic products. As for content characteristics, electronic product reviews included more analytic words, carried more clout, and related more to cognitive processes (CogProc) than apparel product reviews, and also included many words expressing negative emotions (NegEmo). On the other hand, apparel product reviews were more personal and authentic and included more positive emotions (PosEmo) and perceptual processes (Percept) than electronic product reviews. Next, we analyzed the determinants of the usefulness of product reviews in the two product groups. We found that, in both product groups, reviews perceived as useful had high product ratings from reviewers, a larger total word count, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful. 
In the case of electronic product reviews, those that were analytical, had a high expertise index, and contained many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.
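
The group comparison described in the abstract above (a t-test on review characteristics across the two product categories) can be sketched as follows. The abstract does not say which t-test variant was used, so Welch's unequal-variance form is an assumption here, and the word counts are made-up numbers, not study data.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    va, vb = variance(sample_a) / na, variance(sample_b) / nb
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    dof = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t_stat, dof


# Hypothetical word counts per review (illustrative only).
apparel_word_counts = [10, 12, 14]
electronics_word_counts = [20, 22, 24]
t_stat, dof = welch_t(apparel_word_counts, electronics_word_counts)
```

A negative `t_stat` here means the first group (apparel) has the smaller mean word count. The same function could be applied to any per-review LIWC score, one characteristic at a time.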

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.163-176
    • /
    • 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings with microblogs has received little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. For measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas the same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model that captures this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features should therefore be key when measuring TV ratings through microblogs. We show that capturing the time-dependency of features is necessary for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. 
The number of tweets reaches its maximum level on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect the satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that the correlation of features is time-dependent, differing between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or disappointment over not being able to watch it. The features that are highly correlated before the broadcast differ from those after the broadcast. This result indicates that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show a high relevance, despite carrying a negative meaning. Understanding the time-dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter. 
More research is needed to refine the methodology for predicting or measuring TV ratings.
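
The before/after comparison described in the abstract above can be illustrated with a minimal sketch: for one candidate word, correlate its per-episode counts in each time window with the episode ratings and report the window with the higher correlation. The abstract does not name the correlation measure, so Pearson's coefficient is an assumption, and the function names, window labels, and numbers are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def strongest_window(counts_by_window, ratings):
    """Return the time window whose word counts correlate most with ratings."""
    return max(counts_by_window, key=lambda w: pearson(counts_by_window[w], ratings))


# Made-up ratings for four episodes and counts of one word per window.
ratings = [5.0, 6.0, 7.0, 8.0]
word_counts = {"before": [1, 2, 3, 4], "after": [4, 1, 3, 2]}
```

Calling `strongest_window(word_counts, ratings)` on this toy data picks the "before" window. Repeating the call over every candidate word would give a tally of words per window, similar in form to the before/after split the abstract reports.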