• Title/Summary/Keyword: SNS data


Sentiment Analysis of News Based on Generative AI and Real Estate Price Prediction: Application of LSTM and VAR Models (생성 AI기반 뉴스 감성 분석과 부동산 가격 예측: LSTM과 VAR모델의 적용)

  • Sua Kim;Mi Ju Kwon;Hyon Hee Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.5 / pp.209-216 / 2024
  • Real estate market prices are determined by various factors, including macroeconomic variables, and are also influenced by a variety of unstructured text data such as news articles and social media. News articles are a crucial factor in predicting real estate transaction prices because they reflect the public's economic sentiment. This study applies sentiment analysis to news articles to generate a News Sentiment Index (NSI) score, which is then integrated into a real estate price prediction model. To calculate the sentiment index, the content of each article is first summarized. Generative AI then classifies each summary as positive, negative, or neutral, and a total score is calculated. This score is then applied to the real estate price prediction model. The models used for real estate price prediction are a multi-head attention LSTM model and a vector autoregression (VAR) model. Without the NSI, the LSTM prediction model showed root mean square error (RMSE) values of 0.60, 0.872, and 1.117 for the 1-month, 2-month, and 3-month forecasts, respectively. With the NSI applied, the RMSE values fell to 0.40, 0.724, and 1.03 for the same forecast periods. Similarly, the VAR prediction model without the NSI showed RMSE values of 1.6484, 0.6254, and 0.9220 for the 1-month, 2-month, and 3-month forecasts, respectively, while applying the NSI led to RMSE values of 1.1315, 0.3413, and 1.6227 for these periods. These results demonstrate the effectiveness of the proposed model in predicting the apartment transaction price index and its ability to forecast real estate market price fluctuations that reflect socio-economic trends, although the VAR model's 3-month RMSE rose when the NSI was applied.
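To make the reported numbers concrete, the sketch below computes a net sentiment score from per-article labels, the RMSE metric, and the relative RMSE change implied by the figures in the abstract. The scoring rule (positive = +1, neutral = 0, negative = -1, averaged over a period's articles) is an assumption for illustration; the abstract says only that a total score is calculated from the three categories.

```python
import math

# Assumed label-to-score mapping; in the study a generative AI model
# assigns each article summary one of these three labels.
SCORES = {"positive": 1, "neutral": 0, "negative": -1}


def news_sentiment_index(labels):
    """Average label score for one period (hypothetical normalization)."""
    if not labels:
        return 0.0
    return sum(SCORES[label] for label in labels) / len(labels)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def relative_reduction(before, after):
    """Percentage drop in RMSE; a negative value means the error grew."""
    return 100.0 * (before - after) / before


# RMSE pairs (without NSI, with NSI) for the 1-, 2- and 3-month horizons,
# copied from the abstract.
LSTM = [(0.60, 0.40), (0.872, 0.724), (1.117, 1.03)]
VAR = [(1.6484, 1.1315), (0.6254, 0.3413), (0.9220, 1.6227)]

if __name__ == "__main__":
    month = ["positive", "negative", "positive", "neutral"]
    print(f"NSI = {news_sentiment_index(month):.2f}")  # NSI = 0.25
    for name, rows in (("LSTM", LSTM), ("VAR", VAR)):
        for horizon, (base, with_nsi) in enumerate(rows, start=1):
            change = relative_reduction(base, with_nsi)
            print(f"{name} {horizon}-month RMSE change: {change:+.1f}%")
```

Running it shows RMSE reductions of roughly 33%, 17%, and 8% for the LSTM model and about 31% and 45% for the first two VAR horizons, while the 3-month VAR error grows by about 76%.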

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Model (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly, and data mining technology has evolved to represent unstructured documents in a variety of applications. Sentiment analysis is an important technology that can distinguish poor from high-quality content using product-related text data, and it has spread widely within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning them to predefined categories such as positive and negative. It has been studied in various directions to improve accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. Because real online reviews are openly available to others, they are not only easy to collect but also affect businesses. In marketing, real-world information from customers is gathered on websites rather than through surveys. Whether a website's posts are positive or negative is reflected in customer response and sales, so companies try to identify this information. However, reviews on a website are not always positive, and their sentiment can be difficult to identify. Early studies in this research area used review data from the Amazon.com shopping mall, but recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, accuracy remains limited because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify sentiment polarity into positive and negative categories and to increase the accuracy of polarity prediction using the pretrained IMDB review data set. 
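The dictionary-based approach mentioned above can be illustrated with a minimal lexicon scorer. The word scores and the negation rule below are hypothetical; the sketch also shows why lexicon direction matters, since a single negator can flip a review's polarity.

```python
import re

# Hypothetical toy lexicon; real dictionary-based systems use curated
# lexicons with thousands of scored entries.
LEXICON = {
    "great": 2, "good": 1, "enjoyable": 1, "masterpiece": 3,
    "bad": -1, "boring": -2, "awful": -3, "waste": -2,
}
NEGATORS = {"not", "never", "no"}


def lexicon_polarity(review):
    """Label a review 'positive' or 'negative' from summed lexicon scores.

    A negator flips the sign of the token immediately after it (a common
    rule-based refinement). A total of 0 is labeled 'negative' here, which
    is an arbitrary tie-breaking choice.
    """
    tokens = re.findall(r"[a-z']+", review.lower())
    score, flip = 0, False
    for token in tokens:
        if token in NEGATORS:
            flip = True
            continue
        if token in LEXICON:
            score += -LEXICON[token] if flip else LEXICON[token]
        flip = False  # negation only reaches the next token
    return "positive" if score > 0 else "negative"


if __name__ == "__main__":
    print(lexicon_polarity("A great, enjoyable movie"))  # positive
    print(lexicon_polarity("Not bad at all"))            # positive
    print(lexicon_polarity("Boring and awful."))         # negative
```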
First, the study adopts popular machine learning algorithms for text classification, namely NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and gradient boosting, as comparative models. Second, deep learning has proven able to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). A CNN can be used much like BoW (bag of words) when processing a sentence in vector format, but it does not consider the sequential attributes of the data. An RNN handles order well because it takes the temporal information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve this problem. CNN and LSTM were chosen as simple deep learning models for comparison, and the classical machine learning algorithms, CNN, LSTM, and the integrated models were all analyzed. Because the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well the models work for sentiment analysis and why. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining the two algorithms are as follows. A CNN can extract features for classification automatically by applying convolution layers and massively parallel processing, whereas an LSTM is not capable of highly parallel processing. Like faucets, the LSTM's input, output, and forget gates can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The LSTM's memory block may not store all the data, but it can handle the long-term dependencies that a CNN cannot capture. 
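As a point of reference for the comparative models, here is a minimal multinomial naive Bayes classifier with add-one smoothing in plain Python. It is a generic textbook version, not the study's configuration; whitespace tokenization and the toy training data are assumptions, and the other baselines (SVM, XGBoost, RF, gradient boosting) are out of scope here.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        labels = list(labels)
        self.classes = sorted(set(labels))
        self.word_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        self.log_prior = {c: math.log(labels.count(c) / len(labels))
                          for c in self.classes}
        return self

    def predict(self, doc):
        v = len(self.vocab)

        def log_posterior(c):
            score = self.log_prior[c]
            for word in doc.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((self.word_counts[c][word] + 1)
                                      / (self.totals[c] + v))
            return score

        return max(self.classes, key=log_posterior)


docs = ["great movie loved it", "wonderful acting great plot",
        "boring plot awful acting", "terrible boring movie"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)

if __name__ == "__main__":
    print(model.predict("great acting"))  # pos
    print(model.predict("boring awful"))  # neg
```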
Furthermore, when the LSTM is applied to the CNN's pooling layer, the model has an end-to-end structure, so spatial and temporal features can be modeled simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than CNN but faster than LSTM, and it was more accurate than the other models. In addition, each word embedding layer can be improved by training the kernel step by step. CNN-LSTM can offset the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study seeks to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
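The CNN-then-LSTM data flow can be sketched as an untrained forward pass: convolution with ReLU over word embeddings, max pooling along time, an LSTM over the pooled features, and a sigmoid output. All sizes, the random weights, and the single-unit LSTM state are hypothetical simplifications for readability; a real model would be trained with a deep learning framework, and this sketch makes no claim about accuracy.

```python
import math
import random

random.seed(0)  # fixed seed so the untrained weights are reproducible


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def rand_vec(n):
    return [random.uniform(-0.5, 0.5) for _ in range(n)]


def conv1d_relu(seq, filters, bias):
    """Slide each filter over the embedded sequence and apply ReLU.

    seq: T embedding vectors; filters: F kernels, each a list of `width`
    vectors. Returns T - width + 1 feature vectors of length F.
    """
    width = len(filters[0])
    out = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        out.append([
            max(0.0, b + sum(k * x
                             for k_vec, x_vec in zip(kernel, window)
                             for k, x in zip(k_vec, x_vec)))
            for kernel, b in zip(filters, bias)
        ])
    return out


def max_pool(features, size=2):
    """Non-overlapping max pooling along time, channel by channel."""
    return [
        [max(channel) for channel in zip(*features[i:i + size])]
        for i in range(0, len(features) - size + 1, size)
    ]


def gate_input(params, gate, x, h):
    """Pre-activation of one gate: w . x + u * h + b."""
    w, u, b = params[gate]
    return sum(wi * xi for wi, xi in zip(w, x)) + u * h + b


def lstm_last_hidden(xs, params):
    """Single-unit LSTM over pooled features; returns the final hidden state.

    The forget, input, and output gates act like the 'faucets' in the text:
    sigmoid values near 0 shut the flow, values near 1 open it.
    """
    h, c = 0.0, 0.0
    for x in xs:
        f = sigmoid(gate_input(params, "forget", x, h))  # keep old memory?
        i = sigmoid(gate_input(params, "input", x, h))   # write new content?
        o = sigmoid(gate_input(params, "output", x, h))  # expose the cell?
        g = math.tanh(gate_input(params, "cell", x, h))  # candidate content
        c = f * c + i * g
        h = o * math.tanh(c)
    return h


def cnn_lstm_score(tokens, emb, filters, bias, lstm_params, w_out, b_out):
    """Embed, convolve, pool, run the LSTM, and return P('positive')."""
    seq = [emb[t] for t in tokens]
    pooled = max_pool(conv1d_relu(seq, filters, bias))
    return sigmoid(w_out * lstm_last_hidden(pooled, lstm_params) + b_out)


# Hypothetical sizes and random (untrained) parameters.
EMB_DIM, N_FILTERS, WIDTH = 4, 3, 3
tokens = "the plot was slow but the acting was superb".split()
emb = {t: rand_vec(EMB_DIM) for t in sorted(set(tokens))}
filters = [[rand_vec(EMB_DIM) for _ in range(WIDTH)] for _ in range(N_FILTERS)]
bias = rand_vec(N_FILTERS)
lstm_params = {
    g: (rand_vec(N_FILTERS), random.uniform(-0.5, 0.5), 0.0)
    for g in ("forget", "input", "output", "cell")
}
print(f"P(positive) = {cnn_lstm_score(tokens, emb, filters, bias, lstm_params, 1.0, 0.0):.3f}")
```

Sorting the vocabulary before drawing embeddings keeps the random assignment stable across runs, since Python's string hashing can change set iteration order between processes.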

Accessibility Analysis in Mapping Cultural Ecosystem Service of Namyangju-si (접근성 개념을 적용한 문화서비스 평가 -남양주시를 대상으로-)

  • Jun, Baysok;Kang, Wanmo;Lee, Jaehyuck;Kim, Sunghoon;Kim, Byeori;Kim, Ilkwon;Lee, Jooeun;Kwon, Hyuksoo
    • Journal of Environmental Impact Assessment / v.27 no.4 / pp.367-377 / 2018
  • Cultural ecosystem services (CES), the non-material benefits that humans gain from ecosystems, have received growing recognition as gross national income has increased. Previous studies have tried to quantify the value of CES, which remains challenging because of its social and cultural subjectivity. This study proposes a new way of assessing CES called the Cultural Service Opportunity Spectrum (CSOS). CSOS is an accessibility-based CES assessment methodology at the regional scale, designed to be applicable to any region in Korea to support decision making. CSOS uses public spatial data, namely a road network and a population density map. In addition, the results of the 'Rapid Assessment of Natural Assets' conducted by the National Institute of Ecology, Korea, were used as complementary data. When applied to Namyangju-si, the methodology revealed specific areas with high accessibility to 'Natural Assets' in the region. Based on the results, the advantages and limitations of the methodology were discussed with regard to the weighting of its three main factors and in contrast to the Scenic Quality and Recreation models of InVEST, which are widely used for assessing CES today because of their convenience.
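The abstract does not give the CSOS formula or how its three factors are weighted, so the following is only a generic accessibility sketch under stated assumptions: Dijkstra's shortest paths over a road graph with travel times, and a cumulative-opportunity score that counts the population able to reach a natural asset within a time threshold. Node names, travel times, population figures, and the threshold are hypothetical.

```python
import heapq


def undirected(edges):
    """Build an adjacency list from (a, b, minutes) two-way road segments."""
    graph = {}
    for a, b, minutes in edges:
        graph.setdefault(a, []).append((b, minutes))
        graph.setdefault(b, []).append((a, minutes))
    return graph


def shortest_times(graph, source):
    """Dijkstra: minimum travel time from `source` to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


def asset_accessibility(graph, population, asset, threshold):
    """Population whose network travel time to `asset` is <= threshold.

    Roads are assumed two-way, so times from the asset equal times to it.
    """
    times = shortest_times(graph, asset)
    return sum(pop for node, pop in population.items()
               if times.get(node, float("inf")) <= threshold)


roads = undirected([("A", "B", 5), ("B", "C", 10), ("C", "D", 20), ("A", "D", 40)])
population = {"A": 1000, "B": 500, "C": 2000, "D": 300}

if __name__ == "__main__":
    # Natural asset located at node A; 15-minute travel threshold.
    print(asset_accessibility(roads, population, "A", 15))  # 3500
```

A distance-decay weight could replace the hard threshold; which variant suits a CES assessment is a design choice the abstract leaves open.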

A Comparative Study of Domestic Travel Patterns and Determinant Factors Affecting Satisfaction by Generations (대한민국 국민의 세대별 국내여행 방식 및 만족도 영향요인)

  • Mi-Sook Lee;Yoon-Joo Park
    • Information Systems Review / v.22 no.2 / pp.137-166 / 2020
  • While South Koreans' overseas travel rate has increased every year, the domestic travel rate has stagnated for several years. The purpose of this study is to analyze Koreans' domestic travel styles by generation in order to provide generation-specific travel services. For this purpose, we categorized the survey respondents into four generations following the criteria of the Korea National Tourism Organization: Millennials (ages 19-34), Generation X (35-54), Baby Boomers (55-64), and seniors. We then analyzed factors related to the travel preparation process, the actual travel activities, and satisfaction after the trip, using 16,713 records collected by the Ministry of Culture, Sports and Tourism. The results show that Koreans tend to acquire domestic travel information from their own or acquaintances' past experiences. They also do not prefer organized tours for domestic travel and therefore do not buy many package products. In addition, natural scenery, rich cultural heritage, and convenient accommodation are the most important factors affecting overall travel satisfaction for all generations. The travel characteristics of each generation are as follows. Millennials obtain much of their travel information from the internet, in many cases from portal sites and social network services (SNS). They also tend to travel to popular destinations in the summer peak season and pursue active travel experiences. Generation X has travel patterns similar to Millennials, but their main means of transportation is their own car. Transportation convenience and satisfying leisure activities are also important factors affecting Generation X's overall satisfaction. 
On the other hand, the Baby Boomer generation places greater emphasis on appreciating nature, visiting famous restaurants, and relaxing, rather than actively participating in experience programs. They travel evenly across the summer and spring/fall seasons and to many different areas instead of focusing on popular tourist spots. In addition, shopping and eating delicious food are important factors affecting their overall satisfaction. Lastly, the senior generation shares many characteristics with Baby Boomers, but seniors often take day trips using public transportation or car rental services. They prefer spring and autumn trips to the summer peak season and tend to buy packaged travel products more than other generations do. If these generational differences in travel characteristics are considered when organizing and customizing tourism services, domestic tourism satisfaction is expected to increase.
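The age brackets used to group respondents can be written as a small lookup. The abstract gives no explicit range for the senior group, so assigning every respondent aged 65 or older to it is an assumption, as is rejecting ages below 19.

```python
from collections import Counter


def generation(age):
    """Map a respondent's age to the study's four generation groups."""
    if age < 19:
        raise ValueError("respondents are assumed to be 19 or older")
    if age <= 34:
        return "Millennial"
    if age <= 54:
        return "Generation X"
    if age <= 64:
        return "Baby Boomer"
    return "Senior"  # assumption: 65 and over


def generation_shares(ages):
    """Fraction of respondents falling into each generation."""
    counts = Counter(generation(a) for a in ages)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()}


if __name__ == "__main__":
    print(generation_shares([20, 40, 60, 70]))
```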

Structural features and Diffusion Patterns of Gartner Hype Cycle for Artificial Intelligence using Social Network analysis (인공지능 기술에 관한 가트너 하이프사이클의 네트워크 집단구조 특성 및 확산패턴에 관한 연구)

  • Shin, Sunah;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.107-129 / 2022
  • Preempting new technology is important because technology competition is becoming much tougher, so stakeholders continuously explore new technologies in order to secure them at the right time. Gartner's Hype Cycle has significant implications for these stakeholders. The Hype Cycle is an expectation graph for new technologies that combines the technology life cycle (S-curve) with the level of hype. Stakeholders such as R&D investors, CTOs (Chief Technology Officers), and technical personnel are very interested in Gartner's Hype Cycle, because high expectations for new technologies can help sustain investment by legitimizing R&D spending. However, despite the industry's strong interest, previous studies have been limited in their empirical methods and source data (news, academic papers, search traffic, patents, etc.). This study focused on two research questions. The first was 'Is there a difference in the characteristics of the network structure at each stage of the hype cycle?' To answer it, the structural characteristics of each stage were examined through component cohesion size. The second was 'Is there a pattern of diffusion at each stage of the hype cycle?' This question was addressed through the centralization index and network density. The centralization index is a measure of variance: a higher centralization index means that a small number of nodes are central in the network. Such concentration on a few nodes corresponds to a star network structure. The star structure is centralized and shows better diffusion performance than a decentralized network (circle structure), because the nodes at the center of information transfer can judge which information is useful and deliver it to other nodes fastest. 
So we examined the out-degree and in-degree centralization indexes for each stage. For this purpose, we analyzed the structural features of the community and the patterns of expectation diffusion using Social Network Service (SNS) data on the 'Gartner Hype Cycle for Artificial Intelligence, 2021'. Twitter data for 30 technologies listed in the 'Gartner Hype Cycle for Artificial Intelligence, 2021' (four technologies were excluded) were analyzed using R (version 4.1.1) and Cyram NetMiner. From October 31 to November 9, 2021, 6,766 tweets were collected through the Twitter API and converted into relationships between the tweeting user (source) and the retweeting users (target), yielding 4,124 edges for analysis. By analyzing component cohesion size, degree centralization, and density, we confirmed the structural features and diffusion patterns. The number of components in each stage's group increased over time while density decreased. Also, 'Innovation Trigger', the group interested in new technologies as early adopters in innovation diffusion theory, had a high out-degree centralization index, whereas the other groups had higher in-degree than out-degree centralization indexes. This suggests that the 'Innovation Trigger' group has the greatest influence and that diffusion gradually slows in the subsequent groups. Unlike previous studies, this study conducted network analysis using social network service data, which is significant in that it offers a way to broaden the methods for analyzing Gartner's hype cycle in the future. In addition, applying innovation diffusion theory to the stages of Gartner's hype cycle for artificial intelligence can be viewed positively, because the hype cycle has repeatedly been criticized for its weak theoretical basis. 
This study is also expected to offer stakeholders a new perspective on technology investment decisions.
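The network measures named in this abstract can be computed from a tweet-retweet edge list as follows. This sketch assumes Freeman's degree centralization normalized for a simple directed graph (the abstract does not state which normalization NetMiner applied), counts weakly connected components as the components, and uses a hypothetical toy edge list.

```python
from collections import defaultdict


def network_metrics(edges):
    """Density, Freeman in/out-degree centralization, and weak components.

    edges: (source, target) pairs, e.g. (tweeting user, retweeting user).
    Duplicate pairs and self-loops are dropped because these formulas
    assume a simple directed graph.
    """
    arcs = {(s, t) for s, t in edges if s != t}
    nodes = {v for arc in arcs for v in arc}
    n = len(nodes)
    if n < 2:
        raise ValueError("need at least two connected users")
    out_deg, in_deg = defaultdict(int), defaultdict(int)
    for s, t in arcs:
        out_deg[s] += 1
        in_deg[t] += 1

    def centralization(deg):
        # Sum of gaps to the most central node, divided by the maximum
        # possible sum, (n - 1)^2, which a directed star attains.
        peak = max(deg[v] for v in nodes)
        return sum(peak - deg[v] for v in nodes) / (n - 1) ** 2

    # Weakly connected components via union-find (arc direction ignored).
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for s, t in arcs:
        parent[find(s)] = find(t)

    return {
        "density": len(arcs) / (n * (n - 1)),
        "out_centralization": centralization(out_deg),
        "in_centralization": centralization(in_deg),
        "components": len({find(v) for v in nodes}),
    }


if __name__ == "__main__":
    # User 0 is retweeted by users 1-4: a star centered on the source.
    print(network_metrics([(0, 1), (0, 2), (0, 3), (0, 4)]))
```

In this toy star, out-degree centralization is 1.0 and in-degree centralization is 0.0625, mirroring the pattern the study reports for the 'Innovation Trigger' group, where influence flows outward from a few sources.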

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Recently, channels such as social media and SNS have been generating enormous amounts of data. Among all kinds of data, the share of unstructured data in the form of text has increased geometrically. Because it is difficult to review all text data, it is important to access such data rapidly and grasp its key points. Owing to this need for efficient understanding, many text summarization studies for handling and using tremendous amounts of text data have been proposed. In particular, many summarization methods using machine learning and artificial intelligence algorithms have recently been proposed to generate summaries objectively and effectively, an approach called "automatic summarization". However, almost all text summarization methods proposed to date construct summaries based on the frequency of content in the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often in the original text. If a summary covers only the major subjects, it becomes biased and loses information, making it hard to ascertain every subject in the documents. To avoid this bias, one can summarize with a view to balancing the topics in the documents so that all subjects can be identified, but an imbalance in the distribution among subjects still remains. To keep subjects balanced in a summary, it is necessary to consider the proportion of each subject in the original documents and to allocate subjects equally, so that sentences on minor subjects are also sufficiently included in the summary. In this study, we propose a "subject-balanced" text summarization method that maintains balance among all subjects and minimizes the omission of low-frequency subjects. For subject-balanced summaries, we use two summary evaluation concepts: "completeness" and "succinctness". 
Completeness means that a summary fully covers the contents of the original documents, and succinctness means that the summary has minimal duplication within itself. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights, which indicate how strongly each term is related to each topic. From these weights, the terms most strongly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well, called "seed terms" in this method, are then selected. However, the seed terms are too few to describe each subject adequately, so a well-constructed subject dictionary needs enough terms similar to them. Word2Vec is used for this word expansion, finding terms similar to the seed terms. Word2Vec modeling produces word vectors, from which the similarity between any two terms can be derived using cosine similarity; the higher the cosine similarity between two terms, the more closely they are related. Terms with high similarity to each subject's seed terms are selected, and after these expanded terms are filtered, the subject dictionary is constructed. The next phase allocates a subject to every sentence in the original documents. To grasp the contents of each sentence, frequency analysis is first conducted using the terms in the subject dictionaries. The TF-IDF weight of each subject is then calculated, showing how much each sentence discusses each subject. However, TF-IDF weights are unbounded, so the TF-IDF weights of every subject in each sentence are normalized to values between 0 and 1. 
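A minimal sketch of the seed-term expansion and the 0-to-1 normalization described above. Toy two-dimensional vectors stand in for trained Word2Vec embeddings, and both the similarity threshold and the choice of min-max scaling are assumptions; the abstract specifies neither.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold=0.8):
    """Add every vocabulary term whose vector is close to any seed term.

    `vectors` maps term -> embedding (a stand-in for Word2Vec output);
    `threshold` is a hypothetical filtering rule.
    """
    expanded = set(seeds)
    for term, vec in vectors.items():
        if any(cosine(vec, vectors[s]) >= threshold for s in seeds if s in vectors):
            expanded.add(term)
    return expanded


def minmax_normalize(weights):
    """Rescale unbounded TF-IDF weights to [0, 1]; constant input maps to 0."""
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0.0 for _ in weights]
    return [(w - lo) / (hi - lo) for w in weights]


vectors = {"room": [1.0, 0.1], "bed": [0.9, 0.2],
           "pool": [0.1, 1.0], "swim": [0.2, 0.9]}

if __name__ == "__main__":
    print(sorted(expand_seeds({"room"}, vectors)))  # ['bed', 'room']
    print(minmax_normalize([2.0, 4.0, 6.0]))         # [0.0, 0.5, 1.0]
```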
Each sentence is then allocated to the subject with the maximum TF-IDF weight, which finally yields a group of sentences for each subject. The last phase generates the summary. Sen2Vec is used to compute the similarity between subject sentences, forming a similarity matrix. By selecting sentences iteratively, the method generates a summary that fully covers the contents of the original documents while minimizing duplication within the summary itself. To evaluate the proposed method, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the balance among all subjects in the original documents.
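Subject allocation and summary generation can be sketched as follows: each sentence goes to its highest-weighted subject, and sentences are then drawn round-robin across subjects while near-duplicates are skipped. The round-robin order and the similarity cutoff are hypothetical stand-ins for the paper's iterative selection, and the similarity matrix is assumed to be precomputed (for example, from sentence embeddings).

```python
def assign_subjects(norm_weights):
    """Allocate each sentence to the subject with its highest normalized weight.

    norm_weights[i][k] is sentence i's normalized TF-IDF weight for subject k.
    Returns {subject index: [sentence indices]}.
    """
    groups = {}
    for i, row in enumerate(norm_weights):
        best = max(range(len(row)), key=row.__getitem__)
        groups.setdefault(best, []).append(i)
    return groups


def balanced_summary(groups, norm_weights, sim, length, max_sim=0.7):
    """Pick sentences round-robin across subjects (completeness) while
    skipping any sentence too similar to one already chosen (succinctness).
    """
    # Within each subject, try the most subject-specific sentences first.
    queues = {k: sorted(members, key=lambda i: -norm_weights[i][k])
              for k, members in groups.items()}
    chosen = []
    while len(chosen) < length and any(queues.values()):
        for k in sorted(queues):
            while queues[k]:
                cand = queues[k].pop(0)
                if all(sim[cand][j] < max_sim for j in chosen):
                    chosen.append(cand)
                    break
            if len(chosen) == length:
                break
    return chosen


# Toy data: sentences 0 and 1 are near-duplicates about subject 0.
norm_weights = [[1.0, 0.0], [0.8, 0.1], [0.0, 1.0], [0.2, 0.9]]
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.2],
       [0.1, 0.1, 0.2, 1.0]]

if __name__ == "__main__":
    groups = assign_subjects(norm_weights)
    print(groups)                                        # {0: [0, 1], 1: [2, 3]}
    print(balanced_summary(groups, norm_weights, sim, 3))  # [0, 2, 3]
```

Sentence 1 is skipped because it nearly duplicates sentence 0, so the third slot goes to the minor subject instead of repeating the major one.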

Clustering Method based on Genre Interest for Cold-Start Problem in Movie Recommendation (영화 추천 시스템의 초기 사용자 문제를 위한 장르 선호 기반의 클러스터링 기법)

  • You, Tithrottanak;Rosli, Ahmad Nurzid;Ha, Inay;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.57-77 / 2013
  • Social media has become one of the most popular media in web and mobile applications. According to a study by the Nielsen Company, social networks and blogs were still the top destination of online users in 2011, with nearly 4 in 5 active users visiting social networks and blogs. Social network and blog sites dominate Americans' Internet time, accounting for 23 percent of time spent online. Facebook is the main social network, on which U.S. Internet users spend more time than on other services such as Yahoo, Google, AOL Media Network, Twitter, and LinkedIn. Following a recent trend, most companies promote their products on Facebook by creating a "Facebook Page" for a specific product. The "Like" option allows users to subscribe to a page and receive updates on topics they are interested in. Filmmakers around the world also market and promote their films by taking advantage of "Facebook Pages". In addition, many streaming service providers allow users to subscribe to their services to watch movies and TV programs, which subscribers can stream instantly over the Internet to PCs, Macs, and TVs. Netflix alone, the world's leading subscription service, has more than 30 million streaming members in the United States, Latin America, the United Kingdom, and the Nordics. Indeed, a million movies and TV programs of different genres are offered to subscribers. As a result, users need to spend a lot of time finding the right movies for their genre interests. In recent years, many researchers have proposed methods to improve the prediction of ratings or preferences so as to recommend the most relevant items, such as books, music, or movies, to a target user or to a group of users who share an interest in particular items. 
One of the most popular methods for building a recommendation system is traditional collaborative filtering (CF). The method computes the similarity between the target user and other users, who are then clustered by shared interest according to the items they have rated. It then predicts other items from the same group of users to recommend to that group. Many kinds of items could be recommended to users, such as books, music, movies, news, and videos; in this paper, however, we focus only on movies. CF also faces several challenges. The first is the "sparsity problem", which occurs when there is not enough user preference information; recommendation accuracy is then lower than when neighbors are formed from a large number of ratings. The second is the "cold-start problem", which occurs whenever new users or items with no ratings or only a few ratings are added to the system. For instance, no personalized predictions can be made for a new user without any ratings on record. In this research, we propose a clustering method based on users' genre interests, extracted from a social network service (SNS), and their movie rating information to solve the "cold-start problem." Our proposed method clusters the target user with other users by combining the user's genre interests and rating information. Because the Facebook Graph holds a huge amount of interesting and useful user information, we can extract information from the "Facebook Pages" that users have "Liked". We use the Internet Movie Database (IMDb) as the main dataset. IMDb is an online database containing a large amount of information about movies, TV programs, and actors. 
This dataset is used not only to provide movie information in our Movie Rating System but also as a resource for the movie genre information extracted from "Facebook Pages". Users first log in to the Movie Rating System with their Facebook account, and at the same time our system collects their genre interests from their "Facebook Pages". We conducted many experiments comparing our method with other methods. First, we compared the methods in the normal recommendation setting to see how our system improves recommendation results; we then ran experiments for the cold-start problem. Our experiments show that our method outperforms the other methods in both cases.
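One way to let genre interest stand in for missing ratings is sketched below. Each user carries a genre-interest vector (for example, counts of liked Facebook Pages per genre) and a rating dictionary; for a new user with no ratings, similarity falls back to genre interest alone, and predictions come from the top-k most similar users who rated the movie. The blending weight `alpha`, `k`, and the toy users are hypothetical, and this neighborhood grouping is a simplification rather than the paper's clustering algorithm.

```python
import math


def dict_cosine(a, b):
    """Cosine similarity between sparse vectors stored as {key: value}."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def similarity(target, other, alpha=0.5):
    """Blend genre-interest and rating similarity.

    Users are {"genres": {genre: likes}, "ratings": {movie: stars}}.
    A cold-start user has no ratings, so only genre interest is used.
    """
    genre_sim = dict_cosine(target["genres"], other["genres"])
    if not target["ratings"]:
        return genre_sim
    rating_sim = dict_cosine(target["ratings"], other["ratings"])
    return alpha * genre_sim + (1 - alpha) * rating_sim


def predict(target, users, movie, k=2):
    """Similarity-weighted mean rating of `movie` among the k most similar
    users who rated it; None if no usable neighbor exists."""
    raters = [u for u in users if movie in u["ratings"]]
    neighbors = sorted(raters, key=lambda u: similarity(target, u), reverse=True)[:k]
    weights = [similarity(target, u) for u in neighbors]
    if not neighbors or sum(weights) == 0:
        return None
    return sum(w * u["ratings"][movie] for w, u in zip(weights, neighbors)) / sum(weights)


users = [
    {"genres": {"horror": 5}, "ratings": {"Alien": 5, "Up": 1}},
    {"genres": {"animation": 4, "comedy": 2}, "ratings": {"Alien": 2, "Up": 5}},
    {"genres": {"horror": 3, "thriller": 1}, "ratings": {"Alien": 4}},
]
new_user = {"genres": {"horror": 2}, "ratings": {}}  # cold-start user

if __name__ == "__main__":
    print(predict(new_user, users, "Up"))     # 1.0
    print(predict(new_user, users, "Alien"))  # about 4.51
```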

Consumer's Negative Brand Rumor Acceptance and Rumor Diffusion (소비자의 부정적 브랜드 루머의 수용과 확산)

  • Lee, Won-jun;Lee, Han-Suk
    • Asia Marketing Journal / v.14 no.2 / pp.65-96 / 2012
  • Brands have received considerable attention in marketing research. When consumers use products or services, they are exposed to many brand-related stimuli, including brand personality, brand experience, brand identity, and brand communications. A new kind of crisis that occasionally confronts corporate brand management today is the brand-related rumor. Word of mouth spread by other consumers has an important influence on consumers' purchase decisions, and most decisions are influenced by others' recommendations. In light of this influence, firms have good reason to study and understand consumer-to-consumer communication such as brand rumors. Brand rumors are becoming more important to marketers as the numbers of internet users and SNS (social network service) sites grow. Thanks to the development of internet technology, people can spread rumors without limits of time, space, or place. However, relatively few studies have been published in marketing journals, and little is known about brand rumors in the marketplace. The study of rumor has a long history in all the major social sciences, but very few studies have dealt with the antecedents and consequences of any kind of brand rumor. A rumor has generally been described as a story or statement in general circulation without proper confirmation or certainty as to fact, and it can also be defined as an unconfirmed proposition passed along from person to person. Rosnow (1991) claimed that rumors are transmitted because people need to explain ambiguous and uncertain events, and talking about them reduces the associated anxiety. Negative rumors in particular are believed to have the potential to devastate a company's reputation and customer relations. From the marketer's perspective, negative rumors are generally considered harmful and extremely difficult to control. 
Such rumors are becoming a threat to companies' sustainability and sometimes lead to a negative brand image and loss of customers. Thus there is growing concern that negative rumors can damage brands' reputations and even lead to financial disaster. In this study, we aimed to identify the antecedents of brand rumor transmission and to investigate the effects of brand rumor characteristics on rumor spread intention. We also identified key components of personal acceptance of brand rumors. From a contextualist perspective, we tried to unify the traditional psychological and sociological views. In this unified approach, we defined brand rumor characteristics based on five major variables that had been found to influence rumor spread intention. The five factors (usefulness, source credibility, message credibility, worry, and vividness) encompass multilevel elements of brand rumor. We also selected product involvement as a control variable. For the empirical research, an imaginary Korean kimchi brand ('Kimch') and a related contamination rumor were created and presented. Questionnaires were collected from 178 Korean respondents, all college students who had experience with the focal product. College students were regarded as good subjects because they tend to express their opinions in detail. The PLS (partial least squares) method was adopted to analyze the relations between variables in the equation model. The most widely adopted causal modeling method is LISREL, but it is poorly suited to relatively small samples and can yield improper solutions in some cases. PLS was developed to avoid some of these limitations and to provide more reliable results. Reliability was tested with Cronbach's alpha in SPSS 16, and all values were appropriate, ranging from .802 to .953. Confirmatory factor analysis was then conducted successfully. 
Structural equation modeling was then used to analyze the research model in SmartPLS (version 2.0). Overall, the R² of rumor adoption is .476 and the R² of rumor transmission intention is .218, and the overall model showed a satisfactory fit. The empirical results can be summarized as follows. Brand rumor characteristics such as source credibility, message credibility, worry, and vividness affect the argument strength of a rumor, and argument strength in turn affects rumor transmission intention. On the other hand, the relationship between perceived usefulness and argument strength is not significant, and the moderating effect of product involvement on the relationship between argument strength and rumor W.O.M. intention is not supported either. This study offers several managerial and academic implications, including implications for corporate crisis management planning, PR, and brand management. The results show marketers that rumors are a critical factor in managing strong brand assets, and they suggest that researchers should treat brand rumors as an important topic for understanding the relationship between consumers and brands. Recently, many brand managers and marketers have taken a short-term view, focusing only on strengthening a positive brand image. This study suggests instead that effective brand management requires managing negative brand rumors with a long-term view of marketing decisions.
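The reliability check reported in this abstract uses Cronbach's alpha. A minimal implementation of the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), is shown here and applied to a hypothetical respondents-by-items score matrix; it is not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items matrix of scores.

    Uses sample variances for both the items and the total scores; the
    ratio is the same if population variances are used consistently.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    items = list(zip(*responses))  # one tuple of scores per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 4 respondents x 3 Likert items.
scores = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]

if __name__ == "__main__":
    print(f"alpha = {cronbach_alpha(scores):.3f}")  # alpha = 0.957
```

Values above about .7 are conventionally read as acceptable internal consistency, which is the sense in which the study's .802 to .953 range counts as appropriate.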
