• Title/Summary/Keyword: Text clustering

Search Result 205, Processing Time 0.02 seconds

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.1-19
    • /
    • 2018
  • Since stock movements forecasting is an important issue both academically and practically, studies related to stock price prediction have been actively conducted. The stock price forecasting research is classified into structured data and unstructured data, and it is divided into technical analysis, fundamental analysis and media effect analysis in detail. In the big data era, research on stock price prediction combining big data is actively underway. Based on a large number of data, stock prediction research mainly focuses on machine learning techniques. Especially, research methods that combine the effects of media are attracting attention recently, among which researches that analyze online news and utilize online news to forecast stock prices are becoming main. Previous studies predicting stock prices through online news are mostly sentiment analysis of news, making different corpus for each company, and making a dictionary that predicts stock prices by recording responses according to the past stock price. Therefore, existing studies have examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted with only online news of Samsung Electronics. In addition, a method of considering influences among highly relevant companies has also been studied recently. For example, stock movements of Samsung Electronics are predicted with news of Samsung Electronics and a highly related company like LG Electronics.These previous studies examine the effects of news of industrial sector with homogeneity on the individual company. In the previous studies, homogeneous industries are classified according to the Global Industrial Classification Standard. In other words, the existing studies were analyzed under the assumption that industries divided into Global Industrial Classification Standard have homogeneity. However, existing studies have limitations in that they do not take into account influential companies with high relevance or reflect the existence of heterogeneity within the same Global Industrial Classification Standard sectors. As a result of our examining the various sectors, it can be seen that there are sectors that show the industrial sectors are not a homogeneous group. To overcome these limitations of existing studies that do not reflect heterogeneity, our study suggests a methodology that reflects the heterogeneous effects of the industrial sector that affect the stock price by applying k-means clustering. Multiple Kernel Learning is mainly used to integrate data with various characteristics. Multiple Kernel Learning has several kernels, each of which receives and predicts different data. To incorporate effects of target firm and its relevant firms simultaneously, we used Multiple Kernel Learning. Each kernel was assigned to predict stock prices with variables of financial news of the industrial group divided by the target firm, K-means cluster analysis. In order to prove that the suggested methodology is appropriate, experiments were conducted through three years of online news and stock prices. The results of this study are as follows. (1) We confirmed that the information of the industrial sectors related to target company also contains meaningful information to predict stock movements of target company and confirmed that machine learning algorithm has better predictive power when considering the news of the relevant companies and target company's news together. (2) It is important to predict stock movements with varying number of clusters according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous in industrial sectors, it is important to use relational effect at the level of industry group without analyzing clusters or to use it in small number of clusters. When the stock price is heterogeneous in industry group, it is important to cluster them into groups. This study has a contribution that we testified firms classified as Global Industrial Classification Standard have heterogeneity and suggested it is necessary to define the relevance through machine learning and statistical analysis methodology rather than simply defining it in the Global Industrial Classification Standard. It has also contribution that we proved the efficiency of the prediction model reflecting heterogeneity.

Analysis of Football Fans' Uniform Consumption: Before and After Son Heung-Min's Transfer to Tottenham Hotspur FC (국내 프로축구 팬들의 유니폼 소비 분석: 손흥민의 토트넘 홋스퍼 FC 이적 전후 비교)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.91-108
    • /
    • 2020
  • Korea's famous soccer players are steadily performing well in international leagues, which led to higher interests of Korean fans in the international leagues. Reflecting the growing social phenomenon of rising interests on international leagues by Korean fans, the study examined the overall consumer perception in the consumption of uniform by domestic soccer fans and compared the changes in perception following the transfers of the players. Among others, the paper examined the consumer perception and purchase factors of soccer fans shown in social media, focusing on periods before and after the recruitment of Heung-Min Son to English Premier League's Tottenham Football Club. To this end, the EPL uniform is the collection keyword the paper utilized and collected consumer postings from domestic website and social media via Python 3.7, and analyzed them using Ucinet 6, NodeXL 1.0.1, and SPSS 25.0 programs. The results of this study can be summarized as follows. First, the uniform of the club that consistently topped the league, has been gaining attention as a popular uniform, and the players' performance, and the players' position have been identified as key factors in the purchase and search of professional football uniforms. In the case of the club, the actual ranking and whether the league won are shown to be important factors in the purchase and search of professional soccer uniforms. The club's emblem and the sponsor logo that will be attached to the uniform are also factors of interest to consumers. In addition, in the decision making process of purchase of a uniform by professional soccer fan, uniform's form, marking, authenticity, and sponsors are found to be more important than price, design, size, and logo. The official online store has emerged as a major purchasing channel, followed by gifts for friends or requests from acquaintances when someone travels to the United Kingdom. Second, a classification of key control categories through the convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm shows differences in the classification of individual groups, but groups that include the EPL's club and player keywords are identified as the key topics in relation to professional football uniforms. Third, between 2002 and 2006, the central theme for professional football uniforms was World Cup and English Premier League, but from 2012 to 2015, the focus has shifted to more interest of domestic and international players in the English Premier League. The subject has changed to the uniform itself from this time on. In this context, the paper can confirm that the major issues regarding the uniforms of professional soccer players have changed since Ji-Sung Park's transfer to Manchester United, and Sung-Yong Ki, Chung-Yong Lee, and Heung-Min Son's good performances in these leagues. The paper also identified that the uniforms of the clubs to which the players have transferred to are of interest. Fourth, both male and female consumers are showing increasing interest in Son's league, the English Premier League, which Tottenham FC belongs to. In particular, the increasing interest in Son has shown a tendency to increase interest in football uniforms for female consumers. This study presents a variety of researches on sports consumption and has value as a consumer study by identifying unique consumption patterns. It is meaningful in that the accuracy of the interpretation has been enhanced by using a cluster analysis via convergence of iteration correlation analysis and Clauset-Newman-Moore clustering algorithm to identify the main topics. Based on the results of this study, the clubs will be able to maximize its profits and maintain good relationships with fans by identifying key drivers of consumer awareness and purchasing for professional soccer fans and establishing an effective marketing strategy.

A Mobile Landmarks Guide : Outdoor Augmented Reality based on LOD and Contextual Device (모바일 랜드마크 가이드 : LOD와 문맥적 장치 기반의 실외 증강현실)

  • Zhao, Bi-Cheng;Rosli, Ahmad Nurzid;Jang, Chol-Hee;Lee, Kee-Sung;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.1
    • /
    • pp.1-21
    • /
    • 2012
  • In recent years, mobile phone has experienced an extremely fast evolution. It is equipped with high-quality color displays, high resolution cameras, and real-time accelerated 3D graphics. In addition, some other features are includes GPS sensor and Digital Compass, etc. This evolution advent significantly helps the application developers to use the power of smart-phones, to create a rich environment that offers a wide range of services and exciting possibilities. To date mobile AR in outdoor research there are many popular location-based AR services, such Layar and Wikitude. These systems have big limitation the AR contents hardly overlaid on the real target. Another research is context-based AR services using image recognition and tracking. The AR contents are precisely overlaid on the real target. But the real-time performance is restricted by the retrieval time and hardly implement in large scale area. In our work, we exploit to combine advantages of location-based AR with context-based AR. The system can easily find out surrounding landmarks first and then do the recognition and tracking with them. The proposed system mainly consists of two major parts-landmark browsing module and annotation module. In landmark browsing module, user can view an augmented virtual information (information media), such as text, picture and video on their smart-phone viewfinder, when they pointing out their smart-phone to a certain building or landmark. For this, landmark recognition technique is applied in this work. SURF point-based features are used in the matching process due to their robustness. To ensure the image retrieval and matching processes is fast enough for real time tracking, we exploit the contextual device (GPS and digital compass) information. This is necessary to select the nearest and pointed orientation landmarks from the database. The queried image is only matched with this selected data. Therefore, the speed for matching will be significantly increased. Secondly is the annotation module. Instead of viewing only the augmented information media, user can create virtual annotation based on linked data. Having to know a full knowledge about the landmark, are not necessary required. They can simply look for the appropriate topic by searching it with a keyword in linked data. With this, it helps the system to find out target URI in order to generate correct AR contents. On the other hand, in order to recognize target landmarks, images of selected building or landmark are captured from different angle and distance. This procedure looks like a similar processing of building a connection between the real building and the virtual information existed in the Linked Open Data. In our experiments, search range in the database is reduced by clustering images into groups according to their coordinates. A Grid-base clustering method and user location information are used to restrict the retrieval range. Comparing the existed research using cluster and GPS information the retrieval time is around 70~80ms. Experiment results show our approach the retrieval time reduces to around 18~20ms in average. Therefore the totally processing time is reduced from 490~540ms to 438~480ms. The performance improvement will be more obvious when the database growing. It demonstrates the proposed system is efficient and robust in many cases.

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.139-152
    • /
    • 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as the supporting systems which help users to find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, which mean reviews from various authorities, and user attributes. However, as academic researches on recommender systems have increased significantly over the last ten years, more researches are required to be applicable in the real world situation. Because research field on recommender systems is still wide and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed toward the next generation of recommender systems. However, it would be not easy to confine the recommender system researches to specific disciplines, considering the nature of the recommender system researches. So, we reviewed all articles on recommender systems from 37 journals which were published from 2001 to 2010. The 37 journals are selected from top 125 journals of the MIS Journal Rankings. Also, the literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate the article that was not actually related to recommender systems. Many of articles were excluded because the articles such as Conference papers, master's and doctoral dissertations, textbook, unpublished working papers, non-English publication papers and news were unfit for our research. We classified articles by year of publication, journals, recommendation fields, and data mining techniques. The recommendation fields and data mining techniques of 187 articles are reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results represented in this paper have several significant implications. First, based on previous publication rates, the interest in the recommender system related research will grow significantly in the future. Second, 49 articles are related to movie recommendation whereas image and TV program recommendation are identified in only 6 articles. This result has been caused by the easy use of MovieLens data set. So, it is necessary to prepare data set of other fields. Third, recently social network analysis has been used in the various applications. However studies on recommender systems using social network analysis are deficient. Henceforth, we expect that new recommendation approaches using social network analysis will be developed in the recommender systems. So, it will be an interesting and further research area to evaluate the recommendation system researches using social method analysis. This result provides trend of recommender system researches by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

Digital Archives of Cultural Archetype Contents: Its Problems and Direction (디지털 아카이브즈의 문제점과 방향 - 문화원형 콘텐츠를 중심으로 -)

  • Hahm, Han-Hee;Park, Soon-Cheol
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.17 no.2
    • /
    • pp.23-42
    • /
    • 2006
  • This is a study of the digital archives of Culturecontent.com where 'Cultural Archetype Contents' are currently in service. One of the major purposes of our study is to point out problems in the current system and eventually propose improvements to the digital archives. The government launched a four-year project for developing the cultural archetype content sources and establishing its related business with the hope of enhancing the nation's competitiveness. More specifically, the project focuses on the production of source materials of cultural archetype contents in the subjects of Korea's history. tradition, everyday life. arts and general geographical books. In addition, through this project, the government also intends to establish a proper distribution system of digitalized culture contents and to control copyright issues. This paper analyzes the digital archives system that stores the culture content data that have been produced from 2002 to 2005 and evaluates the current system's weaknesses and strengths. The summary of our findings is as follows. First. the digital archives system does not contain a semantic search engine and therefore its full function is 1agged. Second, similar data is not classified into the same categories but into the different ones, thereby confusing and inconveniencing users. Users who want to find source materials could be disappointed by the current distributive system. Our paper suggests a better system of digital archives with text mining technology which consists of five significant intelligent process-keyword searches, summarization, clustering, classification and topic tracking. Our paper endeavors to develop the best technical environment for preserving and using culture contents data. With the new digitalized upgraded settings, users of culture contents data will discover a world of new knowledge. The technology we introduce in this paper will lead to the highest achievable digital intelligence through a new framework.