• Title/Summary/Keyword: big data mining

Spatial analysis based on topic modeling using foreign tourist review data: Case of Daegu (외국인 관광객 리뷰데이터를 활용한 토픽모델링 기반의 공간분석: 대구광역시를 사례로)

  • Jung, Ji-Woo;Kim, Seo-Yun;Kim, Hyeon-Yu;Yoon, Ju-Hyeok;Jang, Won-Jun;Kim, Keun-Wook
    • Journal of Digital Convergence / v.19 no.8 / pp.33-42 / 2021
  • As smartphone-based tourism platforms have become widely used, review data are increasingly applied to policy making and service improvement in various fields. Most previous studies using tourism review data focused on domestic tourists, and studies of foreign tourists were limited to data collected in a few languages and to text mining techniques. In this study, 3,515 reviews written by foreigners were collected from an online review site using the keyword "Daegu attractions", and LDA-based topic modeling was performed to derive tourism topics. The study differs from previous work in taking a spatial approach, applying global and local spatial autocorrelation analysis to each topic. The analysis confirmed that global spatial autocorrelation exists and that the tourist destinations mainly visited by foreigners are locally concentrated. In addition, hot spots appeared around Jung-gu for most topics. The results are expected to serve as basic research for local government policy on foreign tourism and for spatial analysis based on topic modeling. The limitations of the study are also presented.
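For readers unfamiliar with global spatial autocorrelation, the following is a minimal sketch of global Moran's I, a common statistic for this purpose. The abstract does not say which statistic or weighting scheme the authors used, so the function name, the binary adjacency weights, and the toy values are all assumptions made for illustration.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Hypothetical data: four zones in a row; adjacent zones have weight 1.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
clustered = [10, 9, 2, 1]      # similar values sit next to each other
alternating = [10, 1, 10, 1]   # neighbours are dissimilar

i_clustered = morans_i(clustered, w)      # positive: spatial clustering
i_alternating = morans_i(alternating, w)  # negative: spatial dispersion
```

A positive value indicates that similar values tend to be neighbours, which is the kind of pattern the abstract reports for topic-related destinations.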

Forecasting of Motorway Path Travel Time by Using DSRC and TCS Information (DSRC와 TCS 정보를 이용한 고속도로 경로통행시간 예측)

  • Chang, Hyun-ho;Yoon, Byoung-jo
    • KSCE Journal of Civil and Environmental Engineering Research / v.37 no.6 / pp.1033-1041 / 2017
  • Path travel time based on departure time (PTTDP) is key information in advanced traveler information systems (ATIS). Despite this need, forecasting PTTDP remains one of the unresolved challenges in forecasting for intelligent transportation systems (ITS). To address this problem, this paper proposes a methodology to dynamically predict PTTDP between motorway interchanges. The method is based on the relationships between traffic demands at motorway tollgates (TGs) and PTTDPs between TGs in the motorway network. Two kinds of data are used as model input: traffic demand data collected by the toll collection system (TCS) and path travel time data collected by dedicated short range communication (DSRC). The model is based on k-nearest neighbor, a data mining technique, with a view to practical application in motorway information systems. In a feasibility test with real-world data, the proposed method performed effectively in terms of prediction reliability and computational running time, at a level suitable for application in current ATIS.
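The general idea of k-nearest-neighbor forecasting can be sketched as follows. This is not the authors' model: the feature layout (a vector of tollgate demands), the Euclidean distance, the unweighted average, and all numbers are assumptions chosen to keep the illustration short.

```python
import math

def knn_forecast(history, query, k=3):
    """Average the travel times of the k past records whose demand
    vectors are closest (Euclidean distance) to the query vector.

    history: list of (demand_vector, travel_time_minutes) pairs.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    nearest = sorted(history, key=lambda rec: dist(rec[0], query))[:k]
    return sum(t for _, t in nearest) / len(nearest)

# Hypothetical records: (demand at two tollgates, observed path travel time).
history = [
    ((100, 80), 30.0),
    ((110, 85), 32.0),
    ((300, 250), 55.0),
    ((320, 260), 58.0),
    ((105, 90), 31.0),
]
forecast = knn_forecast(history, (108, 84), k=3)
```

Because the method only searches stored records, its running time depends mainly on the size of the history, which is relevant to the real-time requirement the abstract mentions.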

Spatial Clustering Analysis based on Text Mining of Location-Based Social Media Data (위치기반 소셜 미디어 데이터의 텍스트 마이닝 기반 공간적 클러스터링 분석 연구)

  • Park, Woo Jin;Yu, Ki Yun
    • Journal of Korean Society for Geospatial Information Science / v.23 no.2 / pp.89-96 / 2015
  • Location-based social media data have high potential for use in various areas such as big data and location-based services. In this study, we applied a series of analysis methods to determine how important keywords in location-based social media are spatially distributed, based on their text content. For this purpose, we collected geo-tagged tweets in the Gangnam district of Seoul and its environs during August 2013. Principal keywords were extracted from these tweets, and keywords in three categories (food, entertainment, and work and study) were selected and classified by category. Spatial clustering was then applied to the tweets containing keywords in each category, and the resulting clusters were compared with buildings and benchmark POIs at the same locations. The comparison showed that clusters in the food category were highly consistent with large-scale commercial areas. Clusters in the entertainment category corresponded to theaters and sports complexes. Clusters in the work and study category were highly consistent with areas where private institutes and office buildings are concentrated.

Exploration of relationship between confirmation measures and association thresholds (기준 확인 측도와 연관성 평가기준과의 관계 탐색)

  • Park, Hee Chang
    • Journal of the Korean Data and Information Science Society / v.24 no.4 / pp.835-845 / 2013
  • The association rule, a data mining technique, is a method for quantifying the relevance between sets of items in a big database, and has been applied in various fields such as manufacturing, shopping malls, healthcare, insurance, and education. Philosophers of science have proposed interestingness measures for various kinds of patterns, analyzed their theoretical properties, evaluated them empirically, and suggested strategies for selecting appropriate measures for particular domains and requirements. Such interestingness measures are divided into objective, subjective, and semantic measures. Objective measures are based on the data used in the discovery process and are typically motivated by statistical considerations. Subjective measures take into account not only the data but also the knowledge and interests of the users who examine the pattern, while semantic measures additionally take into account utility and actionability. In a very different context, researchers have devoted considerable attention to measures of confirmation or evidential support. This paper focuses on asymmetric confirmation measures and compares them with basic association thresholds using simulated data. As a result, the confirmation measures made it possible to distinguish the direction of an association rule and to interpret the degree of association operationally. Furthermore, the measure by Rips and that by Kemeny and Oppenheim performed better than the other confirmation measures.
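To make the comparison concrete, here is a small sketch that computes confidence and lift alongside two confirmation measures for a rule A → B. The formulas are the commonly cited definitions, Kemeny and Oppenheim's F = [P(E|H) − P(E|¬H)] / [P(E|H) + P(E|¬H)] and Rips's R = 1 − P(¬H|E) / P(¬H). Treating the consequent B as the hypothesis H and the antecedent A as the evidence E is an assumption made here, as are the function name and the counts; the paper's exact formulation may differ.

```python
def rule_measures(n, n_a, n_b, n_ab):
    """Measures for the rule A -> B from transaction counts.

    n: all transactions, n_a: containing A, n_b: containing B,
    n_ab: containing both. Assumes H = B (consequent), E = A (antecedent).
    """
    p_b = n_b / n
    confidence = n_ab / n_a                    # P(B|A)
    lift = confidence / p_b                    # symmetric in A and B
    p_a_given_b = n_ab / n_b                   # P(E|H)
    p_a_given_not_b = (n_a - n_ab) / (n - n_b) # P(E|not H)
    kemeny_oppenheim = ((p_a_given_b - p_a_given_not_b)
                        / (p_a_given_b + p_a_given_not_b))
    rips = 1 - (1 - confidence) / (1 - p_b)    # 1 - P(not H|E) / P(not H)
    return {"confidence": confidence, "lift": lift,
            "kemeny_oppenheim": kemeny_oppenheim, "rips": rips}

# Hypothetical counts: 100 transactions, 40 with A, 50 with B, 30 with both.
a_to_b = rule_measures(100, 40, 50, 30)
b_to_a = rule_measures(100, 50, 40, 30)  # same data, reversed rule
```

With these counts lift is the same in both directions, while both confirmation measures are larger for A → B than for B → A, which illustrates how an asymmetric measure can indicate the direction of a rule.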

Case Study of Big Data-Based Agri-food Recommendation System According to Types of Customers (빅데이터 기반 소비자 유형별 농식품 추천시스템 구축 사례)

  • Moon, Junghoon;Jang, Ikhoon;Choe, Young Chan;Kim, Jin Gyo;Bock, Gene
    • The Journal of Korean Institute of Communications and Information Sciences / v.40 no.5 / pp.903-913 / 2015
  • The Korea Agency of Education, Promotion and Information Service in Food, Agriculture, Forestry and Fisheries launched a public data portal service in January 2015. The service provides customized information for consumers through an agri-food recommendation system built into the portal. The recommendation system has the following characteristics. First, it can increase recommendation accuracy by using a wide variety of agri-food related data, including SNS opinion mining, consumer purchase data, climate data, and wholesale price data. Second, it uses a segmentation method based on consumer lifestyle and megatrend factors to overcome the cold start problem. Third, it recommends agri-foods that reflect various contextual preference factors by using a recommendation algorithm based on the Dirichlet-multinomial distribution. In addition, the system provides diverse information related to the recommended agri-foods to increase service users' interest in agri-food.
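One way a Dirichlet-multinomial model can support recommendations for new users is through its posterior predictive probability, (α_k + n_k) / (Σα + N), where α_k are prior pseudo-counts and n_k are observed purchases. The sketch below assumes, purely for illustration, that the prior comes from the user's consumer segment; the abstract does not describe the system at this level of detail, and the item names and numbers are hypothetical.

```python
def predictive_probs(prior, counts):
    """Posterior predictive item probabilities under a Dirichlet-multinomial.

    prior: item -> Dirichlet pseudo-count (assumed to come from a segment).
    counts: item -> number of purchases observed for this user.
    """
    total = sum(prior.values()) + sum(counts.get(item, 0) for item in prior)
    return {item: (prior[item] + counts.get(item, 0)) / total
            for item in prior}

# Hypothetical segment-level prior over three agri-food items.
segment_prior = {"apple": 2.0, "rice": 5.0, "kimchi": 3.0}

new_user = predictive_probs(segment_prior, {})             # no history yet
returning = predictive_probs(segment_prior, {"apple": 8})  # has bought apples
```

A user with no purchase history falls back to the segment prior, while observed purchases gradually shift the probabilities toward the individual's own preferences.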

The World as Seen from Venice (1205-1533) as a Case Study of Scalable Web-Based Automatic Narratives for Interactive Global Histories

  • NANETTI, Andrea;CHEONG, Siew Ann
    • Asian Review of World Histories / v.4 no.1 / pp.3-34 / 2016
  • This introduction is both a statement of a research problem and an account of the first research results toward its solution. As more historical databases come online and overlap in coverage, we need to discuss the two main issues that have so far prevented 'big' results from emerging. Firstly, computer scientists see historical data as unstructured; that is, historical records cannot be easily decomposed into unambiguous fields, as in population (birth and death records) and taxation data. Secondly, machine-learning tools developed for structured data cannot be applied as they are to historical research. We propose a complex network, narrative-driven approach to mining historical databases. In such a time-integrated network, obtained by overlaying records from historical databases, the nodes are actors, while the links are actions. In the case study that we present (the world as seen from Venice, 1205-1533), the actors are governments, while the actions are limited to war, trade, and treaty to keep the case study tractable. We then identify key periods, key events, and hence key actors and key locations through a time-resolved examination of the actions. This tool allows historians to deal with historical data issues (e.g., source provenance identification, event validation, and trade-conflict-diplomacy relationships). On a higher level, this automatic extraction of key narratives from a historical database allows historians to formulate hypotheses on the courses of history, and also allows them to test these hypotheses against other actions or additional data sets. Our vision is that this narrative-driven analysis of historical data can lead to the development of multiple-scale agent-based models, which can be simulated on a computer to generate ensembles of counterfactual histories that would deepen our understanding of how our actual history developed the way it did.
The generation of such narratives, automatically and in a scalable way, will revolutionize the practice of history as a discipline, because historical knowledge, that is the treasure of human experiences (i.e. the heritage of the world), will become what might be inherited by machine learning algorithms and used in smart cities to highlight and explain present ties and illustrate potential future scenarios and visionarios.

A study on the analysis of customer loan for the credit finance company using classification model (분류모형을 이용한 여신회사 고객대출 분석에 관한 연구)

  • Kim, Tae-Hyung;Kim, Yeong-Hwa
    • Journal of the Korean Data and Information Science Society / v.24 no.3 / pp.411-425 / 2013
  • The importance and necessity of credit loans are increasing over time. It is also a natural consequence that an increase in borrower risk increases the risk of non-performing loans. Thus, accurate prediction is needed to prevent losses for a credit loan company. Our final goal is to build a reliable and accurate prediction model, so we proceed through the following steps. First, we obtain an appropriate sample by using several resampling methods. Second, we consider a variety of models and tools to fit the resampled data. Finally, to find the best model for our real data, various models are compared and assessed.

A Comprehensive Framework for Estimating Pedestrian OD Matrix Using Spatial Information and Integrated Smart Card Data (공간정보와 통합 스마트카드 자료를 활용한 도시철도 역사 보행 기종점 분석 기법 개발)

  • JEONG, Eunbi;YOU, Soyoung Iris;LEE, Jun;KIM, Kyoungtae
    • Journal of Korean Society of Transportation / v.35 no.5 / pp.409-422 / 2017
  • TOD (Transit-Oriented Development) is an urban structure concentrated on multifunctional spaces or districts served by a public transportation system, and it has been introduced to maintain sustainable future cities. In line with this trend, projects to build complex transfer centers at urban railway stations have become widespread, and a comprehensive and systematic analytical framework is required to clarify the complicated estimation procedure that such large-scale projects involve. This study therefore develops a comprehensive analytical framework for estimating a pedestrian OD matrix using spatial information and integrated smart card data (a so-called data depository), and applies it to Samseong station for model validation. The proposed framework offers the opportunity for extension with digitalized and automated data collection technologies and big data mining methods.

Discovering Interdisciplinary Convergence Technologies Using Content Analysis Technique Based on Topic Modeling (토픽 모델링 기반 내용 분석을 통한 학제 간 융합기술 도출 방법)

  • Jeong, Do-Heon;Joo, Hwang-Soo
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.77-100 / 2018
  • The objective of this study is to present a process for discovering interdisciplinary convergence technologies using text mining of big data. For convergence research on biotechnology (BT) and information communications technology (ICT), the following processes were performed: (1) collecting sufficient metadata of research articles based on a BT terminology list; (2) generating the intellectual structure of emerging technologies by using a Pathfinder network scaling algorithm; (3) analyzing contents with topic modeling. Three further steps were used to derive items of BT-ICT convergence technology: (4) expanding the BT terminology list into superordinate technology concepts to obtain ICT-related information from BT; (5) automatically collecting metadata of research articles in the two fields by using an OpenAPI service; (6) analyzing the contents of BT-ICT topic models. Our study reports the following findings. Firstly, a terminology list can be an important knowledge base for discovering convergence technologies. Secondly, the analysis of a large quantity of literature requires text mining, which facilitates the analysis by reducing the dimension of the data. The methodology we suggest for processing and analyzing data is efficient for discovering technologies with a high possibility of interdisciplinary convergence.

A Morphological Analysis Method of Predicting Place-Event Performance by Online News Titles (온라인 뉴스 제목 분석을 통한 특정 장소 이벤트 성과 예측을 위한 형태소 분석 방법)

  • Choi, Sukjae;Lee, Jaewoong;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.21 no.1 / pp.15-32 / 2016
  • Online news on the Internet, as published open data, contains facts or opinions about a specific affair and hence considerably influences the decisions of members of the general public who are interested in a particular issue. Therefore, we can predict people's choices related to an issue by analyzing a large number of related Internet news items. This study proposes a text analysis method to predict the outcomes of events that take place in a specific place. We used the titles of the news articles because the titles contain more essential text than the article bodies. Moreover, in the mobile environment, people tend to rely more on news titles before clicking through to the articles. We collected the titles of news articles and divided them into learning and evaluation data sets. Morphemes were extracted and their polarity values were identified with the learning data. Then we analyzed the sensitivity of the entire articles. As a result, the prediction success rate was 70.6%, which was clearly different from that of the other analytical methods used for comparison. The derived prediction information will be helpful in determining the expected demand for goods when preparing an event.