• Title/Summary/Keyword: Network Mining

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • Recently, with the development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict repurchase based on the discovered preferences. The methodology has two important modules. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope in order to compare the accuracy of a traditional purchase-based prediction model with that of the new model presented in the second module. This procedure extracts the purchase history of users.
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles they have read. It first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging these two results, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. We also provide experimental results from our performance evaluation. The data used in our experiments are outlined as follows. We acquired web transaction data for 5,000 panel members from a company that specializes in analyzing the rankings of internet sites. We first extracted 15,000 URLs of news articles published from July 2012 to June 2013 from the original data and crawled the main content of those articles. We then selected 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model, which uses both users' interests and purchase history, outperforms a prediction model that uses only purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used. Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
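
The merging step in the second module can be pictured as a matrix product. The sketch below is only an illustration under assumptions: the abstract does not state the exact merge operation, so the product of a user-by-article matrix and an article-by-topic matrix is assumed here, and every name and number (`user_article`, `article_topic`, the weights) is hypothetical.

```python
# Toy illustration only: all numbers below are made up, not data from the study.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def normalize_rows(m):
    """Scale each row to sum to 1; rows that sum to 0 are left unchanged."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total for v in row] if total else list(row))
    return out


# user_article[u][a] = 1 if user u read article a (hypothetical access log).
user_article = [
    [1, 0, 1],
    [0, 1, 1],
]

# article_topic[a][t] = weight of topic t in article a (hypothetical topic-model output).
article_topic = [
    [0.9, 0.1],
    [0.2, 0.8],
    [0.5, 0.5],
]

# Assumed merge step: the product sums topic weights over the articles each user read,
# and row normalization turns the sums into per-user interest shares.
user_topic = normalize_rows(matmul(user_article, article_topic))

for u, row in enumerate(user_topic):
    print(u, [round(v, 2) for v in row])
```

Each row of `user_topic` is then a structured description of one user's interests that could be fed, together with purchase history, into a classifier such as a decision tree or neural network.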

A Study on the Revitalization of Tourism Industry through Big Data Analysis (한국관광 실태조사 빅 데이터 분석을 통한 관광산업 활성화 방안 연구)

  • Lee, Jungmi;Liu, Meina;Lim, Gyoo Gun
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.149-169 / 2018
  • Korea is currently accumulating a large amount of data in public institutions under its open public data policy and "Government 3.0". A great deal of data has accumulated in the tourism field in particular. However, academic discussion that makes use of tourism data is still limited. Moreover, the openness of data on restaurants, hotels, and online tourism information, and the use of SNS big data in tourism, remain limited. As a result, tourism big data analysis is still little used. In this paper, we analyze the factors influencing foreign tourists' satisfaction in Korea from numerical data, using data mining techniques and R programming. We sought ways to revitalize the tourism industry by analyzing about 36,000 records from the "Survey on the actual situation of foreign tourists" for 2013 to 2015, conducted by the Korea Culture & Tourism Research Institute. To do this, we analyzed the factors that strongly influence the 'Satisfaction', 'Revisit intention', and 'Recommendation' variables of foreign tourists, and then analyzed the practical influence of those variables. As a procedure, we first integrated the foreign tourist survey data for 2013 to 2015 collected by the Korea Culture & Tourism Research Institute and stored in the tourist information system, and eliminated variables inconsistent with the research purpose. Some variables were modified to improve the accuracy of the analysis. We then analyzed the factors affecting the dependent variables using data mining methods in IBM SPSS Modeler 16.0: decision trees (C5.0, CART, CHAID, QUEST), an artificial neural network, and logistic regression. The seven variables with the greatest effect on each dependent variable were derived.
The analysis found that the seven major variables influencing 'overall satisfaction' were sightseeing spot attraction, food satisfaction, accommodation satisfaction, traffic satisfaction, guide service satisfaction, number of places visited, and country. Among these, food satisfaction and sightseeing spot attraction had the greatest influence. The seven variables with the greatest influence on 'revisit intention' were country, travel motivation, activity, food satisfaction, best activity, guide service satisfaction, and sightseeing spot attraction; the most influential were food satisfaction and travel motivation related to Korean style. Lastly, the seven variables with the greatest influence on 'recommendation intention' were country, sightseeing spot attraction, number of places visited, food satisfaction, activity, tour guide service satisfaction, and cost; the most influential were country, sightseeing spot attraction, and food satisfaction. In addition, to examine the influence of each independent variable more closely, we used R programming. As a result, food satisfaction and sightseeing spot attraction had a greater effect on overall satisfaction than the other influential variables. For revisit intention, travel motivation related to the Korean Wave had a higher $\beta$ value than the other variables. A policy that leads to actual revisits by enhancing tourist attractions related to the Korean Wave will be necessary. Lastly, for recommendation, as for satisfaction, sightseeing spot attraction and food satisfaction had higher $\beta$ values than the other variables.
From this analysis, we found that 'food satisfaction' and 'sightseeing spot attraction' were common factors influencing all three dependent variables ('Overall satisfaction', 'Revisit intention', and 'Recommendation'), and that they significantly affected satisfaction with travel in Korea. The purpose of this study is to examine how to stimulate foreign tourism in Korea through big data analysis. The results are expected to serve as basic data for analyzing tourism data and establishing effective tourism policy, and as material for a revitalization plan that can contribute to tourism development in Korea.

A Convergence Study in the Severity-adjusted Mortality Ratio on inpatients with multiple chronic conditions (복합만성질환 입원환자의 중증도 보정 사망비에 대한 융복합 연구)

  • Seo, Young-Suk;Kang, Sung-Hong
    • Journal of Digital Convergence / v.13 no.12 / pp.245-257 / 2015
  • This study aimed to develop a predictive model for the severity-adjusted mortality of inpatients with multiple chronic conditions and to analyze the factors behind variation in the hospital standardized mortality ratio (HSMR), in order to propose a plan to reduce that variation. We collected data from the "Korean National Hospital Discharge In-depth Injury Survey" for 2008 to 2010 and selected 110,700 final study subjects aged over 30 whose principal diagnosis was a chronic disease and who had more than 2 chronic diseases including the principal diagnosis. We designed a severity-adjusted mortality prediction model using data mining methods (logistic regression, decision tree, and neural network). In this study, we used the decision tree model based on the Elixhauser comorbidity index to predict the severity-adjusted mortality ratio. The HSMR of inpatients with multiple chronic conditions differed statistically significantly by insurance type, number of hospital beds, and hospital location. Based on these results, methods should be found to manage the mortality ratio of inpatients with multiple chronic conditions efficiently at the national level. Efforts should also be made to improve the quality of medical treatment for these inpatients and to reduce growing medical expenses.
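
As a minimal sketch of how an HSMR is commonly computed, the code below divides observed deaths by expected deaths and multiplies by 100, where expected deaths are the sum of per-patient death probabilities from a severity-adjustment model. The abstract does not spell out its formula, so this conventional definition is an assumption, and the patient probabilities are hypothetical.

```python
def hsmr(observed_deaths, predicted_probs):
    """Hospital standardized mortality ratio: 100 * observed / expected deaths.

    predicted_probs holds one model-predicted death probability per inpatient;
    their sum is the number of deaths expected after severity adjustment.
    """
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return 100.0 * observed_deaths / expected


# Hypothetical hospital: four inpatients and one observed death.
probs = [0.25, 0.25, 0.5, 0.25]   # expected deaths = 1.25
print(hsmr(1, probs))             # 80.0
```

A value above 100 would indicate more deaths than the model expects for that case mix, and a value below 100 fewer.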

Monitoring of Tidal Sand Shoal with a Camera Monitoring System and its Morphologic Change (카메라를 활용한 조석사주 관측시스템 구축 및 지형변화)

  • Lee, Soong-Ji;Lee, Guan-Hong;Kang, Tae-Soon;Kim, Young-Taeg;Kim, Tea-Lim
    • Journal of Advanced Marine Engineering and Technology / v.39 no.3 / pp.326-333 / 2015
  • A tidal sand shoal called 'Puldeung' in the Daeijackdo Marine Protected Area (DMPA) is facing erosion due to sand mining in the nearby coastal region. To monitor the morphologic change and erosion of Puldeung, a camera monitoring system was established at the top of Song-Ee Mountain on Daeijack Island. The system consists of 2 Canon digital cameras, an Eye-Fi memory card/Long-Term Evolution wireless network, and a solar power supply. The acquired camera images were analyzed to obtain the area of Puldeung through the following steps: geometric correction of the images, identification of the shoreline, areal measurement of Puldeung, and error estimation. To compare the Puldeung area with previously measured areas of 1.79 km² at a tidal height of 137 cm in 2008 and 1.59 km² at a tidal height of 148 cm in 2010, we selected images with the same tidal heights. The Puldeung area was 1.37 and 1.23 km² at tidal heights of 137 and 148 cm, respectively. The erosion at DMPA is very severe, and it is therefore imperative to initiate a morphodynamic study of the seasonal variation and long-term evolution of Puldeung, as well as of the causes of, and countermeasures against, its erosion.
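
The areal measurement and the comparison across years can be sketched as follows. This is an illustration under assumptions: the abstract does not say how the area was computed from the identified shoreline, so the shoelace formula over shoreline vertices is assumed, and `polygon_area` and `percent_loss` are hypothetical helper names. Only the four area values come from the abstract.

```python
def polygon_area(vertices):
    """Shoelace formula: area of a simple polygon from (x, y) vertices listed in order."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def percent_loss(before, after):
    """Relative decrease from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Areas in km^2 quoted in the abstract, paired by tidal height in cm:
# (previously measured area, area from the camera images).
comparisons = {137: (1.79, 1.37), 148: (1.59, 1.23)}

for tide_cm, (earlier, later) in comparisons.items():
    print(tide_cm, round(percent_loss(earlier, later), 1))
```

With the quoted figures this gives a loss of about 23.5% at the 137 cm tidal height and about 22.6% at 148 cm.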

Identification of growth trait related genes in a Yorkshire purebred pig population by genome-wide association studies

  • Meng, Qingli;Wang, Kejun;Liu, Xiaolei;Zhou, Haishen;Xu, Li;Wang, Zhaojun;Fang, Meiying
    • Asian-Australasian Journal of Animal Sciences / v.30 no.4 / pp.462-469 / 2017
  • Objective: The aim of this study is to identify genomic regions or genes controlling growth traits in pigs. Methods: Using a panel of 54,148 single nucleotide polymorphisms (SNPs), we performed a genome-wide association (GWA) study in 562 pure Yorkshire pigs for four growth traits: average daily gain from 30 kg to 100 kg or 115 kg, and days to 100 kg or 115 kg. The Fixed and random model Circulating Probability Unification (FarmCPU) method was used to identify associations between the 54,148 SNPs and these four traits. SNP annotation was performed using the Sus scrofa data set from Ensembl. Bioinformatics analyses, including gene ontology analysis, pathway analysis, and network analysis, were used to identify candidate genes. Results: We detected 6 significant and 12 suggestive SNPs, and identified 9 candidate genes in close proximity to them (suppressor of glucose by autophagy [SOGA1], R-Spondin 2 [RSPO2], mitogen activated protein kinase kinase 6 [MAP2K6], phospholipase C beta 1 [PLCB1], Rho GTPase activating protein 24 [ARHGAP24], cytoplasmic polyadenylation element binding protein 4 [CPEB4], GLI family zinc finger 2 [GLI2], neuronal tyrosine-phosphorylated phosphoinositide-3-kinase adaptor 2 [NYAP2], and zinc finger protein multitype 2 [ZFPM2]). Gene ontology analysis and literature mining indicated that the candidate genes are involved in bone, muscle, fat, and lung development. Pathway analysis revealed that PLCB1 and MAP2K6 participate in the gonadotropin signaling pathway, suggesting that these two genes contribute to growth at the onset of puberty. Conclusion: Our results provide new clues for understanding the genetic mechanisms underlying growth traits, and may help improve these traits in future breeding programs.

'Elderly image' Analysis Using Big Data and Social Networking Techniques (빅데이터와 사회연결망 기법을 이용한 '노인 이미지' 분석)

  • Han, Sun-Bo;Lee, Hyun-Sim
    • The Journal of the Korea Contents Association / v.16 no.11 / pp.253-263 / 2016
  • We analyzed the social issue 'image of the elderly' using big data and social network analysis. First, we analyzed the words extracted by text mining with the input keyword 'elderly'. The analysis showed that, in media such as online cafes and blogs, which represent public trends, the word 'Senior' was used most often in the image of the elderly. Expressed with the ten most frequent words, the image of the elderly is: "The elderly are 'Seniors' who are respected by society, who organize to earn money, obtain qualifications, and stay healthy, and who wish to work in good health up to 100 years old". This study differs from existing analyses in that it examines the macro-level image of the elderly, including social discourse, by collecting a vast amount of data and analyzing it with social network techniques. Given that the public's image of the elderly is expressed positively as 'Senior', the direction of current policy for the elderly can be considered desirable. On the other hand, we could also sense the public's 'desire' to be evaluated in this way. Therefore, future policy for the elderly should enable them to be perceived as a 'Necessary existence' in society by taking on social roles. In addition, we propose implementing policies for the elderly that reflect priorities such as job creation, welfare, and reducing alienation, so that the elderly can stay active and maintain their health.

Relationship between Diurnal Patterns of Transit Ridership and Land Use in the Metropolitan Seoul Area (서울 대도시권 하루 시간대별 지하철 통행흐름 패턴과 토지이용과의 관계)

  • Lee, Keum-Sook;Song, Ye-Na;Park, Jong-Soo;Anderson, William P.
    • Journal of the Economic Geographical Society of Korea / v.15 no.1 / pp.26-41 / 2012
  • This study investigates the time-space characteristics of intra-urban passenger flows in the Metropolitan Seoul area. In particular, we analyze the relationships between transit ridership and land use through the use of the subway passenger flow data obtained from the transit transaction databases. For this purpose, the strength of each subway station, i.e., the number of total in-coming and out-going passengers at each station, in the morning, afternoon, and evening, is calculated and visualized, which reflects urban land use patterns. Then the subway stations are classified into four groups via a hierarchical analysis of the in-coming and out-going passenger flows at 353 stations. Each group appears to have characteristic properties according to the region, e.g., residential areas and central business districts. This has been confirmed by the analysis which probes explicitly the relationship between the local socio-economic variables and station groups. This analysis, disclosing the inter-relationship between the subway network and urban land use, may be useful at various stages in urban as well as transportation planning, and provides analytical tools for a wide spectrum of applications ranging from impact evaluation to decision-making and planning support.
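
The station strength defined above, total in-coming plus out-going passengers per station and time period, can be sketched in a few lines. The record layout and all counts below are hypothetical and stand in for the transit transaction database, whose actual schema the abstract does not describe.

```python
from collections import defaultdict

# Hypothetical aggregated records: (station, period, in-coming count, out-going count).
records = [
    ("A", "morning", 120, 900),
    ("A", "evening", 850, 150),
    ("B", "morning", 950, 100),
    ("B", "evening", 130, 880),
]


def station_strength(rows):
    """Return {station: {period: in-coming + out-going passengers}}."""
    strength = defaultdict(dict)
    for station, period, incoming, outgoing in rows:
        strength[station][period] = strength[station].get(period, 0) + incoming + outgoing
    return {station: dict(periods) for station, periods in strength.items()}


strength = station_strength(records)
for station in sorted(strength):
    print(station, strength[station])
```

Per-period in-coming and out-going counts like these are the kind of station profile that a hierarchical clustering step could then group into station types.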

Stock Price Prediction Using Sentiment Analysis: from "Stock Discussion Room" in Naver (SNS감성 분석을 이용한 주가 방향성 예측: 네이버 주식토론방 데이터를 이용하여)

  • Kim, Myeongjin;Ryu, Jihye;Cha, Dongho;Sim, Min Kyu
    • The Journal of Society for e-Business Studies / v.25 no.4 / pp.61-75 / 2020
  • The scope of data used for understanding or predicting stock prices has continuously widened from traditional structured data to unstructured data. This study investigates whether commentary data collected from SNS may affect future stock prices. From the "Stock Discussion Room" in Naver, we collected six months of commentary data for 20 stocks and tested whether these data have predictive power for the one-hour-ahead price direction and price range. Deep neural network methods such as LSTM and CNN are employed to model the predictive relationship. Among the 20 stocks, the future price direction can be predicted with accuracy higher than 50% for 13 stocks, and the future price range can be predicted with accuracy higher than 50% for 16 stocks. This study validates that investor sentiment reflected in SNS communities such as Naver's "Stock Discussion Room" may affect the demand for and supply of stocks, thus driving stock prices.
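
The per-stock evaluation described above, counting the stocks whose direction accuracy exceeds 50%, can be sketched as below. The labels and stock names are hypothetical, and the sketch covers only the scoring step, not the LSTM or CNN models themselves.

```python
def direction_accuracy(predicted, actual):
    """Fraction of time steps where the predicted direction matches the actual one."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)


# Hypothetical one-hour-ahead direction labels per stock: 1 = up, 0 = down.
# Each entry is (predicted labels, actual labels).
results = {
    "stock_a": ([1, 0, 1, 1], [1, 0, 0, 1]),
    "stock_b": ([0, 0, 1, 1], [1, 0, 0, 1]),
    "stock_c": ([1, 1, 0, 0], [0, 0, 1, 0]),
}

accuracy = {name: direction_accuracy(pred, act) for name, (pred, act) in results.items()}
above_half = [name for name, acc in accuracy.items() if acc > 0.5]

print(accuracy)
print(f"{len(above_half)} of {len(results)} stocks predicted with accuracy above 50%")
```

In this toy data only `stock_a` clears the threshold; the study reports 13 of 20 stocks for price direction and 16 of 20 for price range.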

A Study on the Research Trends for Smart City using Topic Modeling (토픽 모델링을 활용한 스마트시티 연구동향 분석)

  • Park, Keon Chul;Lee, Chi Hyung
    • Journal of Internet Computing and Services / v.20 no.3 / pp.119-128 / 2019
  • This study aims to analyze research trends on Smart City and to present implications for policy makers, industry professionals, and researchers. Over the past few decades, cities around the globe have undergone rapid urbanization and a consequent dramatic increase in urban dwellings, and have faced many urban problems in areas such as transportation, environment, and housing. Cities are hurrying to introduce Smart City initiatives in pursuit of the common goal of solving these urban problems and improving their residents' quality of life. However, the variety of conceptual approaches to the smart city is causing uncertainty in setting policy goals and establishing a direction for implementation. The study collected 11,527 papers titled "Smart City(cities)" from the Scopus DB and Springer DB, and then analyzed research status, topics, and trends based on abstracts and publication year using LDA-based topic modeling. Research topics are classified into three categories (Services, Technologies, and User Perspective) and eight related topics. Of the eight topics, citizen-driven innovation is the most frequently referenced. An additional topic network analysis reveals that data and privacy/security are the most prevalent topics affecting others. This study is expected to help in understanding trends in Smart City research and in predicting future research.

Analysis of Technology Association Rules Between CPC Codes of the 'Internet of Things(IoT)' Patent (CPC 코드 기반 사물인터넷(IoT) 특허의 기술 연관성 규칙 분석)

  • Shim, Jaeruen
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.12 no.5 / pp.493-498 / 2019
  • This study analyzes the technology association rules between CPC codes of Internet of Things (IoT) patents, a core ICT-based technology of the Fourth Industrial Revolution. The association rules between CPC codes were extracted using R, an open-source tool for data mining. To this end, we analyzed, up to the subclass level, the 369 patents with multiple CPC codes among the 605 Internet of Things patents filed with the Patent Office until July 2019. The rules with high support were [H04W → H04L] (18.2%), [H04L → H04W] (18.2%), [G06Q → H04L] (17.3%), [H04L → G06Q] (17.3%), [H04W → G06Q] (9.8%), [G06Q → H04W] (9.8%), [G06F → H04L] (7.9%), [H04L → G06F] (7.9%), [G06F → G06Q] (6.2%), and [G06Q → G06F] (6.2%). After analyzing the technology interconnection network, the core CPC codes related to the technology association rules were found to be G06Q and H04L. The results of this study can be used to predict future patent trends.
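
To make the support figures above concrete, here is a small sketch of support and confidence for pairwise rules between CPC codes. The study itself used R; this Python version is an independent illustration, the four toy patents are hypothetical, and `support` and `confidence` are helper names introduced here following the standard definitions.

```python
from itertools import permutations

# Hypothetical patents, each represented by its set of subclass-level CPC codes.
patents = [
    {"H04W", "H04L"},
    {"H04L", "G06Q"},
    {"H04W", "H04L", "G06Q"},
    {"G06F", "H04L"},
]


def support(itemset, transactions):
    """Share of transactions that contain every code in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of both codes together divided by support of the antecedent alone."""
    base = support({antecedent}, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs")
    return support({antecedent, consequent}, transactions) / base


codes = sorted(set().union(*patents))
for a, b in permutations(codes, 2):
    s = support({a, b}, patents)
    if s > 0:
        print(f"[{a} -> {b}] support={s:.2f} confidence={confidence(a, b, patents):.2f}")
```

Support depends only on the pair of codes, not on the rule's direction, which is consistent with each rule and its reverse sharing the same percentage in the abstract; confidence is what distinguishes the two directions.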