• Title/Summary/Keyword: Network Modeling

Improving Performance of Recommendation Systems Using Topic Modeling (사용자 관심 이슈 분석을 통한 추천시스템 성능 향상 방안)

  • Choi, Seongi;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.101-116 / 2015
  • With the recent development of smart devices and social media, vast amounts of information in various forms have accumulated. In particular, considerable research effort is being directed toward analyzing unstructured big data to resolve various social problems. Accordingly, the focus of data-driven decision-making is moving from structured to unstructured data analysis. In the field of recommendation systems, a typical area of data-driven decision-making, the need to use unstructured data to improve system performance has also steadily increased. Approaches to improving the performance of recommendation systems fall into two categories: improving algorithms and acquiring useful, high-quality data. Traditionally, most efforts have taken the former approach, while the latter has attracted relatively little attention. In this sense, efforts to utilize unstructured data from various sources are timely and necessary. In particular, because users' interests are directly connected with their needs, identifying those interests through unstructured big data analysis can be a clue for improving the performance of recommendation systems. This study therefore proposes a methodology for improving recommendation systems by measuring users' interests. Specifically, it proposes a method to quantify users' interests by analyzing their internet usage patterns and to predict repurchases based on the discovered preferences. The methodology has two important modules. The first module predicts the repurchase probability of each category by analyzing users' purchase history. We include the first module in our research scope in order to compare the accuracy of the traditional purchase-based prediction model with that of the new model presented in the second module. This procedure extracts the purchase history of users.
The core of our methodology is the second module, which extracts users' interests by analyzing the news articles they have read. It first constructs a correspondence matrix between topics and news articles by performing topic modeling on real-world news articles. It then analyzes users' news access patterns and constructs a correspondence matrix between articles and users. By merging these two results, we obtain a correspondence matrix between users and topics, which describes users' interests in a structured manner. Finally, using this matrix, the second module builds a model for predicting the repurchase probability of each category. In this paper, we also provide experimental results of our performance evaluation. The data used in our experiments are as follows. We acquired web transaction data for 5,000 panelists from a company that specializes in analyzing the rankings of internet sites. We first extracted from the original data 15,000 URLs of news articles published from July 2012 to June 2013 and crawled the main content of those articles. We then selected the 2,615 users who had read at least one of the extracted news articles. Among these 2,615 users, 359 target users had purchased at least one item from our target shopping mall 'G'. In the experiments, we analyzed the purchase history and news access records of these 359 internet users. The performance evaluation showed that our prediction model, which uses both users' interests and purchase history, outperforms a prediction model using only purchase history in terms of misclassification ratio. In detail, our model outperformed the traditional one in the appliance, beauty, computer, culture, digital, fashion, and sports categories when artificial neural network based models were used.
Similarly, our model outperformed the traditional one in the beauty, computer, digital, fashion, food, and furniture categories when decision tree based models were used, although the improvement was very small.
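
The merging step in the second module can be pictured as a matrix product. The sketch below is only an illustration under stated assumptions: the abstract does not say how the two matrices are merged, so the product of a users-by-articles matrix and an articles-by-topics matrix, the row normalization, the function name `user_topic_matrix`, and the toy values are all hypothetical.

```python
def user_topic_matrix(user_article, article_topic):
    """Multiply a users x articles matrix by an articles x topics matrix.

    Assumption: "merging" is modeled as a matrix product followed by
    row normalization; the original paper may do this differently.
    """
    n_topics = len(article_topic[0])
    result = []
    for reads in user_article:
        row = [0.0] * n_topics
        for a, read in enumerate(reads):
            if read:
                for t in range(n_topics):
                    row[t] += read * article_topic[a][t]
        total = sum(row)
        # Normalize so each user's interest weights sum to 1 (illustrative choice).
        result.append([v / total for v in row] if total else row)
    return result


# Toy data (hypothetical): 2 users, 3 articles, 2 topics.
user_article = [[1, 1, 0],   # user 0 read articles 0 and 1
                [0, 0, 1]]   # user 1 read article 2
article_topic = [[0.5, 0.5],
                 [1.0, 0.0],
                 [0.0, 1.0]]
interests = user_topic_matrix(user_article, article_topic)
```

Each row of `interests` could then serve as a feature vector for a repurchase classifier, alongside purchase-history features.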

Spatio-Temporal Incidence Modeling and Prediction of the Vector-Borne Disease Using an Ecological Model and Deep Neural Network for Climate Change Adaption (기후 변화 적응을 위한 벡터매개질병의 생태 모델 및 심층 인공 신경망 기반 공간-시간적 발병 모델링 및 예측)

  • Kim, SangYoun;Nam, KiJeon;Heo, SungKu;Lee, SunJung;Choi, JiHun;Park, JunKyu;Yoo, ChangKyoo
    • Korean Chemical Engineering Research / v.58 no.2 / pp.197-208 / 2020
  • This study analyzes the spatial and temporal incidence characteristics of scrub typhus and predicts its future incidence, because the incidence of scrub typhus has been increasing rapidly among vector-borne diseases. A maximum entropy (MaxEnt) ecological model was implemented to predict the spatial distribution and incidence rate of scrub typhus using spatial data sets on environmental and social variables. Additionally, relationships between the incidence of scrub typhus and critical spatial data were analyzed. Elevation and temperature were identified as the dominant spatial factors influencing the growth environment of Leptotrombidium scutellare (L. scutellare), the primary vector of scrub typhus. The number of scrub typhus cases over time was predicted by a deep neural network (DNN) that considered the time-lagged effect of scrub typhus. The DNN-based prediction model showed that temperature, precipitation, and humidity in summer had a significant influence on the activity of L. scutellare and on the number of cases in fall. Moreover, the DNN-based prediction model outperformed a conventional statistical prediction model. Finally, the spatial and temporal models were applied under a climate change scenario. The projected future characteristics of scrub typhus showed that the maximum incidence rate would increase by 8%, areas with high potential incidence would increase by 9%, and the duration of disease occurrence would expand by 2 months. The results are expected to contribute to disease management and prediction for public health.
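
One common way to let a model see a time-lagged effect is to feed it weather values from earlier months as input features. The sketch below shows only that feature-construction step, not the DNN itself; the function name `make_lagged_rows`, the choice of lags, and the monthly numbers are hypothetical and are not taken from the paper.

```python
def make_lagged_rows(series, lags):
    """Build (features, target) pairs whose features are weather values
    observed `lag` time steps before the target case count."""
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series["cases"])):
        features = []
        for name in ("temperature", "precipitation", "humidity"):
            for lag in lags:
                features.append(series[name][t - lag])
        rows.append((features, series["cases"][t]))
    return rows


# Made-up monthly values for illustration only.
monthly = {
    "temperature":   [10, 15, 22, 26, 20, 12],
    "precipitation": [30, 60, 200, 250, 90, 40],
    "humidity":      [50, 55, 75, 80, 65, 55],
    "cases":         [0, 0, 1, 2, 30, 80],
}
rows = make_lagged_rows(monthly, lags=[1, 2])
```

Each feature vector pairs the case count at month t with the weather of months t-1 and t-2, which is how summer conditions can inform a prediction for fall.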

Classification of Domestic Freight Data and Application for Network Models in the Era of 'Government 3.0' ('정부 3.0' 시대를 맞이한 국내 화물 자료의 집계 수준에 따른 분류체계 구축 및 네트워크 모형 적용방안)

  • YOO, Han Sol;KIM, Nam Seok
    • Journal of Korean Society of Transportation / v.33 no.4 / pp.379-392 / 2015
  • Freight flow data in Korea have been collected for a variety of purposes by various organizations. However, because the representation and format of the data vary, they have not been substantially used for freight analyses or, in turn, for freight policies. To increase the applicability of these data sets, they need to be brought together in a table and compared to identify their differences. Doing so shows that the raw data can be aggregated by particular criteria such as mode, origin and destination, and commodity type. This study examines the freight data issue from three points of view. First, we investigated various freight volume data sets released by several organizations. Second, we tried to develop formulations for freight volume data. Third, we discussed how to apply the formulations to network models that use particular OR (Operations Research) techniques. The results emphasize that some data might be useless for modeling once they are aggregated. In examining the freight volume data, this study found that 14 organizations share their data sets at various aggregation levels. This study is not an ordinary research article, which would normally include data analysis, because conducting extensive case studies seems to be impossible given the diversity of the data. Nevertheless, it may guide the direction of freight transport research with respect to data issues. In particular, the study is timely because the government has emphasized the importance of sharing data with the public for research purposes through 'Government 3.0'.
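
The point that aggregated data can become useless for modeling is easy to see in a small example. The sketch below assumes a hypothetical record layout (`mode`, `origin`, `destination`, `commodity`, `tons`) and invented tonnages; none of it comes from the data sets surveyed in the paper.

```python
from collections import defaultdict


def aggregate(records, keys):
    """Sum freight tonnage over all records that share the same values for `keys`."""
    totals = defaultdict(float)
    for rec in records:
        totals[tuple(rec[k] for k in keys)] += rec["tons"]
    return dict(totals)


# Hypothetical disaggregate records (values invented for illustration).
records = [
    {"mode": "road", "origin": "Seoul", "destination": "Busan", "commodity": "steel", "tons": 120.0},
    {"mode": "rail", "origin": "Seoul", "destination": "Busan", "commodity": "steel", "tons": 80.0},
    {"mode": "road", "origin": "Seoul", "destination": "Busan", "commodity": "grain", "tons": 50.0},
]
by_od = aggregate(records, ["origin", "destination"])
by_mode = aggregate(records, ["mode"])
```

Once only `by_od` is published, the split between road and rail can no longer be recovered, so a mode-choice or multimodal network model cannot be calibrated from it.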

Personal Information Overload and User Resistance in the Big Data Age (빅데이터 시대의 개인정보 과잉이 사용자 저항에 미치는 영향)

  • Lee, Hwansoo;Lim, Dongwon;Zo, Hangjung
    • Journal of Intelligence and Information Systems / v.19 no.1 / pp.125-139 / 2013
  • Big data refers to data that cannot be processed with conventional data technologies. As smart devices and social network services produce vast amounts of data, big data is attracting much attention from researchers. There is strong demand from governments and industries for big data because it can create new value by drawing business insights from data. Since various new technologies for processing big data have been introduced, academic communities have also shown much interest in the big data domain. Notable advances related to big data technology have been made in various fields. Big data technology makes it possible to access, collect, and store individuals' personal data. These technologies enable the analysis of huge amounts of data at lower cost and in less time, which is impossible with traditional methods. They can even detect personal information that people do not want to disclose. Therefore, people using information technology such as the Internet or online services have some level of privacy concern, and such feelings can hinder continued use of information systems. For example, SNS offer various benefits, but users are sometimes highly exposed to privacy intrusions because they write too much personal information on them. Even though users post their personal information on the Internet themselves, the data are sometimes not under the users' control. Once private data are posted on the Internet, they can be transferred anywhere with a few clicks and can be abused to create fake identities. In this way, privacy intrusion happens. This study investigates how perceived personal information overload in SNS affects users' risk perception and information privacy concerns. It also examines the relationship between these concerns and user resistance behavior. A survey and structural equation modeling are employed for data collection and analysis.
This study contributes meaningful insights for academic researchers and for policy makers who are planning to develop guidelines for privacy protection. The study shows that information overload on social network services can significantly increase users' perceived level of privacy risk. In turn, perceived privacy risks lead to an increased level of privacy concern. If privacy concerns increase, users may form a negative or resistant attitude toward system use. This resistant attitude may lead users to discontinue the use of social network services. Furthermore, information overload affects privacy concerns through the mediation of perceived risks rather than through a direct influence. This implies that resistance to system use can be diminished by reducing users' perceived risks. Given that users' resistant behavior becomes salient when they have high privacy concerns, measures to alleviate those concerns should be devised. This study makes the academic contribution of integrating traditional information overload theory and user resistance theory to investigate perceived privacy concerns in current IS contexts. Because the research topic has only just emerged, little big data research has examined the technology with an empirical and behavioral approach. The study also makes practical contributions. Information overload is connected to an increased level of perceived privacy risk and to discontinued use of the information system. To keep users from leaving the system, organizations should develop systems in which private data can be controlled and managed with ease. This study suggests that actions to lower the level of perceived risks and privacy concerns should be taken to support information systems continuance.

The inference about the cause of death of Korean Fir in Mt. Halla through the analysis of spatial dying pattern - Proposing the possibility of excess soil moisture by climate changes - (한라산 구상나무 공간적 고사패턴 분석을 통한 고사원인 추정 - 기후변화에 따른 토양수분 과다 가능성 제안 -)

  • Ahn, Ung San;Kim, Dae Sin;Yun, Young Seok;Ko, Suk Hyung;Kim, Kwon Su;Cho, In Sook
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.1 / pp.1-28 / 2019
  • This study analyzed the density and mortality rate of Korean fir at 9 sites where individual Korean firs were marked as live or dead trees, with coordinates, on orthorectified aerial images using a digital photogrammetric system. The analysis showed that Korean fir at each site exhibits considerable heterogeneity in density and mortality rate depending on the location within the site. This makes it possible to assume that the death of Korean fir can be caused by specific factors that vary with location. Based on the analyzed densities and mortality rates of Korean fir, we investigated the correlation between the death of Korean fir and topographic factors such as altitude, terrain slope, drainage network, solar radiation, and aspect. The density of Korean fir increases with altitude, and so does the mortality rate. A negative correlation is found between terrain slope and mortality rate, and the mortality rate is higher on gentle slopes where the drainage network is less developed. In addition, the mortality rate varies greatly with aspect, and the mean solar radiation is higher in areas dominated by live Korean fir than in areas dominated by dead Korean fir. Overall, the mortality rate of Korean fir in the Mt. Halla area is relatively higher in areas with relatively low terrain slope and low solar radiation. Considering previous findings that terrain slope has a strong negative correlation with soil moisture, together with the relationship between solar radiation and evaporation, these results lead us to infer that excess soil moisture is the cause of Korean fir mortality. This inference is supported by a series of climate change phenomena, such as increased precipitation, decreased evaporation, and reduced sunshine duration on the Korean peninsula including Jeju Island, by the increase in mortality rate along with the increase in precipitation with elevation on Mt. Halla, and by vegetation change on the mountain.
It is expected that the spatial patterns in the density and mortality rate of Korean fir, which are controlled by topographic factors such as altitude, slope, aspect, solar radiation, and drainage network, can be used as spatial variables in future numerical modeling studies on the death or decline of Korean fir. In addition, the forest distribution survey method using orthorectified aerial images can be widely used as a numerical monitoring technique in long-term vegetation change research.
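
A negative correlation of the kind reported between terrain slope and mortality rate can be checked with a Pearson coefficient. The sketch below is a generic illustration: the abstract does not state which correlation measure was used, and the slope and mortality values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Invented plot-level values: mortality falls as slope steepens.
slope_deg = [5, 10, 15, 20, 25]
mortality = [0.60, 0.50, 0.45, 0.30, 0.20]
r = pearson(slope_deg, mortality)
```

A value of `r` close to -1 would indicate the strong negative relationship described above; values near 0 would indicate none.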

Analyzing Different Contexts for Energy Terms through Text Mining of Online Science News Articles (온라인 과학 기사 텍스트 마이닝을 통해 분석한 에너지 용어 사용의 맥락)

  • Oh, Chi Yeong;Kang, Nam-Hwa
    • Journal of Science Education / v.45 no.3 / pp.292-303 / 2021
  • This study identifies the terms frequently used together with 'energy' in online science news articles, as well as the topics of those news reports, to find out how the term energy is used in everyday life and to draw implications for science curriculum and instruction about energy. A total of 2,171 online news articles in the science category, published by 11 major newspaper companies in Korea over one year from March 1, 2018, were selected using 'energy' as a search term. Natural language processing yielded a total of 51,224 sentences consisting of 507,901 words for analysis. Using the R program, term frequency analysis, semantic network analysis, and structural topic modeling were performed. The results show that the terms with exceptionally high frequencies were technology, research, and development, reflecting the nature of news articles that report new findings. Terms used more than once per two articles included industry-related terms (industry, product, system, production, market) and terms that would be expected to be energy-related, such as 'electricity' and 'environment.' Meanwhile, 'sun', 'heat', 'temperature', and 'power generation', which are frequently used in energy-related science classes, were also among the highest-frequency terms. The network analysis revealed two clusters: terms related to industry and technology, and terms related to basic science and research. The analysis of terms paired with energy also showed that terms related to the use of energy, such as 'energy efficiency,' 'energy saving,' and 'energy consumption,' were the most frequently used. From the 16 topics found, four contexts of energy were drawn: 'high-tech industry,' 'industry,' 'basic science,' and 'environment and health.' The results suggest that introducing the concept of energy degradation as a starting point for energy classes can be effective.
They also show the need to introduce high-tech industries or the context of environment and health into energy learning.
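
The "more than once per two articles" criterion corresponds to an average term rate above 0.5 occurrences per article. The study used R; the sketch below uses Python purely for illustration, assumes the articles are already tokenized, and uses invented tokens. The function name `frequent_terms` is hypothetical.

```python
from collections import Counter


def frequent_terms(articles, min_rate=0.5):
    """Return terms whose total count divided by the number of articles
    exceeds `min_rate` (0.5 means more than once per two articles)."""
    counts = Counter(term for article in articles for term in article)
    n = len(articles)
    return {term: c / n for term, c in counts.items() if c / n > min_rate}


# Invented, pre-tokenized articles for illustration.
articles = [
    ["energy", "technology", "research", "industry"],
    ["energy", "technology", "market", "industry"],
    ["energy", "solar", "technology", "technology", "industry"],
    ["energy", "heat"],
]
rates = frequent_terms(articles)
```

Here `rates` keeps only the terms that clear the threshold, together with their per-article rate, which could then be ranked or passed on to a co-occurrence network analysis.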

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and SNS (social network services) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
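
The idea of linking issues across two consecutive periods by similarity can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the similarity measure, so cosine similarity over keyword weights, the threshold of 0.3, the function names, and the toy issues are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def issue_links(prev_issues, next_issues, threshold=0.3):
    """Return (previous issue, next issue, similarity) edges of a flow diagram."""
    links = []
    for p_name, p_vec in prev_issues.items():
        for n_name, n_vec in next_issues.items():
            sim = cosine(p_vec, n_vec)
            if sim >= threshold:
                links.append((p_name, n_name, round(sim, 3)))
    return links


# Toy issues per period (keyword weights invented for illustration).
period1 = {
    "P1-A": {"north korea": 1.0, "nuclear test": 1.0},
    "P1-B": {"election": 1.0, "candidate": 1.0},
}
period2 = {
    "P2-A": {"north korea": 1.0, "separated families": 1.0},
    "P2-B": {"economy": 1.0, "exports": 1.0},
}
links = issue_links(period1, period2)
```

In this toy case "P1-A" flows into "P2-A", while "P1-B" has no successor, which corresponds to an issue that appears in one period and disappears in the next.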

Analysis of Influential Factors in the Relationship between Innovation Efforts Based on the Company's Environment and Company Performance: Focus on Small and Medium-sized ICT Companies (기업의 환경적 특성에 따른 혁신활동과 기업성과간 영향요인 분석: ICT분야 중소기업을 중심으로)

  • Kim, Eun-jung;Roh, Doo-hwan;Park, Ho-young
    • Journal of Technology Innovation / v.25 no.4 / pp.107-143 / 2017
  • This study aims to understand the impact of internal and external environments and innovation efforts on a company's performance. First, the relationships and patterns between variables were determined through an exploratory factor analysis. Afterwards, a cluster analysis was conducted, in which the influential factors summarized in the factor analysis were classified. Finally, structural equation modeling was used to carry out an empirical analysis of the structural relationship between innovation efforts and the company's performance in the classified clusters. Seven factors were derived from the exploratory factor analysis of 40 input variables from external and internal environments. Four clusters (n=1,022) were formed based on the seven factors. Empirical analysis of the four clusters using structural equation modeling showed the following. Only independent technology development had a positive impact on the company's performance for Cluster 1, which is characterized by sensitivity to a technological/competitive environment and innovativeness. Only independent technology development and joint research had positive impacts on the company's performance for Cluster 2, which is characterized by sensitivity to a market environment and internal orientation. Joint research and the mediating variable of government support program utilization had positive impacts, while the introduction of technology had a negative impact on the company's performance for Cluster 3, which is characterized by sensitivity to a competitive environment, innovativeness, and willingness to cooperate with the government and related institutions. Independent technology development as well as the mediating variables of network utilization and government support program utilization had positive impacts on the company's performance for Cluster 4, which is characterized by openness and external cooperation.

A study of Artificial Intelligence (AI) Speaker's Development Process in Terms of Social Constructivism: Focused on the Products and Periodic Co-revolution Process (인공지능(AI) 스피커에 대한 사회구성 차원의 발달과정 연구: 제품과 시기별 공진화 과정을 중심으로)

  • Cha, Hyeon-ju;Kweon, Sang-hee
    • Journal of Internet Computing and Services / v.22 no.1 / pp.109-135 / 2021
  • This study classified the development process of artificial intelligence (AI) speakers by analyzing the text of news about AI speakers in traditional news reports, and identified the characteristics of each product by period. The theoretical background of the analysis consists of news frames and topic frames. The analysis methods were topic modeling using the LDA method and semantic network analysis, and the research method was content analysis. First, 2,710 news items related to AI speakers from 2014 to 2019 were collected; second, topic frames were analyzed using NodeXL. The first result is that the trend of topic frames by AI speaker provider type differed according to the characteristics of the four types of operators (communication service provider, online platform, OS provider, and IT device manufacturer). Specifically, online platform operators (Google, Naver, Amazon, Kakao) showed a frame that uses AI speakers as 'search or input devices'. In contrast, telecommunications operators (SKT, KT) showed prominent frames for IPTV, which is the parent company's flagship business, and for an 'auxiliary device' of the telecommunications business. Furthermore, the frame of 'personalization of products and voice service' was notable for OS operators (MS, Apple), and the frame for IT device manufacturers (Samsung) was 'Internet of Things (IoT) Integrated Intelligence System'. The second result is that the trend of the topic frame by AI speaker development period (by year) centered on AI technology in the first phase (2014-2016), was related to the social interaction between AI technology and users in the second phase (2017-2018), and shifted from AI technology-centered to user-centered in the third phase (2019).
The QAP analysis showed that news frames by business operator and development period in AI speaker development are socially constituted by determinants of media discourse. One implication of this study is that the evolution of AI speakers is shaped by the characteristics of the parent company and by a process of co-evolution arising from interactions among users, by business operator and development period. Another implication is that the results are important indicators for predicting the future prospects of AI speakers and presenting directions accordingly.

Estimation of irrigation return flow from paddy fields on agricultural watersheds (농업유역의 논 관개 회귀수량 추정)

  • Kim, Ha-Young;Nam, Won-Ho;Mun, Young-Sik;An, Hyun-Uk;Kim, Jonggun;Shin, Yongchul;Do, Jong-Won;Lee, Kwang-Ya
    • Journal of Korea Water Resources Association / v.55 no.1 / pp.1-10 / 2022
  • Irrigation water supplied to a paddy field is consumed through evapotranspiration, underground infiltration, and natural and artificial drainage from the field. Irrigation return flow is defined as the excess irrigation water that is not consumed by evapotranspiration and crops and that returns to an aquifer by infiltration or drainage. Research on estimating return flow plays an important part in the water circulation management of agricultural watersheds. However, return flow rates need to be calculated, because the result differs depending on irrigation channel water loss, analysis methods, and local characteristics. In this study, the irrigation return flow rate of an agricultural watershed was estimated using monitoring and SWMM (Storm Water Management Model) modeling from 2017 to 2020 for the Heungeop reservoir located in Wonju, Gangwon-do. SWMM modeling was performed with weather data and observation data, and the water supply and drainage were estimated from the SWMM model analysis. The applicability of the SWMM model was verified using RMSE and R-square values. Over 2017 to 2020, the average annual quick return flow rate was 53.1%. Based on these results, the water circulation characteristics can be analyzed, and the results can serve as basic data for integrated water management.
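
The two goodness-of-fit measures named above can be computed as in the sketch below. It assumes R-square means the coefficient of determination, 1 - SS_res / SS_tot; some hydrology studies instead report the squared Pearson correlation, and the abstract does not say which was used. The observed and simulated values are invented.

```python
import math


def rmse(observed, simulated):
    """Root mean square error between observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)


def r_squared(observed, simulated):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Invented values standing in for observed and SWMM-simulated flows.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [3.0, 3.0, 7.0, 7.0]
error = rmse(observed, simulated)
fit = r_squared(observed, simulated)
```

A lower `error` and a `fit` closer to 1 indicate that the simulated series tracks the observations more closely.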