• Title/Summary/Keyword: Big Data Based Modeling

Adaptive User and Topic Modeling based Automatic TV Recommender System for Big Data Processing (빅 데이터 처리를 위한 적응적 사용자 및 토픽 모델링 기반 자동 TV 프로그램 추천시스템)

  • Kim, EunHui;Kim, Munchurl
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2015.07a / pp.195-198 / 2015
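  • With the recent rapid growth in TV service subscribers and TV program content, the need for recommender systems suited to big data processing is increasing. For the design of a recommender system based on users' implicit feedback data, this paper proposes a topic-modeling-based algorithm that learns efficiently from newly generated usage-history data without storing users' accumulated past usage history, and that can track changes in user preferences and in TV program schedules over time. Big data processing inevitably calls for distributed algorithms. In prior work on parallel distributed processing for topic-modeling-based inference algorithms, the core step is synchronizing global variable data while a large data set is partitioned across multiple machines for parallel distributed learning. For this global data synchronization, algorithms have been proposed that take into account Hadoop-based systems for parallel distributed processing across multiple computers, server-client arbitration, and fault-tolerant systems; however, because of network bandwidth limits, synchronization delays that grow with the data are unavoidable. Accordingly, for big data processing, this paper clusters users and compares, for each cluster, running the proposed algorithm with global data synchronization against running inference on local data. The results show that recommendation performance improves further when a latent-topic assignment table for each TV program viewing token is maintained locally for each cluster, which confirms the efficiency and reasonableness of the proposed recommender system design.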

Applying Multi-Response Optimization to Explore Fermentation Conditions of Probiotics (프로바이오틱 유산균 발효조건 탐색을 위한 다반응 최적화의 활용)

  • Sungsue Rheem
    • Journal of Dairy Science and Biotechnology / v.41 no.2 / pp.45-56 / 2023
  • This review serves two purposes: first, to promote the use of improved optimization techniques in response surface methodology (RSM); and second, to refine the optimum conditions for the fermentation of probiotics. According to research in dairy science, Lactiplantibacillus plantarum K79 is a candidate probiotic that has beneficial health effects, such as lowering blood pressure. The optimum conditions for L. plantarum K79 to produce peptides with angiotensin-converting enzyme (ACE) inhibitory activity were proposed by modeling ACE inhibitory activity and pH each as a function of four factors: skim milk concentration (%), incubation temperature (℃), incubation time (hours), and starter added amount (%). To estimate optimum conditions, the researchers employed a desirability-based multi-response optimization approach, utilizing third-order models with a nonsignificant lack of fit. The estimated optimum fermentation conditions for L. plantarum K79 were as follows: a skim milk concentration of 10.76%, an incubation temperature of 36.9℃, an incubation time of 23.76 hours, and a starter added amount of 0.098%. Under these conditions, the predicted ACE inhibitory activity was 91.047%, and the predicted pH was 4.6. These predicted values achieved the objectives of the multi-response optimization in this study.
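A desirability-based multi-response optimization scores each response on a 0 to 1 scale and combines the scores into one objective. The sketch below shows one common formulation (Derringer-Suich style individual desirabilities combined by a geometric mean). The abstract does not state which desirability forms or bounds were used, so treating ACE inhibitory activity as larger-is-better, treating pH as a target response, and every bound in the example are assumptions made only for illustration.

```python
from math import prod


def d_larger_is_better(y, low, high, weight=1.0):
    """Desirability rising from 0 at `low` to 1 at `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** weight


def d_target(y, low, target, high, weight=1.0):
    """Desirability equal to 1 at `target` and 0 outside (low, high)."""
    if y <= low or y >= high:
        return 0.0
    if y <= target:
        return ((y - low) / (target - low)) ** weight
    return ((high - y) / (high - target)) ** weight


def overall_desirability(ds):
    """Geometric mean of the individual desirabilities."""
    return prod(ds) ** (1.0 / len(ds))


# Predicted responses from the abstract; all bounds are hypothetical.
d_ace = d_larger_is_better(91.047, low=50.0, high=100.0)
d_ph = d_target(4.6, low=4.0, target=4.6, high=5.2)
D = overall_desirability([d_ace, d_ph])
```

An optimizer would then search the four factor settings for the point that maximizes `D`, with the fitted response models supplying the predicted values. Because the geometric mean is zero whenever any single desirability is zero, no response can be sacrificed entirely for another.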

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.21-41 / 2019
  • Thanks to the rapid development of information technologies, the data available on the Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and demonstrate the effects of data analysis. In the tourism and hospitality industry, many firms and studies have paid attention to online reviews on social media because of their large influence on customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more pronounced than that of any other type of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms express their opinions as text, images, and so on, so raw data sets from these reviews are unstructured. Moreover, these data sets are too large for people to extract new information and hidden knowledge from them manually. To use them for business intelligence and analytics applications, proper big data techniques such as natural language processing and data mining are needed. This study suggests an analytical approach that directly yields insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract topics contained in the reviews and decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding groups of words that represent the documents in a collection. Among several topic mining methods, we adopted the Latent Dirichlet Allocation (LDA) algorithm, which is considered the most universal algorithm. However, LDA alone is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree (CART) method, which is a kind of decision tree technique. 
Through the CART method, we can find which topics are related to positive or negative ratings of a hotel and visualize the results. This study therefore aims to present an analytical approach for improving hotel service quality from unstructured review data sets. Through experiments on four hotels in Hong Kong, we can find the strengths and weaknesses of each hotel's services and suggest improvements to aid customer satisfaction. From positive reviews in particular, we find what these hotels should maintain in their service quality. For example, the positive reviews of one hotel show that, compared with the other hotels, it has a good location and good room condition. In contrast, from negative reviews we find what the hotels should change in their services. For example, one hotel should improve the soundproofing of its rooms. These results indicate that our approach is useful for finding insights into the service quality of hotels. That is, from an enormous volume of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies on improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming, and the results may be distorted by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws insights through a type of big data analysis, so it can be a more useful tool that overcomes the limitations of surveys or interviews. Moreover, our approach can easily obtain service quality information for other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will improve if other structured and unstructured data sources are added.
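The step that links topics to ratings can be illustrated with the split criterion at the heart of CART. The following is a minimal sketch, not the authors' implementation: it finds the single best binary split by weighted Gini impurity over per-review topic proportions. The topic names, the proportions, and the ratings are hypothetical, and a full CART tree would apply `best_split` recursively and then prune.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(rows, labels):
    """Return (feature_index, threshold, weighted_gini) of the best binary split."""
    best = None
    n = len(rows)
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [lab for r, lab in zip(rows, labels) if r[j] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical topic proportions per review: [location, room noise].
topic_weights = [
    [0.8, 0.2],
    [0.7, 0.3],
    [0.6, 0.4],
    [0.3, 0.7],
    [0.2, 0.8],
    [0.1, 0.9],
]
ratings = ["positive"] * 3 + ["negative"] * 3
feature, threshold, impurity = best_split(topic_weights, ratings)
```

In this toy data the chosen split is on the first topic, which separates positive from negative reviews perfectly. Reading the chosen feature and threshold at each node is what lets a manager see which topics go with low ratings.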

A Study on Automatic Classification of Newspaper Articles Based on Unsupervised Learning by Departments (비지도학습 기반의 행정부서별 신문기사 자동분류 연구)

  • Kim, Hyun-Jong;Ryu, Seung-Eui;Lee, Chul-Ho;Nam, Kwang Woo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.9 / pp.345-351 / 2020
  • Administrative agencies today are paying keen attention to big data analysis to improve their policy responsiveness. Among the various kinds of big data, news articles can be used to understand public opinion regarding policy and policy issues. The amount of news output has increased rapidly because of the emergence of new online media outlets, which calls for the use of automated bots or automatic document classification tools. There are, however, limits to the automatic collection of news articles related to specific agencies or departments based on the existing news article categories and keyword search queries. Thus, this paper proposes a method to process articles using classification glossaries that take into account each agency's different work features. To this end, classification glossaries were developed by extracting the work features of different departments using Word2Vec and topic modeling techniques from news articles related to different agencies. As a result, the automatic classification of newspaper articles for each department yielded approximately 71% accuracy. This study makes academic and practical contributions by presenting a method of extracting the work features of each department and an unsupervised learning-based method for automatically classifying news articles relevant to each agency.

A Study on the Research Trends in Fintech using Topic Modeling (토픽 모델링을 이용한 핀테크 기술 동향 분석)

  • Kim, TaeKyung;Choi, HoeRyeon;Lee, HongChul
    • Journal of the Korea Academia-Industrial cooperation Society / v.17 no.11 / pp.670-681 / 2016
  • Recently, based on Internet and mobile environments, the Fintech industry that fuses finance and IT together has been rapidly growing and Fintech services armed with simplicity and convenience have been leading the conversion of all financial services into online and mobile services. However, despite the rapid growth of the Fintech industry, few studies have classified Fintech technologies into detailed technologies, analyzed the technology development trends of major market countries, and supported technology planning. In this respect, using Fintech technological data in the form of unstructured data, the present study extracts and defines detailed Fintech technologies through the topic modeling technique. Thereafter, hot and cold topics of the derived detailed Fintech technologies are identified to determine the trend of Fintech technologies. In addition, the trends of technology development in the USA, South Korea, and China, which are major market countries for major Fintech industrial technologies, are analyzed. Finally, through the analyses of networks between detailed Fintech technologies, linkages between the technologies are examined. The trends of Fintech industrial technologies identified in the present study are expected to be effectively utilized for the establishment of policies in the area of the Fintech industry and Fintech related enterprises' establishment of technology strategies.

COVID-19 and Korean Family Life on Social Media: A Topic Model Approach (소셜 빅데이터로 알아본 코로나19와 가족생활: 토픽모델 접근)

  • Park, Sunyoung;Lee, Jaerim
    • The Journal of the Korea Contents Association / v.21 no.3 / pp.282-300 / 2021
  • The purpose of this study was to explore what social media posts tell us about family life during the COVID-19 pandemic by examining the keywords and topics underlying posts on blogs and online forums. Our criteria for web crawling were (a) blog and forum posts on Naver and Daum, the top portal sites in Korea, (b) posts between February 23 and April 19, 2020, the period of the first heightened social distancing orders, and (c) inclusion of "COVID" and "family" or "COVID" and "home." We analyzed 351,734 posts using TF-IDF values and topic modeling based on latent Dirichlet allocation. We identified and named 22 topics including COVID-19 prevention, family infection, family health, dietary life and changes, religious life, stuck at home, postponed school year, family events, travel and vacations, concerns about family and friends, anxiety and stress, disaster and damage, COVID-19 warning text messages, family support policies, Shin-cheon-ji and Daegu. The results show that COVID-19 impacted various domains of family life including health, food, housing, religion, child care, education, rituals, and leisure as well as relationships and emotions.
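The TF-IDF weighting mentioned in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the study's pipeline: it uses the plain variant (term frequency normalized by document length, multiplied by the unsmoothed logarithm of the inverse document frequency), and the tokenized posts are made up. The abstract does not say which TF-IDF variant or tokenizer was used.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical tokenized posts, for illustration only.
posts = [
    ["covid", "family", "home", "school"],
    ["covid", "family", "travel"],
    ["covid", "home", "meal", "meal"],
]
scores = tf_idf(posts)
```

A term that appears in every post, such as "covid" here, gets a weight of zero, which is why TF-IDF is useful for surfacing the keywords that distinguish one post from the rest before topic modeling is applied.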

Comparing Corporate and Public ESG Perceptions Using Text Mining and ChatGPT Analysis: Based on Sustainability Reports and Social Media (텍스트마이닝과 ChatGPT 분석을 활용한 기업과 대중의 ESG 인식 비교: 지속가능경영보고서와 소셜미디어를 기반으로)

  • Jae-Hoon Choi;Sung-Byung Yang;Sang-Hyeak Yoon
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.347-373 / 2023
  • As the significance of ESG (Environmental, Social, and Governance) management grows in driving sustainable growth, this study examines and compares ESG trends and interrelationships from both corporate and societal viewpoints. Employing a combination of Latent Dirichlet Allocation (LDA) topic modeling and Semantic Network Analysis (SNA), we analyzed sustainability reports alongside corresponding social media datasets. Additionally, an in-depth examination of social media content was conducted using Joint Sentiment Topic Modeling (JST), further enriched by SNA. Complementing text mining analysis with the assistance of ChatGPT, this study identified 25 different ESG topics. It highlighted differences between companies, which aim to avoid risks and build trust, and the general public, which has diverse concerns such as investment options and working conditions. Key terms like 'greenwashing,' 'serious accidents,' and 'boycotts' show that many people doubt how companies handle ESG issues. The findings from this study lay the foundation for a plan that serves key ESG groups, including businesses, government agencies, customers, and investors. This study also provides guidance for creating more trustworthy and effective ESG strategies, helping to direct the discussion on ESG effectiveness.

Knowledge Modeling and Database Construction for Human Biomonitoring Data (인체 바이오모니터링 지식 모델링 및 데이터베이스 구축)

  • Lee, Jangwoo;Yang, Sehee;Lee, Hunjoo
    • Journal of Food Hygiene and Safety / v.35 no.6 / pp.607-617 / 2020
  • Human bio-monitoring (HBM) data is a very important resource for tracking total exposure and concentrations of a parent chemical or its metabolites in human biomarkers. However, until now, it was difficult to execute the integration of different types of HBM data due to incompatibility problems caused by gaps in study design, chemical description and coding system between different sources in Korea. In this study, we presented a standardized code system and HBM knowledge model (KM) based on relational database modeling methodology. For this purpose, we used 11 raw datasets collected from the Ministry of Food and Drug Safety (MFDS) between 2006 and 2018. We then constructed the HBM database (DB) using a total of 205,491 concentration-related data points for 18,870 participants and 86 chemicals. In addition, we developed a summary report-type statistical analysis program to verify the inputted HBM datasets. This study will contribute to promoting the sustainable creation and versatile utilization of big-data for HBM results at the MFDS.
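A relational model for this kind of data typically separates participants, chemicals identified by a standardized code, and individual concentration measurements. The sketch below uses Python's built-in `sqlite3` module to show the idea. The table names, the column names, the code format, and every inserted row are hypothetical; the abstract does not describe the actual schema or code system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (
    participant_id INTEGER PRIMARY KEY,
    study_code     TEXT NOT NULL
);
CREATE TABLE chemical (
    chemical_id INTEGER PRIMARY KEY,
    std_code    TEXT NOT NULL UNIQUE,  -- standardized chemical code
    name        TEXT NOT NULL
);
CREATE TABLE measurement (
    measurement_id INTEGER PRIMARY KEY,
    participant_id INTEGER NOT NULL REFERENCES participant(participant_id),
    chemical_id    INTEGER NOT NULL REFERENCES chemical(chemical_id),
    biomarker      TEXT NOT NULL,
    concentration  REAL NOT NULL,
    unit           TEXT NOT NULL
);
""")

# Made-up rows, for illustration only.
conn.execute("INSERT INTO participant VALUES (1, 'STUDY-A')")
conn.execute("INSERT INTO participant VALUES (2, 'STUDY-B')")
conn.execute("INSERT INTO chemical VALUES (1, 'CHEM-001', 'example chemical')")
conn.executemany(
    "INSERT INTO measurement "
    "(participant_id, chemical_id, biomarker, concentration, unit) "
    "VALUES (?, ?, ?, ?, ?)",
    [(1, 1, "urine", 2.0, "ug/L"), (2, 1, "urine", 4.0, "ug/L")],
)

# Summary-style check: count and mean concentration per chemical.
summary = conn.execute("""
    SELECT c.std_code, COUNT(*), AVG(m.concentration)
    FROM measurement AS m JOIN chemical AS c USING (chemical_id)
    GROUP BY c.std_code
""").fetchall()
```

Keeping the standardized code in a single `chemical` table is what allows measurements from studies with different original coding schemes to be joined and summarized together.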

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.23-43 / 2016
  • As a result of the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous data for various purposes. Especially, in recent years, people tend to share their experiences related to their leisure activities while also reviewing others' inputs concerning their activities. Therefore, by referring to others' leisure activity-related experiences, they are able to gather information that might guarantee them better leisure activities in the future. This phenomenon has appeared throughout many aspects of leisure activities such as movies, traveling, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites provide information of each product in various formats depending on different purposes and perspectives. Generally, most of the websites provide the average ratings and detailed reviews of users who actually used products/services, and these ratings and reviews can actually support the decision of potential customers in purchasing the same products/services. However, the existing websites offering information on leisure activities only provide the rating and review based on one stage of a set of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion as well as the characteristics of specific elements comprising each criterion, users have to read a large number of reviews. In particular, as most of the users search for the characteristics of the detailed elements for one or more specific evaluation criteria based on their priorities, they must spend a great deal of time and effort to obtain the desired information by reading more reviews and understanding the contents of such reviews. 
Although some websites break down the evaluation criteria and direct users to enter their reviews according to different levels of criteria, the excessive number of input sections makes the whole process inconvenient for users. Further, problems may arise if a user does not follow the instructions for the input sections or fills in the wrong input sections. Finally, treating the evaluation criteria breakdown as a realistic alternative is difficult, because identifying all the detailed criteria for each evaluation criterion is a challenging task. For example, when people review a certain hotel, they tend to write only one-stage reviews for various components such as accessibility, rooms, services, or food. These reviews may address the most frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information on these questions. In addition, if a breakdown of the evaluation criteria is provided along with various input sections, the user might fill in only the evaluation criterion for accessibility, or enter the wrong information, such as information about rooms under the accessibility criterion. Thus, the reliability of the segmented review will be greatly reduced. In this study, we propose an approach to overcome the limitations of the existing leisure activity information websites, namely, (1) the reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up the evaluation criteria. In our proposed methodology, we first identify the review content and construct the lexicon for each evaluation criterion by using the terms that are frequently used for each criterion. Next, the sentences in the review documents containing the terms in the constructed lexicon are decomposed into review units, which are then reconstructed by using the evaluation criteria. 
Finally, the issues of the constructed review units are derived for each evaluation criterion, and the summary results are provided. Apart from the derived issues, the review units themselves are also provided. This approach therefore aims to save users time and effort, because they read only the relevant information they need for each evaluation criterion rather than going through the entire text of each review. Our proposed methodology is based on topic modeling, which is being actively used in text analysis. The review is decomposed into sentence units rather than being treated as a single document unit. After being decomposed into individual review units, the review units are reorganized according to each evaluation criterion and then used in the subsequent analysis. In this respect, this work differs substantially from existing topic modeling-based studies. In this paper, we collected 423 reviews from hotel information websites and decomposed these reviews into 4,860 review units. We then reorganized the review units according to six different evaluation criteria. By applying our methodology to these review units, we present the analysis results and demonstrate the utility of the proposed methodology.
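The decomposition step described in this abstract (split each review into sentences, then regroup the sentences by evaluation criterion using a lexicon) can be sketched as follows. This covers only the decomposition and reorganization; the subsequent topic modeling that derives issues per criterion is not shown. The lexicon, the criterion names, the sentence-splitting rule, and the sample reviews are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical lexicon: evaluation criterion -> terms frequently used for it.
LEXICON = {
    "accessibility": {"subway", "station", "airport", "location"},
    "rooms": {"room", "bed", "bathroom"},
    "services": {"staff", "service", "check-in"},
}


def split_sentences(review):
    """Split on whitespace that follows sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def to_review_units(reviews, lexicon):
    """Group sentences (review units) under every criterion whose terms they contain."""
    units = defaultdict(list)
    for review in reviews:
        for sentence in split_sentences(review):
            tokens = set(re.findall(r"[a-z\-]+", sentence.lower()))
            for criterion, terms in lexicon.items():
                if tokens & terms:
                    units[criterion].append(sentence)
    return dict(units)


reviews = [
    "The subway station is five minutes away. The bathroom was spotless!",
    "Friendly staff. The room faced a noisy street.",
]
units = to_review_units(reviews, LEXICON)
```

Each criterion then has its own small corpus of review units, so a sentence about rooms is analyzed with other room sentences even if the review it came from was mostly about location.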

Improving the I/O Performance of Disk-Based Graph Engine by Graph Ordering (디스크 기반 그래프 엔진의 입출력 성능 향상을 위한 그래프 오더링)

  • Lim, Keunhak;Kim, Junghyun;Lee, Eunjae;Seo, Jiwon
    • KIISE Transactions on Computing Practices / v.24 no.1 / pp.40-45 / 2018
  • With the advent of big data and social networks, large-scale graph processing has become a popular research topic. Recently, an optimization technique called Gorder has been proposed to improve the performance of in-memory graph processing. This technique improves performance by optimizing the graph layout in memory for better cache locality. However, because it is designed for in-memory graph processing systems, the technique is not suitable for disk-based graph engines; in addition, the cost of applying it is significantly high. To solve this problem, we propose a new graph ordering called I/O Order. I/O Order considers the characteristics of I/O accesses on SSDs and HDDs to improve the performance of disk-based graph engines. In addition, I/O Order is algorithmically simpler than Gorder, so it is cheaper to apply. I/O Order reduces the pre-processing cost by up to 9.6 times compared with Gorder, while its performance is 2 times higher than that of Random ordering in low-locality graph algorithms.