• Title/Summary/Keyword: Aspect-Mining

Topic Masks for Image Segmentation

  • Jeong, Young-Seob;Lim, Chae-Gyun;Jeong, Byeong-Soo;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.7 no.12 / pp.3274-3292 / 2013
  • Unsupervised methods for image segmentation have recently drawn attention because most images do not have labels or tags. A topic model is one such unsupervised probabilistic method: it captures latent aspects of the data, where each latent aspect, or topic, is associated with one homogeneous region. The results of topic models, however, are usually noisy, which degrades overall segmentation performance. In this paper, to improve the performance of image segmentation using topic models, we propose two topic masks applicable to the topic assignments of homogeneous regions obtained from topic models. The topic masks detect noise among the topic assignments (topic labels) and remove it by replacement, just as image masks do for pixels. However, because topic assignments differ in nature from image pixels, the topic masks have properties that differ from those of existing image masks. This paper makes two contributions. First, the topic masks can be used to reduce the noise in topic assignments obtained from topic models for image segmentation tasks. Second, we test the effectiveness of the topic masks by applying them to segmented images obtained from the Latent Dirichlet Allocation model and the Spatial Latent Dirichlet Allocation model on the MSRC image dataset. The empirical results show that one of the masks successfully reduces the topic noise.
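The abstract does not specify how the two topic masks are defined. As a rough illustration of the general idea only, the sketch below applies a majority-vote filter to a grid of topic labels. The function name `majority_mask`, the square window, the strict-majority rule, and the toy grid are all assumptions made for this example, not the authors' method. Because topic labels are categorical, the filter uses the most frequent label rather than an average, which is one way topic masks could differ from pixel masks.

```python
from collections import Counter


def majority_mask(labels, radius=1):
    """Replace each topic label with the strict-majority label of its window.

    `labels` is a 2D list of categorical topic ids (hypothetical layout:
    one id per homogeneous region arranged on a grid).
    """
    rows, cols = len(labels), len(labels[0])
    out = [row[:] for row in labels]  # copy so the input stays untouched
    for r in range(rows):
        for c in range(cols):
            # Collect labels in the (2*radius+1)^2 window, clipped at borders.
            window = [
                labels[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            top, count = Counter(window).most_common(1)[0]
            # Replace only when more than half of the window agrees.
            if count * 2 > len(window):
                out[r][c] = top
    return out


# Toy grid: the isolated label 2 is treated as noise.
grid = [
    [0, 0, 0, 1],
    [0, 2, 0, 1],
    [0, 0, 1, 1],
]
smoothed = majority_mask(grid)
```

In this toy grid the isolated `2` is replaced by the surrounding `0`, while the cell in the top-right corner keeps its label because no label holds a strict majority in its window.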

Research Trend Analysis for Sustainable QR code use - Focus on Big Data Analysis

  • Lee, Eunji;Jang, Jikyung
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.9 / pp.3221-3242 / 2021
  • The purpose of this study is to examine current research trends on 'QR code' and to suggest directions for future study through big data analysis. (1) Background: research trends on 'QR code' and analysis of the text by subject field and year. (2) Methodology: data scraping and collection, summarization in Excel, and preprocessing and big data analysis using R x64 4.0.2. (3) Findings: first, 'QR code' studies increased continuously overall, and their findings were applied in various fields. Second, the analysis of frequent keywords showed somewhat different results by subject field and year, but the overall results were similar. Third, the visualization of the frequent keywords showed results similar to those of the frequent keyword analysis. (4) Conclusions: in general, 'QR code' studies are used in various fields, and this trend is likely to keep growing. The findings also reflect that 'QR code' is an aspect of our social and cultural phenomena, so 'QR code' should be regarded as a tool and an application of information. Expanding the scope of the analysis is expected to yield more meaningful indications of 'QR code' research trends and development potential.

Sentiment Dictionary Construction Based on Reason-Sentiment Pattern Using Korean Syntax Analysis (한국어 구문분석을 활용한 이유-감성 패턴 기반의 감성사전 구축)

  • Woo Hyun Kim;Heejung Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.142-151 / 2023
  • Sentiment analysis is a method used to comprehend feelings, opinions, and attitudes in text, and it is essential for evaluating consumer feedback and social media posts. However, creating sentiment dictionaries, which are necessary for this analysis, is complex and time-consuming because people express their emotions differently depending on the context and domain. In this study, we propose a new method for simplifying this procedure. We utilize syntax analysis of the Korean language to identify and extract sentiment words based on the Reason-Sentiment Pattern, which distinguishes between words expressing feelings and words explaining why those feelings are expressed, making it applicable in various contexts and domains. We also define sentiment words as those with clear polarity, even when used independently and exclude words whose polarity varies with context and domain. This approach enables the extraction of explicit sentiment expressions, enhancing the accuracy of sentiment analysis at the attribute level. Our methodology, validated using Korean cosmetics review datasets from Korean online shopping malls, demonstrates how a sentiment dictionary focused solely on clear polarity words can provide valuable insights for product planners. Understanding the polarity and reasons behind specific attributes enables improvement of product weaknesses and emphasis on strengths. This approach not only reduces dependency on extensive sentiment dictionaries but also offers high accuracy and applicability across various domains.

Comparison of Association Rule Learning and Subgroup Discovery for Mining Traffic Accident Data (교통사고 데이터의 마이닝을 위한 연관규칙 학습기법과 서브그룹 발견기법의 비교)

  • Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.1-16 / 2015
  • Traffic accidents have been one of the major causes of death worldwide for the last several decades. According to World Health Organization statistics, approximately 1.24 million deaths occurred on the world's roads in 2010. To reduce future traffic accidents, multipronged approaches have been adopted, including traffic regulations, injury-reducing technologies, and driver training programs. Records on traffic accidents are generated and maintained for this purpose. To make these records meaningful and effective, it is necessary to analyze the relationship between traffic accidents and related factors such as vehicle design, road design, weather, and driver behavior. Insights derived from these analyses can be used for accident prevention. Traffic accident data mining is the activity of finding useful knowledge about such relationships that is not well known and that may interest the user. Many studies on mining accident data have been reported over the past two decades. Most of them focused on predicting the risk of accidents from accident-related factors, using supervised learning methods such as decision trees, logistic regression, k-nearest neighbors, and neural networks. However, the prediction models derived from these algorithms are too complex for humans to understand, because the main purpose of these algorithms is prediction, not explanation of the data. Some studies use unsupervised clustering algorithms to divide the data into several groups, but the derived groups are still not easy for humans to understand, so additional analysis is necessary. Rule-based learning methods are adequate when we want to derive a comprehensible form of knowledge about the target domain. They derive a set of if-then rules that represent the relationship between the target feature and other features. Rules are fairly easy for humans to understand, so they can help provide insight and comprehensible results. Association rule learning and subgroup discovery are representative rule-based learning methods for descriptive tasks. These two methods have been used in a wide range of areas, including transaction analysis, accident data analysis, detection of statistically significant patient risk groups, and discovery of key persons in social communities. We use both the association rule learning method and the subgroup discovery method to discover useful patterns in a traffic accident dataset consisting of many features, including driver profile, accident location, accident type, vehicle information, and regulation violations. The association rule learning method, an unsupervised learning method, searches for frequent item sets in the data and translates them into rules. In contrast, the subgroup discovery method is a supervised learning method that discovers rules for user-specified concepts that satisfy a certain degree of generality and unusualness. Depending on which aspect of the data we focus on, we may combine multiple relevant features of interest into a synthetic target feature and give it to the rule learning algorithms. After a set of rules is derived, postprocessing steps remove uninteresting or redundant rules to make the rule set more compact and easier to understand. We conducted a set of experiments mining our traffic accident data in both unsupervised and supervised modes to compare these rule-based learning algorithms. The experiments reveal that association rule learning, in its pure unsupervised mode, can discover some hidden relationships among the features. In the supervised setting with a combinatorial target feature, however, the subgroup discovery method finds good rules much more easily than the association rule learning method, which requires considerable effort to tune its parameters.
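To make the two rule-quality notions concrete, here is a minimal sketch of support and confidence (the usual association rule measures) and of weighted relative accuracy (WRAcc), a commonly used subgroup discovery score that multiplies generality by unusualness. The abstract does not say which quality measure the authors used, so WRAcc is an assumption here, and the records and feature names are invented toy data.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    items = set(items)
    return sum(items <= record for record in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent); assumes the antecedent occurs."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


def wracc(records, condition, target):
    """Weighted relative accuracy: generality * unusualness."""
    generality = support(records, condition)
    unusualness = confidence(records, condition, target) - support(records, target)
    return generality * unusualness


# Hypothetical accident records, each a set of feature values.
records = [
    {"rain", "night", "severe"},
    {"rain", "night", "severe"},
    {"rain", "day"},
    {"clear", "day"},
    {"clear", "night", "severe"},
    {"clear", "day"},
]

rule_support = support(records, {"rain", "night", "severe"})
rule_confidence = confidence(records, {"rain", "night"}, {"severe"})
rule_wracc = wracc(records, {"rain", "night"}, {"severe"})
```

On this toy data the rule "rain and night implies severe" covers a third of the records with confidence 1.0, against a base rate of 0.5 for severe accidents, so its WRAcc is positive.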

Text Mining-Based Emerging Trend Analysis for e-Learning Contents Targeting for CEO (텍스트마이닝을 통한 최고경영자 대상 이러닝 콘텐츠 트렌드 분석)

  • Kyung-Hoon Kim;Myungsin Chae;Byungtae Lee
    • Information Systems Review / v.19 no.2 / pp.1-19 / 2017
  • Original scripts of e-learning lectures for the CEOs of corporation S were analyzed using topic analysis, a text mining method. Twenty-two topics were extracted based on keywords chosen from five years of records, from 2011 to 2015. Research analysis was then conducted on various issues. Promising topics were selected through evaluation and element analysis of the members of each topic. In management and economics, members demonstrated high satisfaction with and interest in topics on marketing strategy, human resource management, and communication. In the humanities, philosophy, history of war, and history drew high interest and satisfaction, whereas in lifestyle, mind health drew high interest and satisfaction. Studies were also conducted to identify topics by their proportion of content, but these studies failed to increase member satisfaction. In the field of IT, educational content responds sensitively to changing times, but it may not increase the interest and satisfaction of members. The present study found that content production for CEOs should draw out deep implications for value innovation through technology application instead of stopping at the technical aspect of information delivery. Previous studies classified content superficially, based on the name of the content program, when analyzing the status of content operation. Text mining, however, can derive deep content and subject classification from the unstructured script data itself. This approach can reveal current shortages and necessary fields when the service content of each theme is displayed by year. This study was based on data obtained from influential e-learning companies in Korea. Obtaining practical results was difficult because data were not acquired from portal sites or social networking services. The content of e-learning trends of CEOs was analyzed. Data analysis was also conducted on the intellectual interests of CEOs in each field.

A Comparative Study on the Social Awareness of Metaverse in Korea and China: Using Big Data Analysis (한국과 중국의 메타버스에 관한 사회적 인식의 비교연구: 빅데이터 분석의 활용)

  • Ki-youn Kim
    • Journal of Internet Computing and Services / v.24 no.1 / pp.71-86 / 2023
  • The purpose of this exploratory study is to compare public perceptions of the metaverse in Korean and Chinese societies using big data analysis. Owing to the impact of the COVID-19 pandemic, technological progress, and the expansion of new consumer bases such as Generation Z and Generation Alpha, worldwide interest in the metaverse has grown, and related academic studies have been in full swing since 2021. In particular, Korea and China have emerged as leading countries in the metaverse industry. Discovering the differences in social awareness using big data accumulated in both countries is a timely research question, given that mentions of the metaverse have skyrocketed. The analysis identifies the importance of keywords by computing word frequency, N-grams, and TF-IDF on cleaned data through text mining, and analyzes the density and centrality of semantic networks to determine the strength of connections between words and their semantic relevance. Python 3.9 (Anaconda data science platform 3) and Textom 6 were used, and UCINET 6.759 was used for semantic network analysis, structural CONCOR analysis, and visualization. As a result, four blocks, each a group of similar words, were derived. These blocks represent different perspectives that reflect the types of social perceptions of the metaverse in both countries. Studies on the metaverse are increasing, but cross-national comparative studies from a cross-cultural perspective have not yet been conducted. As an early study in this area, this study can provide theoretical grounds and meaningful insights for future studies.
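The keyword-importance step named in the abstract (word frequency, N-grams, TF-IDF) can be sketched in plain Python. This is a generic illustration under stated assumptions: it uses the common definition tf × log(N / df), which may differ from the weighting applied by Textom or by the authors, and the tokenized documents below are invented.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document, using tf * log(N / df)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return scores


def bigrams(tokens):
    """Adjacent token pairs (N-grams with N = 2)."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical pre-tokenized, cleaned documents.
docs = [
    ["metaverse", "game", "platform"],
    ["metaverse", "education", "platform"],
    ["metaverse", "investment", "stock"],
]
scores = tf_idf(docs)
first_doc_bigrams = bigrams(docs[0])
```

A term that appears in every document, such as "metaverse" here, gets a score of zero under this definition, which is why TF-IDF highlights the words that distinguish one document (or one country's corpus) from another.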

Sentiment Analysis for COVID-19 Vaccine Popularity

  • Muhammad Saeed;Naeem Ahmed;Abid Mehmood;Muhammad Aftab;Rashid Amin;Shahid Kamal
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.5 / pp.1377-1393 / 2023
  • Social media is used for various purposes, including entertainment, communication, information search, and voicing thoughts and concerns about a service, product, or issue. Social media data can be mined for information and insights. The World Health Organization has listed COVID-19 as a global epidemic since 2020. People from every walk of life, as well as the entire health system, have been severely impacted by this pandemic. Even now, almost three years after the pandemic declaration, the fear caused by the COVID-19 virus, which has led to higher levels of depression, stress, and anxiety, has not been fully overcome. This has also triggered numerous discussions on social media platforms covering various aspects of the pandemic. One of these aspects is the vaccines developed by different countries, their features, and the advantages and disadvantages associated with each vaccine. Social media users often share their thoughts about vaccinations and vaccines. This data can be used to determine the popularity levels of vaccines, which can give producers some insight for future decision making about their products. In this article, we used Twitter data for vaccine popularity detection. We gathered data by scraping tweets about various vaccines from different countries. After that, various machine learning and deep learning models (naive Bayes, decision tree, support vector machine, k-nearest neighbor, and deep neural network) were used for sentiment analysis to determine the popularity of each vaccine. The experimental results show that the proposed deep neural network model outperforms the other models, achieving 97.87% accuracy.

Text Network Analysis on Stalking-Related News Articles (스토킹 관련 언론기사에 대한 텍스트네트워크분석)

  • Eun-Sun Ji;Sang-Hee Jeong
    • The Journal of the Convergence on Culture Technology / v.9 no.3 / pp.579-585 / 2023
  • The purpose of this study is to explore keywords in stalking-related news articles according to political orientation through text network analysis, and then to examine their implicit intentions. A total of 1,607 articles reported from January 1, 2018 to December 31, 2022 were selected, comprising 824 articles from the conservative press (The Chosun Ilbo, The Joongang Ilbo) and 783 articles from the progressive press (The Hankyoreh, The Kyunghyang Shinmun), and topic categories were derived through topic modeling based on LDA (Latent Dirichlet Allocation). The topics common to the conservative and progressive press were improving the perception of gender-based violence, personal protection and intensity of punishment, and disclosure of stalkers' personal information. As for the topics that differed between the two, the conservative press covered stalkers' harmful acts and an outline of the 'murder case at Sindang Station', while the progressive press covered calls for aggravated punishment in the 'murder case at Sindang Station' and the eradication of sexual exploitation crimes (in cyberspace). The results imply that the type of reporting on stalking in news articles varies with ideological stance.

Wine Quality Prediction by Using Backward Elimination Based on XGBoosting Algorithm

  • Umer Zukaib;Mir Hassan;Tariq Khan;Shoaib Ali
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.31-42 / 2024
  • Many industries rely on quality certification to promote their products or brands. Obtaining quality certification, particularly from human experts, is a difficult task. Machine learning, however, plays a vital role in every aspect of life, and it has many applications in assigning and assessing quality certifications for different products on a macro level. Like other products, wine comes in different brands, and machine learning plays an important role in ensuring wine quality. In this research, we use two datasets that are publicly available from the "UC Irvine machine learning repository" to predict wine quality: a red wine dataset with 1599 records and a white wine dataset with 4898 records. The study is twofold. First, we use a technique called backward elimination to determine the dependency of the dependent variable on the independent variables and to predict the dependent variable; the technique is useful for identifying which independent variable has the maximum probability of improving wine quality. Second, we use a robust machine learning algorithm known as "XGBoost" for efficient prediction of wine quality. We evaluate our model using error measures: root mean square error, mean absolute error, R2 error, and mean square error. We compare the results generated by "XGBoost" with other state-of-the-art machine learning techniques; the experimental results show that "XGBoost" outperforms them.
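The four error measures listed in the abstract have standard definitions, sketched below in plain Python. The abstract's "R2 error" is read here as the coefficient of determination, which is an interpretation rather than something the abstract confirms; the function name `regression_errors` and the quality scores are hypothetical and are not taken from the wine datasets.

```python
import math


def regression_errors(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R^2 for paired true/predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in residuals)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in residuals) / n,
        # Assumes y_true is not constant, otherwise ss_tot would be zero.
        "r2": 1 - ss_res / ss_tot,
    }


# Hypothetical quality scores and model predictions.
y_true = [5, 6, 7, 5]
y_pred = [5, 6, 6, 6]
errors = regression_errors(y_true, y_pred)
```

Lower MSE, RMSE, and MAE indicate a better fit, while R² is better when closer to 1, so the four numbers are not directly comparable with one another.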

Derivation of Green Infrastructure Planning Factors for Reducing Particulate Matter - Using Text Mining - (미세먼지 저감을 위한 그린인프라 계획요소 도출 - 텍스트 마이닝을 활용하여 -)

  • Seok, Youngsun;Song, Kihwan;Han, Hyojoo;Lee, Junga
    • Journal of the Korean Institute of Landscape Architecture / v.49 no.5 / pp.79-96 / 2021
  • Green infrastructure planning represents landscape planning measures to reduce particulate matter. This study aimed to derive factors that may be used in planning green infrastructure for particulate matter reduction using text mining techniques. A range of analyses were carried out by focusing on keywords such as 'particulate matter reduction plan' and 'green infrastructure planning elements'. The analyses included Term Frequency-Inverse Document Frequency (TF-IDF) analysis, centrality analysis, related word analysis, and topic modeling analysis. These analyses were carried out via text mining on information collected from previous related research, policy reports, and laws. First, the TF-IDF analysis results were used to classify major keywords relating to particulate matter and green infrastructure into three groups: (1) environmental issues (e.g., particulate matter, environment, carbon, and atmosphere), (2) target spaces (e.g., urban, park, and local green space), and (3) application methods (e.g., analysis, planning, evaluation, development, ecological aspect, policy management, technology, and resilience). Second, the centrality analysis results were found to be similar to those of TF-IDF; it was confirmed that the central connectors to the major keywords were 'Green New Deal' and 'Vacant land'. The results from the analysis of related words verified that planning green infrastructure for particulate matter reduction required planning forests and ventilation corridors. Additionally, moisture must be considered for microclimate control. It was also confirmed that utilizing vacant space, establishing mixed forests, introducing particulate matter reduction technology, and understanding the system may be important for the effective planning of green infrastructure. Topic analysis was used to classify the planning elements of green infrastructure based on ecological, technological, and social functions. The planning elements of the ecological function were classified into morphological aspects (e.g., urban forest, green space, wall greening) and functional aspects (e.g., climate control, carbon storage and absorption, provision of habitats, and biodiversity for wildlife). The planning elements of the technological function were classified into various themes, including the disaster prevention functions of green infrastructure, buffer effects, stormwater management, water purification, and energy reduction. The planning elements of the social function were classified into themes such as community function, improving the health of users, and scenery improvement. These results suggest that green infrastructure planning for particulate matter reduction requires approaches related to key concepts such as resilience and sustainability. In particular, there is a need to apply green infrastructure planning elements in order to reduce exposure to particulate matter.