• Title/Summary/Keyword: Topic Data

A Development of LDA Topic Association Systems Based on Spark-Hadoop Framework

  • Park, Kiejin;Peng, Limei
    • Journal of Information Processing Systems / v.14 no.1 / pp.140-149 / 2018
  • Social data such as users' comments are unstructured in nature, and current technologies for analyzing such data are constrained by the available storage space and processing time when fast storage and processing are required. Moreover, it is difficult to use a huge amount of dynamically generated social data to analyze user features at high speed. To solve this problem, we design and implement a topic association analysis system based on the latent Dirichlet allocation (LDA) model. LDA does not require a training process and thus can easily analyze social users' hourly interests in different topics. The proposed system is built on the Spark framework, which runs on top of a Hadoop cluster. It achieves high-speed processing because access to the hard disk is minimized and all intermediate data are processed in main memory. In the performance evaluation, it takes about 5 hours to analyze the topics of about 1 TB of test social data (SNS comments). Moreover, by analyzing the associations among topics, we can track the hourly change in social users' interests in different topics.

Analysis on Topics in Soundscape Research based on Topic Modeling (토픽 모델링을 이용한 사운드스케이프 연구 주제어 분석)

  • Choe, Sou-Hwan
    • The Journal of the Korea Contents Association / v.19 no.7 / pp.427-435 / 2019
  • Soundscapes provide important resources for understanding social and cultural aspects of our society; however, research on frameworks to record, conserve, categorize, and analyze soundscapes is still in its infancy. Topic modeling is an automatic approach to discovering hidden themes that are dispersed across unstructured documents, and it is thus robust enough to find latent topics, such as research trends, behind a collection of documents. The purpose of this paper is to discover topics in current soundscape research using topic modeling and, furthermore, to discuss the possibilities of designing a metadata system for sound archives and improving the Soundscape Ontology, which is currently under development.

Differences and Multi-dimensionality of the Perception of Career Success among Korean Employees: A Topic Modeling Approach (기업근로자 경력성공 인식의 다차원성과 차이: 토픽모델링의 적용)

  • Lee, Jaeeun;Chae, Chungil
    • The Journal of the Korea Contents Association / v.19 no.6 / pp.58-71 / 2019
  • The purpose of this study is to explore the multi-dimensionality of, and the differences in, career success as revealed by employees' perceptions. To this end, LDA topic modeling was applied to extract latent topics of career success from the open-ended survey questionnaires of 126 Korean employees. The extracted latent topics are social recognition, continuing service within an organization, expertise, financial rewards, and pursuing personal meaning. The occurrence probability of each topic differed by individual characteristics such as gender, education, and position. The findings show that career success is multi-dimensional and that topic occurrence probabilities differ by demographic characteristics. Additionally, this study shows how to apply a recently developed machine learning approach, LDA topic modeling, to qualitative open-ended survey data in order to reduce researcher bias.

Exploring trends in U.N. Peacekeeping Activities in Korea through Topic Modeling and Social Network Analysis (토픽모델링과 사회연결망 분석을 통한 우리나라 유엔 평화유지활동 동향 탐색)

  • Donghyeon Jung;Chansong Kim;Kangmin Lee;Soeun Bae;Yeon Seo;Hyeonju Seol
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.4 / pp.246-262 / 2023
  • The purpose of this study is to identify the major peacekeeping activities that the Korean armed forces have performed from the past to the present. To do this, we collected 692 press releases from the National Defense Daily over the past 20 years and performed topic modeling and social network analysis. The topic modeling analysis yielded 112 major keywords and 8 topics, and an examination of the Korean armed forces' peacekeeping activities based on these topics identified 6 major activities and 2 related matters. The six major activities were 'Northeast Asian defense cooperation', 'multinational force activities', 'civil operations', 'defense diplomacy', 'ceasefire monitoring group', and 'pro-Korean activities'; 'general troop deployment' was related to troop deployment in general. Next, social network analysis was performed to examine the relationship between keywords and the major keywords related to topic decision, and the keywords 'overseas', 'dispatch', and 'high level' emerged as key words in the network. This study is meaningful in that it is the first to examine the topics of the Korean armed forces' peacekeeping activities over the past 20 years by applying big data techniques to the National Defense Daily, an unstructured document source. In addition, the derived topics are expected to serve as a basis for exploring the future direction of Korea's peacekeeping activities.

Comparative Analysis of the Keywords in Taekwondo News Articles by Year: Applying Topic Modeling Method (태권도 뉴스기사의 연도별 주제어 비교분석: 토픽모델링 적용)

  • Jeon, Minsoo;Lim, Hyosung
    • Journal of Digital Convergence / v.19 no.11 / pp.575-583 / 2021
  • This study aims to analyze Taekwondo trends in news articles by year by applying topic modeling. To examine Taekwondo trends through media reports, news articles and articles from Taekwondo-specialized media were collected through Big Kinds of the Korea Press Foundation. The search period was divided into three sections: before 2000, 2001~2010, and 2011~2020. A total of 12,124 items were selected as research data. For topic analysis, pre-processing was performed, and topics were then extracted using the LDA algorithm. Python 3 was used for all analyses. First, in the analysis of media article topics by year, 'World' was the most common keyword before 2000, 'South and North Korea' was the next most common, and 'Olympic' was the third most common. From 2001 to 2010, 'World' was the most common topic, followed by 'Association' and 'World Taekwondo'. From 2011 to 2020, 'World', 'Demonstration', and 'Kukkiwon' were the most common topics, in that order. Second, topic modeling of news articles before 2000 divided the topics into two categories. Specifically, Topic 1 was labeled 'South-North Korea sports exchange' and Topic 2 was labeled 'Adoption of Olympic demonstration events'. Third, topic modeling of news articles from 2001 to 2010 yielded three topics. Topic 1 was labeled 'Taekwondo Demonstration Performance and Corruption', Topic 2 was labeled 'Muju Taekwondo Park Creation', and Topic 3 was labeled 'World Taekwondo Festival'. Fourth, topic modeling of news articles from 2011 to 2020 yielded three topics. Topic 1 was labeled 'Successful Hosting of the 2018 Pyeongchang Winter Olympics', Topic 2 was labeled 'North-South Korea Taekwondo Joint Demonstration Performance', and Topic 3 was labeled '2017 Muju World Taekwondo Championships'.

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

  • Kim, Seonho;Yoon, Juntae;Seo, Jungyun
    • Journal of KIISE / v.41 no.9 / pp.652-665 / 2014
  • Many important terms in biomedical text are expressed as abbreviations or acronyms. We propose a semantic link topic model, based on the concepts of topic and dependency link, to disambiguate biomedical abbreviations and to cluster long-form variants of abbreviations that refer to the same senses. This model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics, with each topic characterized by a distribution over words. Thus, the words of a document are generated from the hidden topic structure of the document, and the topic structure is inferred from the observable word sequences of document collections. In this study, we allow two distinct types of word generation in order to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and the words that co-occur with them in a sentence. Besides topic information, the semantic dependency between words is defined as a link, and a new random parameter for link presence is assigned to each word. As a result, the most probable expansions of the abbreviations in a given abstract are decided by the word-topic distribution, document-topic distribution, and word-link distribution estimated from the document collection through the semantic dependency link topic model. The abstracts retrieved from the MEDLINE Entrez interface by queries relating to 22 abbreviations and their 186 expansions were used as the data set. The link topic model correctly predicted expansions of abbreviations with an accuracy of 98.30%.
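
The generative view mentioned in this abstract (each document is a mixture of topics, and each topic is a distribution over words) can be illustrated with a small sketch. It covers only the plain LDA generative process, not the authors' semantic dependency link extension, and the toy vocabulary, number of topics, Dirichlet parameters, and function names are all illustrative assumptions rather than anything taken from the paper.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(topic_word, doc_alpha, vocab, length, rng):
    """Generate one document under plain LDA.

    topic_word: one word distribution per topic (rows sum to 1).
    doc_alpha:  Dirichlet parameters for the document-topic mixture.
    """
    theta = sample_dirichlet(doc_alpha, rng)  # document-topic distribution
    words = []
    for _ in range(length):
        # Pick a hidden topic for this position, then a word from that topic.
        z = rng.choices(range(len(topic_word)), weights=theta)[0]
        w = rng.choices(vocab, weights=topic_word[z])[0]
        words.append(w)
    return theta, words


rng = random.Random(0)  # fixed seed so the sketch is repeatable
vocab = ["gene", "protein", "cell", "patient", "dose", "trial"]  # toy vocabulary (assumption)
num_topics = 2  # assumption
topic_word = [sample_dirichlet([0.5] * len(vocab), rng) for _ in range(num_topics)]
theta, words = generate_document(topic_word, [1.0] * num_topics, vocab, 8, rng)
```

Inference runs in the opposite direction: given only observed words such as `words`, a topic model estimates the hidden `theta` and `topic_word` distributions.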

The Study of Prosodic Features in Korean Topic Constructions (한국어 화제구문의 운율적 고찰)

  • Hwang, Son-Moon
    • Speech Sciences / v.9 no.2 / pp.59-68 / 2002
  • This paper analyzes the prosodic features distinctively associated with Korean topic constructions (marked by nun or its variant un) and subject constructions (marked by ka or its variant i) as a way of explicating the role that prosody plays in differentially constituting their discourse messages. Using both spoken data elicited in controlled settings and spontaneous conversational data, an attempt is made to identify differentiating prosodic features and intonation contours associated with distinct meanings and functions of nun- and ka-constructions evoked in a variety of discourse contexts.

Topics and Sentiment Analysis Based on Reviews of Omni-Channel Retailing

  • KIM, Soon-Hong;YOO, Byong-Kook
    • Journal of Distribution Science / v.19 no.4 / pp.25-35 / 2021
  • Purpose: This study aims to analyze the factors affecting customer satisfaction in customer reviews of omni-channel retailing posted on Internet blogs, cafes, and YouTube, using text mining analysis. Research, data, and methodology: In this study, frequency analysis is performed and LDA (Latent Dirichlet Allocation) is used to analyze social big data on reviewers' reactions to the omni-channel shopping service recently opened by L Shopping Company. Additionally, based on the topic analysis, we conduct a sentiment analysis of purchase reviews and analyze the characteristics of each topic in terms of the positive or negative sentiments of omni-channel app users. Results: The topic analysis yields four main topics: delivery and events, economic value, recommendations and convenience, and product quality and brand awareness. The sentiment analysis reveals that reviewers give many positive evaluations of price policy and product promotion, but negative evaluations of app use, delivery, and product quality. Conclusions: Retailers can establish customized marketing strategies by identifying customers' major interests through text mining analysis. Additionally, sentiment analysis by topic becomes an important indicator for developing the products and services that customers want, because it identifies areas that satisfy customers and areas that evoke negative reactions.

Analysis of Shipping and Logistics News Articles using Topic Modeling (토픽모델링을 활용한 해운물류 뉴스 분석)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review / v.46 no.4 / pp.61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gadget, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on COVID-19, which has recently had the greatest impact worldwide. For data collection, two years of news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected through web crawling (using Python's BeautifulSoup and requests modules) from the homepages of three representative logistics-related media companies. For data analysis, fundamental statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext were used. The analysis results were as follows. First, among the three logistics-related news outlets, the Korea Shipping Newspaper was the most active. Second, through topic modeling with LDA, eight logistics-related topics were identified, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field by focusing on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into the periods before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on our lives as a whole. For future work, a multi-faceted approach is required, such as comparative studies of logistics issues between countries or the presentation of implications based on long-term time-series articles.

Corpus-based analysis of the usage of Korean markers -(n)un and -i/ka in editorial texts

  • Kim, Kyoung-Young
    • Language and Information / v.19 no.2 / pp.19-36 / 2015
  • The aim of this paper is to investigate the usage of Korean markers -(n)un and -i/ka in editorial texts focusing on information structure. Noun phrases ending with the markers -(n)un and -i/ka were annotated semi-automatically using a corpus obtained from an online newspaper. Two important factors to determine the choice of markers were examined with the annotated data: referential givenness/newness and position in a sentence. Referential givenness and newness were adopted as indicators of information structure, topic and focus respectively. In addition to quantitative analysis, qualitative analysis was conducted on the selected data. The results suggest that both the marker -(n)un and -i/ka could carry a topic and a focus reading. Sentence position also played a crucial role in determining the marker, and the marker -i/ka was used more frequently in a later position of a sentence than the marker -(n)un.
