• Title/Summary/Keyword: Topic distribution

Building a Korean-English Parallel Corpus by Measuring Sentence Similarities Using Sequential Matching of Language Resources and Topic Modeling (언어 자원과 토픽 모델의 순차 매칭을 이용한 유사 문장 계산 기반의 위키피디아 한국어-영어 병렬 말뭉치 구축)

  • Cheon, JuRyong;Ko, YoungJoong
    • Journal of KIISE / v.42 no.7 / pp.901-909 / 2015
  • In this paper, we propose a method for finding similar sentences, based on language resources and topic modeling, in order to build a Korean-English parallel corpus from Wikipedia. We first applied language resources (a Wiki-dictionary, numbers, and the Daum online dictionary) to match words sequentially. We constructed the Wiki-dictionary from Wikipedia titles. To take advantage of Wikipedia, we used the translation probabilities in the Wiki-dictionary for word matching. In addition, we improved the accuracy of the sentence similarity measure by using word distributions based on topic modeling. In the experiments, a previous study achieved an F1-score of 48.4% using only language resources with linear combination, and 51.6% when additionally using topic modeling over entire word distributions. Our proposed sequential matching, which adds translation probability to the language resources, achieved 58.3%, which is 9.9 percentage points better than the previous study. When the proposed sequential matching of language resources was combined with topic modeling that considers important word distributions, the system achieved 59.1%, which is 7.5 percentage points better than the previous study.
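The sequential word matching described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy dictionaries `WIKI_DICT` and `ONLINE_DICT`, the scoring of non-probabilistic matches as 1.0, and the normalization by the longer sentence length are all assumptions made for illustration, and the topic-modeling component is omitted.

```python
from typing import Dict, List, Set

# Hypothetical toy resources; the paper's actual dictionaries are not reproduced here.
WIKI_DICT: Dict[str, Dict[str, float]] = {
    "서울": {"seoul": 0.9},                  # Korean word -> {English word: translation probability}
    "대한민국": {"korea": 0.6, "south": 0.2},
}
ONLINE_DICT: Dict[str, List[str]] = {
    "수도": ["capital"],
    "도시": ["city"],
}


def match_word(ko_word: str, en_words: Set[str]) -> float:
    """Try each resource in order and return a match score in [0, 1]."""
    # 1) Wiki-dictionary: score by the best translation probability found.
    wiki_hits = [p for en, p in WIKI_DICT.get(ko_word, {}).items() if en in en_words]
    if wiki_hits:
        return max(wiki_hits)
    # 2) Numbers: an identical digit string counts as a full match (assumption).
    if ko_word.isdigit() and ko_word in en_words:
        return 1.0
    # 3) Online dictionary: any listed translation counts as a full match (assumption).
    if any(en in en_words for en in ONLINE_DICT.get(ko_word, [])):
        return 1.0
    return 0.0


def sentence_similarity(ko_tokens: List[str], en_tokens: List[str]) -> float:
    """Sum of word match scores, normalized by the longer sentence (assumption)."""
    if not ko_tokens or not en_tokens:
        return 0.0
    en_words = {w.lower() for w in en_tokens}
    total = sum(match_word(w, en_words) for w in ko_tokens)
    return total / max(len(ko_tokens), len(en_tokens))


score = sentence_similarity(["서울", "대한민국", "수도", "1394"],
                            ["Seoul", "capital", "Korea", "1394"])
print(score)
```

Trying the resources in a fixed order means a word matched by the Wiki-dictionary never falls through to the later resources, which is one plausible reading of "sequential" matching.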

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.183-203 / 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as advances in information and communication technology have produced many kinds of online news media, the volume of news about events in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users survey many events at a glance. In addition, building and providing an event network based on the relevance between events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, then kept only meaningful words and merged synonyms through preprocessing using NPMI and Word2Vec. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, find the peaks of that distribution, and detect events. A total of 32 topics were extracted, and the time of occurrence of each event was inferred from the point at which the corresponding topic distribution surged. As a result, a total of 85 events were detected, but only the final 16 events, filtered using the Gaussian smoothing technique, were presented. We also calculated relevance scores between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between events and connected them. Finally, we set up the event network with each event as a vertex and the relevance score between events as the weight of the edge connecting the vertices.
The event network constructed with our method helped us sort the major political and social events that occurred in Korea over the past year into chronological order and, at the same time, identify which events are related to one another. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to analyze large amounts of data easily and to identify relevance between events that was difficult to detect with existing methods. We applied various text mining techniques, together with Word2Vec, in text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in existing analyses of Korean text. The event detection and network construction techniques in this study have the following advantages in practical application. First, LDA topic modeling, an unsupervised learning method, can easily extract topics, topic words, and their distributions from huge amounts of data. Also, by using the date information of the collected news articles, the distribution of each topic can be expressed as a time series. Second, by calculating relevance scores and constructing the event network from the co-occurrence of topics, which is difficult to grasp with existing event detection, we can present the connections between events in a summarized form. This can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. The event network also makes it possible to identify which event was the starting point of a series of events. A limitation of this study is that LDA topic modeling produces different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher.
Also, since each topic is assumed to be exclusive and independent, the model does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
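The pipeline in this abstract (smooth a per-date topic weight series, detect surges as events, then weight edges between events by cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the threshold, the edge renormalization of the smoothing kernel, and the toy series and vectors are all hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_smooth(series: Sequence[float], sigma: float = 1.0) -> List[float]:
    """Smooth a series with a truncated Gaussian kernel, renormalized at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for t in range(len(series)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            if 0 <= t + k < len(series):
                num += w * series[t + k]
                den += w
        smoothed.append(num / den)
    return smoothed


def detect_peaks(series: Sequence[float], threshold: float) -> List[int]:
    """Return indices of interior local maxima whose value exceeds the threshold."""
    return [t for t in range(1, len(series) - 1)
            if series[t] > threshold
            and series[t] > series[t - 1]
            and series[t] >= series[t + 1]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity, used here as the edge weight between two events."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical daily weight of one topic, with a surge on day 5.
topic_by_day = [0.1] * 5 + [0.9] + [0.1] * 5
event_days = detect_peaks(gaussian_smooth(topic_by_day), threshold=0.3)
print(event_days)

# Hypothetical co-occurrence vectors for two detected events.
print(cosine([1, 2, 0], [2, 4, 0]))
```

Smoothing before peak detection suppresses isolated one-day fluctuations, which is one plausible way a larger set of raw candidate events could be reduced to a smaller final set.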

A Study on Identifying Topics and Trends in International Cadastral Research Using LDA: With Special Reference to the FIG Peer Review Journal (LDA를 이용한 국제지적연구의 주제와 추세확인에 관한 연구: 특히 FIG Peer Review Journal을 중심으로)

  • Kim, Yun-Ki
    • Journal of Cadastre & Land InformatiX / v.48 no.1 / pp.15-33 / 2018
  • The main purpose of this study was to identify the topics and research trends of international cadastral research using LDA. To achieve this goal, I reviewed the literature on LDA and international cadastral studies and formulated four research questions concerning the topics of cadastral researchers, the distribution of topics, the most influential topics, and changes in topics over time. To answer these research questions, I analyzed 370 papers published in the FIG Peer Review Journal between January 1, 2008, and October 31, 2017, using LDA. The analysis confirmed that there are twelve major topics in international cadastral research. The most influential of these was topic 2 (cadastral information systems), and topic 5 (land development and land administration) was also confirmed as playing an important role across the document collection. These two topics have been the most popular, with very active trendlines over the past decade, and will play a leading role in future cadastral research.

Design to Realtime Test Data Topic Utilize of Data Distribution Service (데이터 분산 서비스를 활용한 실시간 시험자료 토픽 설계)

  • Choi, Won-gyu
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.7 / pp.1447-1454 / 2017
  • A realtime test data topic is a means of efficiently processing data from the many kinds of measurement devices at a test range. A test range has many measurement devices and requires accurate observation and assessment of the test object. In the realtime test data slaving framework, the system produces a variety of test information, all of which must be transmitted in real time to the test information management or display system. Using RTI DDS (Data Distribution Service) middleware Ver 5.2, we can improve system efficiency and usability and satisfy QoS (Quality of Service) requirements. Application users can thus concentrate on their applications rather than on the middleware, because complex functions are provided by DDS and not by applications such as visualization software. In this paper, I propose a realtime test data topic for the DDS-based slaving framework of realtime test data at the test range.

Spatial Distribution Patterns of Twitter Data with Topic Modeling (토픽 모델링을 이용한 트위터 데이터의 공간 분포 패턴 분석)

  • Woo, Hyun Jee;Kim, Young Hoon
    • Journal of the Korean association of regional geographers / v.23 no.2 / pp.376-387 / 2017
  • This paper attempts to analyze the geographical characteristics of Twitter data and presents its analytical potential for social network analysis in geography. First, this paper suggests a topic modeling-based methodology for identifying the geographical characteristics of tweets, comprising an analysis flow for Twitter data sets, tweet data collection and conversion, textual pre-processing and structural analysis, topic discovery, and interpretation of tweet topics. Tweets referencing GPS coordinates (geotweets) were extracted from the sampled Twitter data sets because they record the place where each tweet was created. This paper identifies a correlation between some specific topics and local places in Jeju. This correlation is closely associated with some place names and local sites on Jeju Island. We assume that tweeters intend to record the places where they tweet and, in some cases, to share and retweet them with other tweeters. A surface density map shows the hotspots of tweets, detected around specific places and sites such as Jeju airport, sightseeing sites, and local places on Jeju Island. The hotspots show patterns similar to those of the floating population of Jeju, especially people in their thirties. In addition, a topic modeling algorithm is applied for geographical topic discovery and comparison of the spatial patterns of tweets. Finally, this empirical analysis shows that Twitter data, as social network data, have geographical significance, and that a topic modeling approach is useful for analyzing the textual features that reflect geographical characteristics in large data sets of tweets.

The Impact of Environmental Social Governance Management for Improving Gas Firm Performance

  • Seung-Chul LEE
    • The Journal of Industrial Distribution & Business / v.14 no.4 / pp.23-31 / 2023
  • Purpose: Gas firms often disregard the importance of sensitivity, which leads to many unprecedented repercussions. To ensure that gas firms fully contribute to sustainability and ethical standards, Environmental, Social, and Governance (ESG) management has been identified as the ideal framework. This study aims to investigate the impact of ESG management on improving gas firm performance. Research design, data and methodology: A qualitative analysis of prior literature was conducted to identify adequate past research on the topic, drawing on major web databases such as 'Google Scholar' and 'Scopus' to ensure the credibility of the resources. Results: Gas firms are among the organizations most pertinent to environmental destruction issues. They emit gases such as ethane, carbon dioxide, and methane that are dangerous for people and the environment. Thus, many pro-environmental conservation stakeholders have called on such gas firms to deliberately mitigate environmental pollution. Conclusions: This study may be used by human resources departments to improve employee results elsewhere. It can also be essential in improving the relationship between such firms and society. Therefore, the findings have significance and implications for multiple parties, users, and stakeholders regarding the research topic and beyond the current scope of the study.

The Role of the Lifelong Learning for Improving HRM Policy in a Company

  • OH, Su-Hyang
    • The Journal of Industrial Distribution & Business / v.14 no.1 / pp.57-65 / 2023
  • Purpose: The purpose of this research paper is to explore the role of lifelong learning in improving HRM policies in a company. The research begins with a literature review of existing research on the topic, followed by a discussion of the findings and their implications for practitioners. Research design, data and methodology: The author collected a textual dataset from the extensive literature on HRM policy and lifelong learning, which was investigated thoroughly. In this way, the author obtained adequate prior studies and checked their validity and reliability. Results: The research found that physical activity and exercise can enhance life expectancy, improve physical and mental health, and improve functional ability, and that it is necessary to examine the broad role of socialization and interaction in raising elderly adults' living standards. The research also identified findings on the social change and social isolation of older individuals in relation to the impact of digital technology. Conclusions: This research suggests that companies should also ensure that their HRM policies are designed in such a way that they allow employees to pursue further learning and development opportunities without having to sacrifice their current job responsibilities.

Research Trend Analysis of the Retail Industry: Focusing on the Department Store (유통업태 연구동향 분석: 백화점을 중심으로)

  • Hoe-Chang YANG
    • The Journal of Economics, Marketing and Management / v.11 no.5 / pp.45-55 / 2023
  • Purpose: As one of the continuous studies on the offline distribution industry, the purpose of this study is to find ways for offline stores to respond to the growth of online shopping by identifying research trends on department stores. Research design, data and methodology: To this end, this study conducted word frequency analysis, word co-occurrence frequency analysis, BERTopic, LDA, and dynamic topic modeling using Python 3.7 on a total of 551 English abstracts searched with the keyword 'department store' in scienceON as of October 10, 2022. Results: The results of word frequency analysis and co-occurrence frequency analysis revealed that research related to department stores frequently focuses on factors such as customers, consumers, products, satisfaction, services, and quality. BERTopic and LDA analyses identified five topics, including 'store image,' with 'shopping information' showing relatively high interest, while 'sales systems' were observed to have relatively lower interest. Conclusions: Based on the results of this study, it was concluded that research related to department stores has so far been conducted in a limited scope, and it is insufficient to provide clues for department stores to secure competitiveness against online platforms. Therefore, it is suggested that additional research be conducted on topics such as the true role of department stores in the retail industry, consumer reinterpretation, customer value and lifetime value, department stores as future retail spaces, ethical management, and transparent ESG management.
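The word frequency and word co-occurrence frequency analyses mentioned in this abstract can be illustrated with a short Python sketch using only the standard library. The three toy "abstracts" below are hypothetical and whitespace tokenization is an assumption; the study's actual corpus and preprocessing are not reproduced here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical, already-cleaned abstracts standing in for the real corpus.
abstracts = [
    "customer satisfaction service quality",
    "customer service store image",
    "store image customer",
]

# Tokenize by whitespace (assumption; real pipelines usually also
# lowercase, remove stop words, and lemmatize).
tokenized = [doc.split() for doc in abstracts]

# Word frequency: how often each word appears across all abstracts.
word_freq = Counter(word for doc in tokenized for word in doc)

# Co-occurrence frequency: in how many abstracts each unordered word pair
# appears together. Sorting makes (a, b) and (b, a) the same key.
cooccurrence = Counter()
for doc in tokenized:
    for pair in combinations(sorted(set(doc)), 2):
        cooccurrence[pair] += 1

print(word_freq.most_common(3))
print(cooccurrence[("customer", "service")])
```

Counting each pair once per abstract, rather than once per occurrence, keeps long abstracts from dominating the co-occurrence counts; this is a design choice for the sketch rather than something stated in the abstract.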

Distribution of a Sum of Weighted Noncentral Chi-Square Variables

  • Heo, Sun-Yeong;Chang, Duk-Joon
    • Communications for Statistical Applications and Methods / v.13 no.2 / pp.429-440 / 2006
  • In statistical computing, researchers often need the distribution of a weighted sum of noncentral chi-square variables, but what can be known about its exact distribution is very limited. Many works have contributed to this topic, e.g. Imhof (1961) and Solomon-Stephens (1977). Imhof's method gives a good approximation to the true distribution, but it is not easy to apply, even considering the development of computer technology. Solomon-Stephens's three-moment chi-square approximation is relatively easy to apply and accurate. However, they skipped many details, and their simulation is limited to a weighted sum of central chi-square random variables. This paper gives details on Solomon-Stephens's method. We also extend their simulation to the weighted sum of noncentral chi-square variables. We evaluated approximated powers for the homogeneity test and compared them with the true powers. Solomon-Stephens's method shows a very good approximation in this case.
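A three-moment approximation needs the first three moments of the weighted sum, and these follow from a standard cumulant formula. The sketch below computes them in Python. It assumes independent variables and the convention in which a noncentral chi-square with df degrees of freedom and noncentrality nc has mean df + nc. The function name is hypothetical, and the step of fitting an approximating distribution to these moments (the core of Solomon-Stephens's method) is not shown.

```python
import math
from typing import Sequence


def weighted_ncx2_cumulant(s: int,
                           weights: Sequence[float],
                           dfs: Sequence[float],
                           ncs: Sequence[float]) -> float:
    """s-th cumulant of Q = sum_i w_i * X_i for independent noncentral chi-squares.

    Uses kappa_s = 2^(s-1) * (s-1)! * sum_i w_i^s * (df_i + s * nc_i),
    where nc_i is the noncentrality parameter with E[X_i] = df_i + nc_i.
    """
    return 2 ** (s - 1) * math.factorial(s - 1) * sum(
        w ** s * (df + s * nc) for w, df, nc in zip(weights, dfs, ncs)
    )


# Hypothetical example: Q = 0.5 * X1 + 2 * X2 with X1 ~ chi2(1, nc=0), X2 ~ chi2(3, nc=2).
weights, dfs, ncs = [0.5, 2.0], [1, 3], [0.0, 2.0]
mean = weighted_ncx2_cumulant(1, weights, dfs, ncs)
variance = weighted_ncx2_cumulant(2, weights, dfs, ncs)
third = weighted_ncx2_cumulant(3, weights, dfs, ncs)
skewness = third / variance ** 1.5
print(mean, variance, third, skewness)
```

Because cumulants of independent variables add and scaling by w multiplies the s-th cumulant by w to the power s, the weighted sum needs no simulation to obtain these three quantities.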

Research Trends Analysis of Big Data: Focused on the Topic Modeling (빅데이터 연구동향 분석: 토픽 모델링을 중심으로)

  • Park, Jongsoon;Kim, Changsik
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.1 / pp.1-7 / 2019
  • The objective of this study is to examine the trends in big data. Research abstracts were extracted from 4,019 articles, published between 1995 and 2018, on Web of Science and were analyzed using topic modeling and time series analysis. The 20 single-term topics that appeared most frequently were as follows: model, technology, algorithm, problem, performance, network, framework, analytics, management, process, value, user, knowledge, dataset, resource, service, cloud, storage, business, and health. The 20 multi-term topics were as follows: sense technology architecture (T10), decision system (T18), classification algorithm (T03), data analytics (T17), system performance (T09), data science (T06), distribution method (T20), service dataset (T19), network communication (T05), customer & business (T16), cloud computing (T02), health care (T14), smart city (T11), patient & disease (T04), privacy & security (T08), research design (T01), social media (T12), student & education (T13), energy consumption (T07), supply chain management (T15). The time series data indicated that the 40 single-term topics and multi-term topics were hot topics. This study provides suggestions for future research.