• Title/Summary/Keyword: Statistics topic

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Woosol, Jang;Ye Eun, Kim;Won, Son
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Text data usually consist of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. For supervised learning, one can select features by examining the relationship between target variables and explanatory variables. For unsupervised learning, however, target variables are absent, so the feature selection procedures used in supervised learning do not apply. In this study, we propose a word selection procedure that employs topic models to find latent topics. We substitute topics for the target variables and select terms that show high relevance to each topic. Applying the procedure to real data, we found that the proposed word selection procedure gives a clearer topic interpretation by removing high-frequency words that are prevalent across many topics. In addition, when the selected variables were supplied to classifiers such as naïve Bayes classifiers and support vector machines, the proposed feature selection procedure gave results comparable to those obtained by using class label information.
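
The idea of scoring words against topics instead of class labels can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the abstract does not define its relevance measure, so the sketch uses a simple lift (the word's probability within a topic divided by its average probability across topics), and the function name `select_words_by_topic`, the vocabulary, and the topic-word probabilities are all hypothetical.

```python
def select_words_by_topic(topic_word, vocab, top_n=2):
    """Pick the top_n words per topic by lift.

    topic_word: list of rows, one per topic, each holding P(word | topic).
    The lift score is an assumed stand-in for the paper's relevance measure.
    """
    n_topics = len(topic_word)
    n_words = len(vocab)
    # Average word probability across topics (assumes equally weighted topics).
    marginal = [sum(row[j] for row in topic_word) / n_topics for j in range(n_words)]
    selected = {}
    for k, row in enumerate(topic_word):
        # Lift > 1 means the word is more typical of this topic than of the corpus.
        lift = [row[j] / marginal[j] if marginal[j] > 0 else 0.0 for j in range(n_words)]
        ranked = sorted(range(n_words), key=lambda j: lift[j], reverse=True)
        selected[k] = [vocab[j] for j in ranked[:top_n]]
    return selected


# Hypothetical toy input: "data" and "model" are frequent in both topics.
vocab = ["data", "model", "price", "option", "topic", "word"]
topic_word = [
    [0.30, 0.20, 0.24, 0.24, 0.01, 0.01],  # finance-like topic
    [0.30, 0.20, 0.01, 0.01, 0.24, 0.24],  # text-mining-like topic
]
selected = select_words_by_topic(topic_word, vocab)
print(selected)
```

In this toy input, the words that are frequent in every topic receive a lift of 1 and are not selected, which mirrors the abstract's remark about removing high-frequency words shared by many topics.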

Topic Modeling Analysis Comparison for Research Topic in Korean Society of Industrial and Systems Engineering: Concentrated on Research Papers from 1978~1999 (한국산업경영시스템학회지 연구 주제의 토픽모델링 분석 비교: 1978년~99년 논문을 중심으로)

  • Park, Dong Joon;Oh, Hyung Sool;Kim, Ho Gyun;Yoon, Min
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.113-127 / 2021
  • Topic modeling has received much attention across academic disciplines in recent years. It is an application of machine learning and natural language processing: a statistical modeling procedure for discovering topics in a collection of documents. Recently, there have been many attempts to identify topics in diverse fields of academic research. Although the first Department of Industrial Engineering (I.E.) was established at Hanyang University in 1958, the Korean Institute of Industrial Engineers (KIIE), the field's leading academic society, was not founded until 1974, with the aim of contributing to I.E. research and promoting industrial techniques. The Korean Society of Industrial and Systems Engineering (KSIE) was established four years later. However, the research topics of the KSIE journal have not been examined in depth until now. Using topic modeling algorithms, we aim to detect the research topics of the KSIE journal for the first half of the society's history, from 1978 to 1999. We used the titles and abstracts of research papers to find topics in the KSIE journal with four algorithms: LSA, HDP, LDA, and LDA Mallet. The topic analysis results obtained by the algorithms were compared. We describe the whole procedure of topic analysis in detail for practical use in the future, and we apply visualization techniques to the results obtained from LDA. The analysis identified eight major research topics: Production/Logistics/Inventory, Reliability, Quality, Probability/Statistics, Management Engineering/Industry, Engineering Economy, Human Factor/Safety/Computer/Information Technology, and Heuristics/Optimization.

Understanding Black-Scholes Option Pricing Model

  • Lee, Eun-Kyung;Lee, Yoon-Dong
    • Communications for Statistical Applications and Methods / v.14 no.2 / pp.459-479 / 2007
  • Theories related to financial markets have received considerable attention from the statistics community. However, few courses on the topic are offered in statistics departments, because financial theories are entangled with many complicated mathematical and physical theories as well as ambiguously stated financial terminology. Based on our experience with the topic, we try to explain the rather complicated terminology and theories in easy-to-understand words. This paper briefly covers the basic terminology of derivatives, the Black-Scholes pricing idea, and related basic mathematical terminology.
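
For orientation, the standard Black-Scholes price of a European call on a non-dividend-paying asset can be computed in a few lines. This is a minimal sketch of the textbook formula, not code from the paper; the function names and the example parameter values are illustrative assumptions.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_price(spot, strike, rate, vol, maturity):
    """Black-Scholes price of a European call (no dividends).

    spot: current asset price, strike: exercise price,
    rate: continuously compounded risk-free rate,
    vol: annualized volatility, maturity: time to expiry in years.
    """
    d1 = (log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt(maturity))
    d2 = d1 - vol * sqrt(maturity)
    return spot * norm_cdf(d1) - strike * exp(-rate * maturity) * norm_cdf(d2)


# Illustrative inputs: at-the-money call, 5% rate, 20% volatility, one year.
price = bs_call_price(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)
print(round(price, 4))
```

The price rises with volatility and maturity, which is one way to check an implementation against intuition.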

Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis (토픽 모형 및 사회연결망 분석을 이용한 한국데이터정보과학회지 영문초록 분석)

  • Kim, Gyuha;Park, Cheolyong
    • Journal of the Korean Data and Information Science Society / v.26 no.1 / pp.151-159 / 2015
  • This article analyzes English abstracts of the articles published in the Journal of the Korean Data & Information Science Society using text mining techniques. First, term-document matrices are formed by various methods and then visualized by social network analysis. LDA (latent Dirichlet allocation) and CTM (correlated topic model) are also employed to extract topics from the abstracts. The performance of the topic models is compared via entropy for several numbers of topics and several weighting methods for forming term-document matrices.

Topic change monitoring study based on Blue House national petition using a control chart (관리도를 활용한 국민청원 토픽 모니터링 연구)

  • Lee, Heeyeon;Choi, Jieun;Lee, Sungim;Son, Won
    • The Korean Journal of Applied Statistics / v.34 no.5 / pp.795-806 / 2021
  • As the volume of text data generated through online channels has grown, interest in research that summarizes and analyzes such data has increased. One of the fundamental analyses of text data is to extract latent topics. A researcher could read and summarize every document manually, but this is impractical for large amounts of data. Blei and Lafferty (2007) and Blei et al. (2003) proposed topic modeling methods for extracting topics using a statistical model. Since text data are generally collected over time, it is worthwhile to monitor changes in the topics. In this study, we propose a topic index based on the results of the topic model. In addition, a control chart, a representative tool for statistical process control, is applied to monitor the topic index over time. As a practical example, we use text data collected from the Blue House National Petition boards between March 5, 2018, and March 5, 2020.
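
The monitoring step can be illustrated with a basic Shewhart-style chart. This is a sketch under stated assumptions only: the abstract does not say which control chart or topic index definition the authors use, so the 3-sigma limits, the function names, and the index values below are all hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline values.

    Uses center +/- k sample standard deviations, a common Shewhart-style
    choice assumed here for illustration.
    """
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def out_of_control(series, lower, upper):
    """Return the time indices whose values fall outside the control limits."""
    return [t for t, x in enumerate(series) if x < lower or x > upper]


# Hypothetical topic index values (e.g. a topic's share of documents per week).
baseline = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.12, 0.10]
monitored = [0.11, 0.10, 0.25, 0.12]

lower, center, upper = control_limits(baseline)
signals = out_of_control(monitored, lower, upper)
print(signals)
```

A flagged index marks a period in which the topic's prominence departs from its baseline behaviour and may deserve a closer look.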

An Ontology-Based Labeling of Influential Topics Using Topic Network Analysis

  • Kim, Hyon Hee;Rhee, Hey Young
    • Journal of Information Processing Systems / v.15 no.5 / pp.1096-1107 / 2019
  • In this paper, we present an ontology-based approach to labeling influential topics in scientific articles. First, to find influential topics, topic modeling is performed and social network analysis is then applied to the resulting topic models. Abstracts of research papers related to data mining published over the 20 years from 1995 to 2015 are collected and analyzed. Second, to interpret and explain the selected influential topics, the UniDM ontology is constructed from Wikipedia and serves as a concept hierarchy for the topic models. Our experimental results show that the subjects of data management and queries are identified in the most interrelated topic, followed by recommender systems and text mining. Also, the subjects of recommender systems and context-aware systems belong to the most influential topic, and the subject of the k-nearest neighbor classifier belongs to the topic closest to the other topics. The proposed framework provides a general model for interpreting topics in topic models, which helps overcome ambiguous and arbitrary interpretation of topics in topic modeling.

Trend Analysis of Data Mining Research Using Topic Network Analysis

  • Kim, Hyon Hee;Rhee, Hey Young
    • Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.141-148 / 2016
  • In this paper, we propose a topic network analysis approach that integrates topic modeling and social network analysis. We collected 2,039 scientific papers published from 1996 to 2015 in five top journals in the field of data mining and analyzed them with the proposed approach. To identify topic trends, a time-series analysis of the topic network is performed over four intervals. Our experimental results show that centralization of the topic network is highest from 1996 to 2000, decreases over the next 5 years, and then increases again. Over the last 5 years, centralization of degree centrality increases, while centralization of betweenness centrality and closeness centrality decreases again. Also, clustering is identified as the topic most interrelated with other topics. The topic with the highest degree centrality changes over time from clustering to web applications, back to clustering, and then to dimensionality reduction. Our approach extracts the interrelationships among topics, which cannot be detected with conventional topic modeling approaches, and provides topical trends in data mining research.
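
The two network quantities mentioned above, degree centrality of a topic and centralization of the whole network, can be sketched as follows. This is an illustration rather than the authors' code: it assumes an undirected, unweighted topic network and uses Freeman's degree centralization, and the topic names and edges are hypothetical.

```python
def degree_centrality(edges, nodes):
    """Normalized degree centrality: degree divided by (n - 1)."""
    degree = {v: 0 for v in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {v: d / (n - 1) for v, d in degree.items()}


def degree_centralization(edges, nodes):
    """Freeman degree centralization for an undirected graph with n >= 3 nodes.

    Equals 1 for a star network and 0 when every node has the same degree.
    """
    n = len(nodes)
    cent = degree_centrality(edges, nodes)
    top = max(cent.values())
    # With normalized centralities, the maximum possible sum of gaps is (n - 2).
    return sum(top - c for c in cent.values()) / (n - 2)


# Hypothetical topic network in which every topic links only to "clustering".
nodes = ["clustering", "web applications", "dimensionality reduction", "classification"]
edges = [
    ("clustering", "web applications"),
    ("clustering", "dimensionality reduction"),
    ("clustering", "classification"),
]
cent = degree_centrality(edges, nodes)
most_central = max(cent, key=cent.get)
centralization = degree_centralization(edges, nodes)
print(most_central, centralization)
```

Computing these values separately for each time interval gives the kind of trend comparison the abstract describes.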

Introduction to the History of Statistics Development in Italy

  • Kim, Joo-Hwan
    • Communications for Statistical Applications and Methods / v.8 no.2 / pp.515-530 / 2001
  • Recently, Korean statisticians have had more opportunities to work with researchers in other countries at the international level. In particular, the 53rd meeting of the International Statistical Institute (ISI) will be held in Seoul, Republic of Korea, on August 22-29, 2001. The field of statistics in Korea has been strongly influenced by the American Statistical Society. In this research communication, I would like to introduce a short history of the Italian statistical society and its major research topics and outputs. The contents will help us understand Italian statisticians, and they can serve as a cornerstone for the future relationship between Korean and Italian statisticians.

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.267-278 / 2021
  • The amount of data is increasing over time in government and business, both domestically and overseas. Accordingly, research on big data is increasing in academia. Statistics is one of the major disciplines in big data research, and the growing number of statistics papers makes it interesting to study research trends in statistics through the data they provide. In this study, we analyzed which studies are being conducted using abstract data from statistical papers in Korea and abroad. Domestic and international research trends were analyzed through the frequency of the papers' keywords, and the relationships between keywords were visualized with a word embedding method. In addition to the keywords selected by the authors, important words in statistical papers selected by TextRank were also visualized. Lastly, 10 topics were investigated by applying LDA to the abstract data. Through the analysis of each topic, we examined which research topics are frequently studied and which words are important.

Sustainability Report Analysis Using Transformer-Based Topic Modeling (Transformer 기반의 토픽 모델링을 이용한 지속가능경영보고서 분석)

  • Lee, Hanwool;Lee, Jihyun;Lee, Junheui
    • Proceedings of the Korea Information Processing Society Conference / 2022.05a / pp.464-467 / 2022
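  • As demands for corporate social responsibility grow, the publication of corporate sustainability reports is increasing. However, previous studies have focused on sustainability and its financial and non-financial associations with firms, and have been limited to traditional topic modeling techniques. In this study, we introduce a Transformer-based, context-aware topic modeling technique and derive 25 topics that can be used from the perspective of various stakeholders. In addition, we track changes in the topics over time through dynamic topic modeling.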