• Title/Summary/Keyword: Text Mining for Korean


A Study on Stock Trend Determination in Stock Trend Prediction

  • Lim, Chungsoo
    • Journal of the Korea Society of Computer and Information, v.25 no.12, pp.35-44, 2020
  • In this study, we analyze how stock trend determination affects trend prediction accuracy. In stock markets, successful investment requires accurate prediction of stock price trends, so a large body of research has sought to improve trend prediction accuracy. For example, information extracted from SNS (social networking service) posts and news articles by text mining algorithms has been used to enhance prediction accuracy, and various machine learning algorithms have been applied. However, stock trend determination itself has not been properly analyzed; conventional methods have simply been reused. We therefore formulate trend determination as a moving average-based procedure and analyze its impact on stock trend prediction accuracy. The analysis reveals that trend determination can change prediction accuracy by as much as 47%, and that prediction accuracy is proportional to the reference window size and inversely proportional to the target window size.
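The abstract does not spell out the exact labeling rule, so the following is only a minimal sketch under an assumed formulation: day t is labeled "up" when the average price over the `target_window` days after t exceeds the average over the `reference_window` days ending at t. The function name and the comparison rule are hypothetical and are not presented as the author's procedure.

```python
from statistics import mean

def label_trends(prices, reference_window, target_window):
    """Assumed moving-average trend labeling (hypothetical formulation).

    For each day t with enough history and future data, compare the mean of
    the reference_window prices ending at t with the mean of the
    target_window prices that follow t. Returns 1 for up, 0 otherwise.
    """
    labels = []
    for t in range(reference_window - 1, len(prices) - target_window):
        reference_avg = mean(prices[t - reference_window + 1 : t + 1])
        target_avg = mean(prices[t + 1 : t + 1 + target_window])
        labels.append(1 if target_avg > reference_avg else 0)
    return labels


# Changing the two window sizes changes the labels a predictor is trained on.
prices = [10, 11, 12, 11, 13, 14, 13, 15]
print(label_trends(prices, reference_window=3, target_window=2))
```

Because the labels themselves depend on both window sizes, the same price series can yield different "ground truth", which is why trend determination can shift measured accuracy.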

A Study on Correlation Analysis of One-Person Housing Space Design Convergence Contents by Using Social Network Analysis (소셜 네트워크 분석 방법론을 활용한 1인 주거공간디자인 융합콘텐츠 상관관계 분석)

  • Park, Eun Soo;Kim, Ji Eun
    • Korea Science and Art Forum, v.34, pp.133-148, 2018
  • One-person housing is predicted to become the most common housing type in Korea. This study therefore aims to derive contents for designing one-person housing spaces that reflect the lives of the rapidly growing number of one-person householders. To this end, it objectively derives the social, economic, and cultural factors influencing one-person households through big data analysis and analyzes the correlations between contents using social network analysis. Applying big data analysis yielded 60 core contents related to one-person housing space, and social network analysis showed that the most influential contents belong to the space editing and space composition categories. This suggests that a residential space able to respond flexibly to changes in the user's life is an important part of the design concept. Building on this study, future research will focus on the concept and design methodology of one-person housing space.
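As a rough illustration of how social network analysis can rank contents by influence, the sketch below links two content keywords whenever they co-occur in the same document and ranks them by normalized degree centrality. The centrality measure, the function name, and the sample keywords are assumptions; the abstract does not state which measure the authors used.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(documents):
    """Normalized degree centrality of a keyword co-occurrence network.

    documents: list of sets of content keywords. Two keywords are linked
    if they appear together in at least one document.
    """
    neighbors = defaultdict(set)
    for doc in documents:
        for a, b in combinations(sorted(doc), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    nodes = set().union(*documents)
    if len(nodes) < 2:
        return {node: 0.0 for node in nodes}
    return {node: len(neighbors[node]) / (len(nodes) - 1) for node in nodes}


# Hypothetical keyword sets; the study itself derived 60 core contents.
docs = [
    {"space editing", "storage"},
    {"space editing", "lighting"},
    {"space editing", "furniture", "storage"},
]
ranking = sorted(degree_centrality(docs).items(), key=lambda kv: kv[1], reverse=True)
print(ranking)
```

Degree centrality is the simplest choice; betweenness or eigenvector centrality would be drop-in alternatives if "influence" is meant as brokerage or prestige.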

An Analysis of Artificial Intelligence Education Research Trends Based on Topic Modeling

  • You-Jung Ko
    • Journal of the Korea Society of Computer and Information, v.29 no.2, pp.197-209, 2024
  • This study analyzes recent research trends in Artificial Intelligence (AI) education in South Korea, with the overarching objective of exploring the future direction of AI education. For this purpose, 697 papers on AI education published in the Research Information Sharing Service (RISS) from 2016 to November 2023 were analyzed using word clouds and Latent Dirichlet Allocation (LDA) topic modeling. The analysis identified six major topics: generative AI utilization education, AI ethics education, AI convergence education, teacher perceptions and roles in AI utilization, AI literacy development in university education, and AI-based education and research directions. Based on these findings, I propose several suggestions, including (1) expanding the use of generative AI across subjects, (2) establishing ethical guidelines for AI use, (3) evaluating the long-term impact of AI education, (4) enhancing teachers' ability to use AI in higher education, (5) diversifying the AI education curriculum in universities, and (6) analyzing AI research trends and developing an educational platform.
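LDA is usually run through a library such as gensim or scikit-learn. To show what the technique itself estimates, here is a compact collapsed Gibbs sampler written from scratch. The hyperparameters (`alpha`, `beta`, iteration count) and the toy token lists are illustrative assumptions, not settings reported in the study, and real Korean abstracts would first need morphological analysis to produce tokens.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling.

    docs: list of token lists. Returns (vocab, theta, phi), where theta[d][k]
    is the share of topic k in document d and phi[k][v] is the probability
    of vocabulary word v under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics
    ndk = [[0] * K for _ in docs]        # document-topic counts
    nkw = [[0] * V for _ in range(K)]    # topic-word counts
    nk = [0] * K                         # total tokens per topic

    def update(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    assignments = []
    for d, doc in enumerate(docs):       # random initial topic for every token
        z = [rng.randrange(K) for _ in doc]
        for w, k in zip(doc, z):
            update(d, index[w], k, +1)
        assignments.append(z)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = index[w], assignments[d][i]
                update(d, v, k, -1)      # take the token out of the counts
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                update(d, v, k, +1)      # put it back under the sampled topic

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
           for k in range(K)]
    return vocab, theta, phi


def top_words(vocab, phi, n=3):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]


docs = [  # hypothetical pre-tokenized abstracts
    ["generative", "ai", "chatgpt", "writing", "generative"],
    ["ethics", "ai", "guideline", "ethics", "bias"],
    ["chatgpt", "generative", "writing", "class"],
    ["ethics", "bias", "guideline", "responsibility"],
]
vocab, theta, phi = lda_gibbs(docs, n_topics=2, iterations=100)
print(top_words(vocab, phi))
```

Each topic is then labeled by a human from its top words, which is how topic names such as "AI ethics education" are typically assigned.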

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation (의미적 의존 링크 토픽 모델을 이용한 생물학 약어 중의성 해소)

  • Kim, Seonho;Yoon, Juntae;Seo, Jungyun
    • Journal of KIISE, v.41 no.9, pp.652-665, 2014
  • Many important terms in biomedical text are expressed as abbreviations or acronyms. We propose a semantic dependency link topic model, based on the concepts of topic and dependency link, to disambiguate biomedical abbreviations and to cluster long-form variants of abbreviations that refer to the same sense. The model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics and each topic is characterized by a distribution over words. Thus, the words of a document are generated from its hidden topic structure, and that structure is inferred from the observable word sequences of a document collection. In this study, we allow two distinct word-generation processes in order to incorporate semantic dependencies between words, particularly between the expansions (long forms) of abbreviations and the words that co-occur with them in the same sentence. In addition to topic information, the semantic dependency between words is defined as a link, and a new random parameter for link presence is assigned to each word. As a result, the most probable expansion of each abbreviation in a given abstract is determined by the word-topic, document-topic, and word-link distributions estimated from the document collection through the semantic dependency link topic model. Abstracts retrieved from the MEDLINE Entrez interface with queries related to 22 abbreviations and their 186 expansions were used as the data set. The link topic model predicted the expansions of abbreviations with an accuracy of 98.30%.
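The final decision step can be illustrated, in simplified form, by scoring each candidate expansion with the abstract's document-topic distribution and the topic-word distributions. This minimal sketch deliberately omits the word-link distribution of the full model, and the function name, the abbreviation, and the probabilities are hypothetical.

```python
def best_expansion(candidates, theta_doc, phi):
    """Pick the candidate expansion e maximizing sum_k theta_doc[k] * phi[k][e].

    theta_doc: topic proportions of the abstract containing the abbreviation.
    phi: one dict per topic, mapping an expansion to its probability.
    This omits the word-link distribution of the full link topic model.
    """
    def score(expansion):
        return sum(t * phi_k.get(expansion, 0.0) for t, phi_k in zip(theta_doc, phi))
    return max(candidates, key=score)


# Hypothetical estimates for the abbreviation "CT" in one abstract.
theta_doc = [0.9, 0.1]
phi = [
    {"computed tomography": 0.05, "calcitonin": 0.001},
    {"computed tomography": 0.001, "calcitonin": 0.04},
]
print(best_expansion(["computed tomography", "calcitonin"], theta_doc, phi))
```

In the full model, the link distribution would additionally reward expansions whose sentence-level co-occurring words are semantically dependent on them.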

Research Suggestion for Disaster Prediction using Safety Report of Korea Government (안전신문고를 이용한 재난 예측 방법론 제안)

  • Lee, Jun;Shin, Jindong;Cho, Sangmyeong;Lee, Sanghwa
    • Journal of Korean Society of Disaster and Security, v.12 no.4, pp.15-26, 2019
  • Anjunshinmungo (the Safety e-Report) has been in operation since 2014, and about 1 million reports had accumulated by June 2019. This study analyzes the contents of these more than 1 million safety reports to determine how powerful and meaningful the public's voice and interest are in the information age. In particular, we are interested in their forecasting ability: we wanted to check whether safety reports are related to possible disasters. To this end, the researchers received the reported data as text and analyzed it using natural language processing methods. Based on this, news articles from the period covered by the safety report analysis were also analyzed, and the correlation between the contents of the safety reports and those of the news articles was examined. As a result, accidents occurred within a few months after the number of reports related to response and confirmation increased, suggesting that analyzing safety reports previously filed about social instability can help predict future disasters.
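To make the forecasting idea concrete, here is a minimal sketch, assuming monthly counts of reports and of accidents are available, that correlates report volume in one month with accident counts a fixed number of months later (a lagged Pearson correlation). The function names and the sample counts are hypothetical; the abstract does not state which correlation measure was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlation(reports, accidents, lag):
    """Correlate report counts in month t with accident counts in month t + lag."""
    if lag == 0:
        return pearson(reports, accidents)
    return pearson(reports[:-lag], accidents[lag:])


reports = [12, 15, 30, 42, 38, 20]   # hypothetical monthly report counts
accidents = [1, 1, 2, 2, 5, 6]       # hypothetical monthly accident counts
for lag in range(4):
    print(lag, round(lagged_correlation(reports, accidents, lag), 3))
```

A correlation that peaks at a positive lag is the pattern the abstract describes: report volume rising a few months before accidents occur.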

Principal Components Self-Organizing Map PC-SOM (주성분 자기조직화 지도 PC-SOM)

  • Huh, Myung-Hoe (허명회)
    • The Korean Journal of Applied Statistics, v.16 no.2, pp.321-333, 2003
  • The self-organizing map (SOM), an unsupervised learning neural network, has been developed by T. Kohonen since the 1980s. Its main application areas were pattern recognition and text retrieval, so it did not spread to statisticians until recently. SOMs are now frequently used in data mining. Kohonen's SOM, however, needs improvements to become a standard tool for statisticians. First, there should be a good guideline for the size of the map. Second, an enhanced visualization mode is needed. In this study, the principal components self-organizing map (PC-SOM), a modification of Kohonen's SOM, is proposed to meet these needs. PC-SOM performs a one-dimensional SOM in the first stage to decompose the input units into node weights and residuals. In the second stage, another one-dimensional SOM is applied to the residuals from the first stage. Finally, putting the two stages together yields a two-dimensional SOM. This procedure can easily be extended to construct maps of three or more dimensions. The number of grid lines along the second axis is determined automatically once the data analyst specifies the number for the first axis. Furthermore, PC-SOM provides easily interpretable map axes. These merits of PC-SOM are demonstrated with Fisher's well-known iris data and a simulated data set.
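The two-stage idea can be sketched as follows: train a one-dimensional SOM, subtract each input's best-matching node weight to get residuals, train a second one-dimensional SOM on those residuals, and use the pair of node indices as 2-D map coordinates. This is a minimal sketch under several assumptions: the bubble neighborhood, the linear learning-rate and radius decay, and the function names are illustrative choices, and the second-axis size `n_second` is passed in explicitly rather than determined automatically as the abstract describes.

```python
import random

def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_node(weights, x):
    """Index of the node whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda j: _sq_dist(weights[j], x))

def som_1d(data, n_nodes, epochs=50, lr0=0.5, seed=0):
    """Train a one-dimensional SOM (a chain of n_nodes weight vectors)."""
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(n_nodes)]
    radius0 = max(n_nodes / 2, 1.0)
    for epoch in range(epochs):
        decay = 1 - epoch / epochs            # linearly shrinking schedule
        lr, radius = lr0 * decay, radius0 * decay
        for x in rng.sample(data, len(data)):
            bmu = best_node(weights, x)
            for j, w in enumerate(weights):
                if abs(j - bmu) <= radius:    # hard (bubble) neighborhood on the chain
                    for d in range(len(w)):
                        w[d] += lr * (x[d] - w[d])
    return weights

def pc_som(data, n_first, n_second):
    """Two-stage PC-SOM sketch: map inputs, then map the stage-1 residuals."""
    w1 = som_1d(data, n_first, seed=0)
    first = [best_node(w1, x) for x in data]
    residuals = [[xi - wi for xi, wi in zip(x, w1[b])] for x, b in zip(data, first)]
    w2 = som_1d(residuals, n_second, seed=1)
    second = [best_node(w2, r) for r in residuals]
    return list(zip(first, second))   # (axis-1 node, axis-2 node) per input


data = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.1], [0.9, 1.0], [2.0, 0.1], [2.1, 0.0]]  # toy data
print(pc_som(data, n_first=3, n_second=2))
```

Because the second axis only models what the first axis leaves unexplained, the two map axes play a role loosely analogous to the first two principal components.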

A Study on the Analysis of Intellectual Structure of Korean Veterinary Sciences (국내 수의과학 분야의 지적 구조 분석에 관한 연구)

  • Cho, Hyun-Yang
    • Journal of Information Management, v.43 no.2, pp.43-66, 2012
  • The purpose of this study is to examine the intellectual structure of the field of veterinary sciences in Korea using author profiling analysis (APA), a bibliometric approach. Three journals were selected on the basis of citation data as those exchanging the most citations with the Korean Journal of Veterinary. Then, the 50 authors who published the most articles in the selected journals during the given period were chosen. Similarities and dissimilarities among authors were analyzed by comparing co-word appearance patterns in article titles, abstracts, and keywords. Depending on their interests within veterinary sciences in Korea, the authors can be grouped into 11 minor clusters under 4 major clusters. The subject of each cluster was determined by matching the keywords that represent the authors' research interests. As a result, it is possible to identify current research trends and the researcher network in the field of veterinary sciences.
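Author profiling analysis compares authors through the terms they use. A minimal sketch, assuming each author is represented as a term-count vector built from the titles, abstracts, and keywords of their papers and compared with cosine similarity (a common choice, though the abstract does not name the measure); the resulting matrix could then feed a clustering step. The author names and terms are hypothetical.

```python
from collections import Counter
from math import sqrt

def author_profile(papers):
    """Term counts across an author's papers (each paper is a list of extracted terms)."""
    return Counter(term for paper in papers for term in paper)

def cosine(p, q):
    """Cosine similarity of two term-count vectors."""
    dot = sum(p[t] * q[t] for t in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

def similarity_matrix(profiles):
    """Pairwise cosine similarity for every pair of authors."""
    names = sorted(profiles)
    return {(a, b): cosine(profiles[a], profiles[b]) for a in names for b in names}


profiles = {  # hypothetical authors and extracted terms
    "Author A": author_profile([["canine", "virus", "vaccine"], ["virus", "pcr"]]),
    "Author B": author_profile([["virus", "vaccine"], ["pcr", "swine"]]),
    "Author C": author_profile([["bovine", "nutrition"], ["milk", "nutrition"]]),
}
sims = similarity_matrix(profiles)
print(round(sims[("Author A", "Author B")], 3), sims[("Author A", "Author C")])
```

Authors A and B share vocabulary and score high, while C shares none and scores 0.0, so hierarchical clustering on `1 - similarity` would place A and B in the same cluster.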

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics, v.34 no.2, pp.267-278, 2021
  • The amount of data keeps growing over time across government and business, both in Korea and overseas, and academic research on big data is increasing accordingly. Statistics is one of the major disciplines of big data research, so it is worth examining research trends in statistics by applying big data methods to the growing number of statistical papers. In this study, we analyzed which topics are being studied, using abstract data from statistical papers published in Korea and abroad. Domestic and international research trends were analyzed through the frequencies of the papers' keywords, and the relationships between keywords were visualized using word embeddings. In addition to the keywords selected by the authors, important words in statistical papers identified with TextRank were also visualized. Lastly, 10 topics were investigated by applying LDA to the abstract data. Through the analysis of each topic, we investigated which research topics are studied frequently and which words are important to them.
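TextRank scores words by running PageRank over a graph in which words are linked when they co-occur within a small sliding window. A minimal sketch, assuming pre-tokenized text, an undirected adjacency-window graph, and the conventional damping factor of 0.85; the function name and the toy tokens are hypothetical, and the study's exact preprocessing is not stated.

```python
from collections import defaultdict

def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with TextRank: PageRank over a word co-occurrence graph.

    Words that appear together inside a sliding window of `window` tokens
    are linked (undirected); window=2 links adjacent words only.
    """
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != word:
                neighbors[word].add(tokens[j])
                neighbors[tokens[j]].add(word)
    nodes = list(neighbors)
    score = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        score = {
            node: (1 - damping)
            + damping * sum(score[u] / len(neighbors[u]) for u in neighbors[node])
            for node in nodes
        }
    return sorted(nodes, key=score.get, reverse=True)[:top_n]


tokens = ["bayesian", "model", "regression", "model", "mixed", "model", "test"]  # toy tokens
print(textrank_keywords(tokens, top_n=3))
```

Unlike raw frequency, TextRank favors words connected to many distinct words, so a term like "model" that links several concepts rises to the top.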

Exploring Domestic ESG Research Trends: Focusing on Domestic Research on ESG from 2012 to 2021 (국내 ESG 연구동향 탐색: 2012~2021년 진행된 국내 학술연구 중심으로)

  • Park, Jae Hyun;Han, Hyang Won;Kim, Na Ra
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship, v.17 no.1, pp.191-211, 2022
  • As the value of highly sustainable companies increases, ESG (Environmental, Social, and Governance) has emerged as the biggest topic of discussion for companies around the world. As more research on ESG is also being conducted in Korea in line with global trends, it is necessary to examine ESG research trends. Accordingly, ESG academic papers published over the past 10 years were collected by year, and a frequency analysis of their key themes and titles was conducted using text mining techniques. Through bibliometric analysis, this paper analyzed the number of selected publications by year and the cumulative number of studies. The findings suggested that the number of ESG papers is increasing each year and that academic interest in ESG-related issues continues to grow. Next, the frequency analysis of the keywords and titles of the research papers extracted the words "ESG", "company", "society", "responsibility", "management", "investment", and "sustainability". This analysis identified the research fields and keywords that have been relevant to ESG over the past 10 years. Comparing the major ESG issues presented in recent overseas studies with the common factors of the key ESG keywords presented in this study confirmed that recent studies focus more on the environment than previous studies did. Third, the data used in domestic ESG studies were found to mainly include the KEJI index, the KRX index, and the KCGS ESG evaluation index. Examining the main research subjects of ESG papers showed that 8 of the 152 domestic ESG studies focused on SMEs. This study confirmed the ESG research trend and the growth of research in the field, and by categorizing research topics and research keywords it provides basic data to help future researchers select more diverse research topics.
Based on both the arguments of previous ESG studies on SMEs and the results of this study, there is a lack of studies on guidelines for ESG practice and their application to SMEs, and more ESG research on SMEs needs to be conducted in the future.
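The bibliometric counts (papers per year and the cumulative total) and the title-word frequency analysis can be sketched in a few lines. This is a minimal sketch with hypothetical function names and data; it uses naive whitespace tokenization, whereas Korean titles would in practice need a morphological analyzer to extract nouns.

```python
from collections import Counter
from itertools import accumulate

def publication_counts(years):
    """Map each publication year to (papers that year, cumulative papers so far)."""
    per_year = Counter(years)
    ordered = sorted(per_year)
    cumulative = accumulate(per_year[y] for y in ordered)
    return {y: (per_year[y], total) for y, total in zip(ordered, cumulative)}

def top_terms(titles, n=5, stopwords=frozenset()):
    """Most frequent terms across titles, using naive whitespace tokenization."""
    counts = Counter(
        word for title in titles for word in title.lower().split() if word not in stopwords
    )
    return counts.most_common(n)


years = [2019, 2020, 2020, 2021, 2021, 2021]  # hypothetical publication years
titles = ["ESG management and firm value", "ESG investment and sustainability"]
print(publication_counts(years))
print(top_terms(titles, n=2, stopwords={"and"}))
```

Stopword removal matters here: without it, function words would crowd out content terms such as "investment" and "sustainability".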

The Utilization of Local Document Information to Improve Statistical Context-Sensitive Spelling Error Correction (통계적 문맥의존 철자오류 교정 기법의 향상을 위한 지역적 문서 정보의 활용)

  • Lee, Jung-Hun;Kim, Minho;Kwon, Hyuk-Chul
    • KIISE Transactions on Computing Practices, v.23 no.7, pp.446-451, 2017
  • The statistical context-sensitive spelling correction technique in this paper is based on Shannon's noisy channel model. Interpolation is used to improve the correction method proposed in the paper; the general interpolation method fills in the N-gram probability by mixing it with (N-1)-gram and (N-2)-gram probabilities, all estimated from the same statistical corpus. In the proposed method, interpolation is instead performed between frequency information from the statistical corpus and from the document being corrected. Using frequencies from the document being corrected has two advantages. First, probabilities can be obtained for coined words that exist only in the document being corrected. Second, when two correction candidates have ambiguous (similar) probability values, the ambiguity can be resolved by referring to the document being corrected. The proposed method showed better precision and recall than the existing correction model.
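The abstract does not give the exact interpolation formula. A minimal sketch, assuming linear interpolation with a hypothetical weight `lam` between bigram probabilities estimated from the general corpus and from the document being corrected, combined with a noisy-channel error model `channel[c]` = P(observed word | candidate c) that is simply assumed to be given. Bigrams stand in for the paper's N-grams, and all names and tokens are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens):
    """Bigram counts and context (preceding-word) counts for one text."""
    return Counter(zip(tokens, tokens[1:])), Counter(tokens[:-1])

def bigram_prob(counts, prev, word):
    """Maximum-likelihood P(word | prev); 0.0 if prev never appears as a context."""
    bigrams, contexts = counts
    return bigrams[(prev, word)] / contexts[prev] if contexts[prev] else 0.0

def interpolated_prob(prev, word, corpus_counts, doc_counts, lam=0.7):
    """lam * P_corpus(word | prev) + (1 - lam) * P_doc(word | prev)."""
    return (lam * bigram_prob(corpus_counts, prev, word)
            + (1 - lam) * bigram_prob(doc_counts, prev, word))

def correct(prev, candidates, channel, corpus_counts, doc_counts, lam=0.7):
    """Noisy-channel choice: argmax_c P(c | prev) * P(observed | c)."""
    return max(candidates,
               key=lambda c: interpolated_prob(prev, c, corpus_counts, doc_counts, lam)
               * channel[c])


corpus = ngram_counts("the cat sat the cat ran".split())     # hypothetical corpus
doc = ngram_counts("the catbot sat the catbot ran".split())  # document being corrected
print(interpolated_prob("the", "catbot", corpus, doc))       # nonzero only thanks to doc
print(correct("the", ["cat", "catbot"], {"cat": 0.1, "catbot": 0.9}, corpus, doc))
```

The coined word "catbot" never appears in the corpus, yet it receives a nonzero probability from the document's own counts, which is the first advantage the abstract describes.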