• Title/Summary/Keyword: Quantitative Text Analysis

Development and Application of Image Analysis Program for Investigation of Pore Characteristics in Transverse Surface of Hardwoods

  • Kwon, Oh-Kyung;Lee, Phil-Woo
    • Journal of the Korean Wood Science and Technology / v.26 no.2 / pp.29-37 / 1998
  • An image analysis program for measuring various quantitative characteristics of the transverse surface of wood was developed using Delphi 2.0. Pore characteristic data were saved as text files. These data were image-processing conditions, proportion of pores relative to other elements, tangential diameter, area, tangential and radial diameters, x and y coordinates of pore centers, and geometric coefficients. In addition, pore area histograms in the tangential and radial directions were saved as BMP (bitmap) files. The analyses indicated that some quantitative characteristics appear to be useful for distinguishing four diffuse-porous woods and four ring-porous woods at the species level. These characteristics were the relative radial distribution of pores within a growth ring, the tangential pore area histogram, and the proportion of pores in the lumen area.
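The pore measurements listed above can be illustrated on a binarized image. These are area, tangential and radial diameters, pore-center coordinates and pore proportion. The following is a minimal Python sketch, not the authors' Delphi program. It rests on three assumptions:

* the image has already been thresholded into a grid where 1 marks pore pixels;
* 4-connected pore pixels form one pore;
* the x axis runs tangentially and the y axis radially.

The function name `measure_pores` is hypothetical.

```python
from collections import deque


def measure_pores(image):
    """Label 4-connected pore regions in a binary image (1 = pore pixel, 0 = other tissue).

    Returns a list of per-pore measurements and the pore proportion of the whole image.
    Assumption: columns (x) run in the tangential direction, rows (y) in the radial direction.
    """
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    pores = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill gathers every pixel belonging to this pore.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and image[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            xs = [x for _, x in pixels]
            ys = [y for y, _ in pixels]
            pores.append({
                "area": len(pixels),                               # pixel count
                "tangential_diameter": max(xs) - min(xs) + 1,      # x extent
                "radial_diameter": max(ys) - min(ys) + 1,          # y extent
                "center": (sum(xs) / len(xs), sum(ys) / len(ys)),  # (x, y) centroid
            })
    pore_area = sum(p["area"] for p in pores)
    return pores, pore_area / (rows * cols)


if __name__ == "__main__":
    sample = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 1],
        [0, 0, 0, 0, 1],
    ]
    pores, proportion = measure_pores(sample)
    print(len(pores), proportion)  # 2 0.35
```

Real images would also need calibration from pixels to micrometres. Pores split by the image border or by noise would need handling that this sketch omits.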

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in the consumer, business, and social sectors. Text analysis requires a corpus that reflects domain-specific knowledge. However, professional corpora for the fashion field are insufficient. This study aims to develop a taxonomy and thesaurus that account for the specialized nature of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms. Terms were then extracted through preprocessing in Python. The taxonomy comprises seven clothing attributes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms drawn from fashion references such as dictionaries. In the end, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy. All data were then developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing its text-mining results with those obtained using a previously developed corpus. This study contributes to establishing a text data standard and enables more meaningful text-mining results in the fashion field.
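A thesaurus of this kind is essentially a mapping from every surface form to a standard term and its taxonomy attribute. A surface form is either a standard term or one of its synonyms. The following minimal sketch has these limits:

* It assumes English, space-separated tokens and uses greedy longest-match lookup.
* The entries are hypothetical examples, not taken from the published dictionary.
* Korean text would normally need morphological analysis before a lookup like this.

```python
import re

# Hypothetical entries for illustration; not taken from the published web dictionary.
THESAURUS = {
    "trench coat": {"attribute": "item", "synonyms": ["trench", "trenchcoat"]},
    "a-line": {"attribute": "silhouette", "synonyms": ["a line", "aline"]},
    "houndstooth": {"attribute": "pattern/print", "synonyms": ["hound's tooth", "dogtooth"]},
}


def build_index(thesaurus):
    """Map every surface form (standard term or synonym) to its standard term."""
    index = {}
    for standard, entry in thesaurus.items():
        index[standard] = standard
        for synonym in entry["synonyms"]:
            index[synonym] = standard
    return index


def normalize_terms(text, thesaurus, max_ngram=3):
    """Return (standard term, attribute) pairs found in text via greedy longest match."""
    index = build_index(thesaurus)
    tokens = re.findall(r"[a-z0-9'-]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram starting at position i first.
        for n in range(min(max_ngram, len(tokens) - i), 0, -1):
            form = " ".join(tokens[i:i + n])
            if form in index:
                standard = index[form]
                matches.append((standard, thesaurus[standard]["attribute"]))
                i += n
                break
        else:
            i += 1  # no surface form starts here; move on
    return matches


if __name__ == "__main__":
    print(normalize_terms("A houndstooth trenchcoat with an A line skirt", THESAURUS))
    # [('houndstooth', 'pattern/print'), ('trench coat', 'item'), ('a-line', 'silhouette')]
```

Mapping synonyms onto one standard term before counting means the variants of a term are counted together in text mining. Without this step they would be counted as separate terms.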

Financial Footnote Analysis for Financial Ratio Predictions based on Text-Mining Techniques (재무제표 주석의 텍스트 분석 통한 재무 비율 예측 향상 연구)

  • Choe, Hyoung-Gyu;Lee, Sang-Yong Tom
    • Knowledge Management Research / v.21 no.2 / pp.177-196 / 2020
  • Since the adoption of K-IFRS (Korean International Financial Reporting Standards), the volume of financial footnotes has increased. However, because of boilerplate phrasing and a lack of conciseness, extracting the core information from footnotes remains difficult. To address this problem, this study applied text-mining techniques to financial footnotes to predict financial ratios. Using financial statement data from 2013 to 2018, we predicted the earnings per share (EPS) of the following quarter. Prediction errors were significantly reduced when text-mined footnote data were used together with quantitative financial data. We attribute this result to discretionary financial figures, which are hard to predict from quantitative financial data alone, being more strongly correlated with footnote text.
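One simple way to use footnote text together with financial data is to append keyword-count features to the numeric predictors of a linear model. The sketch below is an illustration only, not the study's model. These parts are hypothetical:

* the keywords;
* the toy data;
* the choice of ridge-regularized least squares.

The error comparison is in-sample, whereas the study evaluated predictions of the following quarter.

```python
import re

# Hypothetical footnote keywords; the study's actual text features are not reproduced here.
KEYWORDS = ["impairment", "litigation"]

# Toy rows: (current EPS, footnote text, next-quarter EPS). Invented for illustration.
SAMPLES = [
    (1.0, "no material events", 1.0),
    (2.0, "impairment of goodwill recognised", 1.5),
    (1.5, "pending litigation", 1.5),
    (3.0, "impairment impairment losses", 2.0),
    (2.5, "litigation and impairment", 2.0),
    (0.5, "routine disclosures", 0.5),
]


def text_features(footnote):
    """Bag-of-words counts of each keyword in one footnote."""
    tokens = re.findall(r"[a-z]+", footnote.lower())
    return [float(tokens.count(k)) for k in KEYWORDS]


def design_matrix(samples, use_text):
    """Rows of [bias, current EPS], extended with keyword counts when use_text is True."""
    rows = []
    for eps, footnote, _ in samples:
        row = [1.0, eps]
        if use_text:
            row += text_features(footnote)
        rows.append(row)
    return rows


def fit_ridge(X, y, lam=1e-6):
    """Solve (X^T X + lam*I) w = X^T y by Gaussian elimination with partial pivoting."""
    d = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(row[i] * t for row, t in zip(X, y)) for i in range(d)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * d
    for i in reversed(range(d)):  # back substitution
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, d))) / A[i][i]
    return w


def mean_abs_error(X, y, w):
    """Mean absolute error of the linear predictions X @ w against y."""
    preds = [sum(x * c for x, c in zip(row, w)) for row in X]
    return sum(abs(p - t) for p, t in zip(preds, y)) / len(y)


if __name__ == "__main__":
    y = [target for _, _, target in SAMPLES]
    for use_text in (False, True):
        X = design_matrix(SAMPLES, use_text)
        print("with text" if use_text else "numeric only", mean_abs_error(X, y, fit_ridge(X, y)))
```

On this toy data the text-augmented model fits almost exactly, because the target was constructed from the impairment count. Real footnotes would need proper tokenization, feature selection and out-of-sample evaluation.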

Measuring the Confidence of Human Disaster Risk Case based on Text Mining (텍스트마이닝 기반의 인적재난사고사례 신뢰도 측정연구)

  • Lee, Young-Jai;Lee, Sung-Soo
    • The Journal of Information Systems / v.20 no.3 / pp.63-79 / 2011
  • Deriving the risk level of infrastructure and buildings from past human disaster risk cases and implementing prevention measures are important disaster-prevention activities. The objective of this study is to measure confidence values that enable quantitative analysis of various disaster risk cases through a text-mining methodology. By examining the confidence calculation process and method, the study also proposes a basic quantitative framework, which consists of four stages:
    1. Correlations are described by categorizing basic elements based on a human disaster ontology.
    2. A term-document matrix of terms and cases is created and the frequencies of particular cases and terms are quantified. Correlation values are added where values are missing.
    3. Association rules are generated according to the basic elements of human disaster risk cases.
    4. The confidence value of each disaster risk case is measured through the association rules.

    Such confidence values will become a key element when deciding the risk level of a new disaster risk and the preventive measures that follow. Using a collection of human disaster risk cases related to road infrastructure, this study demonstrates how the four stages of the quantitative framework and process were actually applied for verification.
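Stages two to four can be illustrated with a binary term-document matrix and the standard association-rule measures, where confidence(A -> B) = support(A and B) / support(A). This is a minimal sketch that assumes each disaster case has already been reduced to a set of terms. The cases, terms and thresholds are hypothetical. Two steps are not modeled: the ontology-based correlation step and the filling of missing values from stage two.

```python
from itertools import permutations


def term_document_matrix(cases, vocabulary):
    """Binary matrix: one row per case, one column per term (1 = term occurs in the case)."""
    return [[1 if term in case else 0 for term in vocabulary] for case in cases]


def support(matrix, vocabulary, itemset):
    """Fraction of cases containing every term in itemset."""
    cols = [vocabulary.index(term) for term in itemset]
    hits = sum(1 for row in matrix if all(row[c] for c in cols))
    return hits / len(matrix)


def confidence(matrix, vocabulary, antecedent, consequent):
    """confidence(A -> B) = support(A and B) / support(A)."""
    base = support(matrix, vocabulary, antecedent)
    if base == 0:
        return 0.0
    return support(matrix, vocabulary, list(antecedent) + list(consequent)) / base


def single_term_rules(matrix, vocabulary, min_support=0.2, min_confidence=0.6):
    """All rules {a} -> {b} that meet both thresholds, as (a, b, support, confidence)."""
    rules = []
    for a, b in permutations(vocabulary, 2):
        s = support(matrix, vocabulary, [a, b])
        c = confidence(matrix, vocabulary, [a], [b])
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules


# Hypothetical road-infrastructure cases, each reduced to a set of terms.
CASES = [
    {"road", "fall", "worker"},
    {"road", "collision", "vehicle"},
    {"road", "fall", "scaffold"},
    {"bridge", "fall", "worker"},
    {"road", "collision", "worker"},
]
VOCABULARY = ["road", "fall", "worker", "collision", "vehicle", "scaffold", "bridge"]

if __name__ == "__main__":
    tdm = term_document_matrix(CASES, VOCABULARY)
    print(confidence(tdm, VOCABULARY, ["collision"], ["road"]))  # 1.0
    print(confidence(tdm, VOCABULARY, ["road"], ["fall"]))       # 0.5
```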

Analysis of Educational Issues through Topic Modeling of National Petitions Text (국민청원글의 토픽 모델링을 통한 교육이슈 분석)

  • Shim, Jaekwoun
    • Journal of The Korean Association of Information Education / v.25 no.4 / pp.633-640 / 2021
  • Education-related issues are social problems in which various groups and situations are intricately linked. Identifying these issues by analyzing education-related social phenomena is difficult. Korean-language text, however, can be analyzed quantitatively. Recent advances in text analysis techniques make it possible to derive educational issues from Korean text data. In this study, petitions in the childcare/education category were collected from the online board of the Blue House National Petition website. Text analysis was then used to derive issues in the field of education. Six topics were derived using Latent Dirichlet Allocation (LDA), a topic modeling technique. Association rules among the major keywords were also analyzed and visualized as graphs. Beyond deriving educational issues through conventional questionnaires, the study offers implications for future research directions and policy. It shows that such issues can be discovered with text-based analysis methods.
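LDA is usually fitted with a library, but its core can be shown with a short collapsed Gibbs sampler. The sketch below is illustrative only and is not the configuration used in the study. It assumes petitions have already been tokenized. Korean text would normally go through a morphological analyzer first. The hyperparameters `alpha=0.1` and `beta=0.01` are hypothetical choices.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (doc_topic, topic_word): per-document topic
    proportions and per-topic word distributions estimated from the final sample.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # token totals per topic
    z = []                             # topic assignment of every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)  # random initial assignment
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    doc_topic = [[(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
                 for d, doc in enumerate(docs)]
    topic_word = [{vocab[v]: (nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)}
                  for t in range(K)]
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Highest-probability words of each topic."""
    return [sorted(dist, key=dist.get, reverse=True)[:n] for dist in topic_word]


if __name__ == "__main__":
    # Hypothetical, already-tokenized petition snippets.
    petitions = [
        ["daycare", "fee", "support", "daycare"],
        ["exam", "admission", "fairness", "exam"],
        ["daycare", "teacher", "support"],
        ["admission", "exam", "policy"],
    ]
    doc_topic, topic_word = lda_gibbs(petitions, num_topics=2)
    print(top_words(topic_word))
```

The number of topics (six in the study) is an input to LDA rather than an output. It is typically chosen by comparing models with measures such as coherence or perplexity.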

Research trends in the Korean Journal of Women Health Nursing from 2011 to 2021: a quantitative content analysis

  • Ju-Hee Nho;Sookkyoung Park
    • Women's Health Nursing / v.29 no.2 / pp.128-136 / 2023
  • Purpose: Topic modeling is a text mining technique that extracts concepts from textual data and uncovers semantic structures and potential knowledge frameworks within context. This study aimed to identify major keywords and network structures for each major topic to discern research trends in women's health nursing published in the Korean Journal of Women Health Nursing (KJWHN) using text network analysis and topic modeling. Methods: The study targeted papers with English abstracts among 373 articles published in KJWHN from January 2011 to December 2021. Text network analysis and topic modeling were employed, and the analysis consisted of five steps: (1) data collection, (2) word extraction and refinement, (3) extraction of keywords and creation of networks, (4) network centrality analysis and key topic selection, and (5) topic modeling. Results: Six major keywords, each corresponding to a topic, were extracted through topic modeling analysis: "gynecologic neoplasms," "menopausal health," "health behavior," "infertility," "women's health in transition," and "nursing education for women." Conclusion: The latent topics from the target studies primarily focused on the health of women across all age groups. Research related to women's health is evolving with changing times and warrants further progress in the future. Future research on women's health nursing should explore various topics that reflect changes in social trends, and research methods should be diversified accordingly.
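Steps (2) to (4) of the method above can be sketched as follows. These steps build a keyword network and rank its nodes by centrality. The sketch makes three assumptions:

* Each paper has already been reduced to a list of refined keywords.
* An edge links two keywords whenever they co-occur in the same abstract.
* The keyword lists are hypothetical.

Degree centrality is only one of the centrality measures such studies use.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(documents):
    """Undirected keyword network: edge weight = number of documents containing both keywords."""
    edges = Counter()
    for keywords in documents:
        # sorted(set(...)) gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree: distinct neighbours / (number of nodes - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


if __name__ == "__main__":
    # Hypothetical refined keyword lists, one per abstract.
    abstracts = [
        ["menopause", "depression", "women"],
        ["infertility", "stress", "women"],
        ["menopause", "women", "exercise"],
    ]
    edges = cooccurrence_network(abstracts)
    centrality = degree_centrality(edges)
    print(edges[("menopause", "women")], centrality["women"])  # 2 1.0
```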

A Study on Research Trends of Age-Friendly Using Text Network Analysis : Focusing on 「The Korean Journal of Health Service Management」 (2007-2018) (텍스트 네트워크 분석을 활용한 고령친화 분야의 연구동향 분석 : 「보건의료산업학회지」 게재논문(2007~2018)을 중심으로)

  • Ko, Min-Seok
    • The Korean Journal of Health Service Management / v.13 no.4 / pp.19-31 / 2019
  • Objectives: The purpose of this study was to analyze trends in age-friendly research and suggest directions for future research. Methods: A total of 112 articles related to age-friendly research were selected from 605 articles published in The Korean Journal of Health Service Management (2007-2018). Content analysis and text network analysis were conducted using SPSS 23.0 and NetMiner 4. Results: First, studies with two authors (30.4%) and four keywords (45.5%) were the most common. Most studies (93.8%) used quantitative research methods. Primary data (61.9%) and SPSS (77.7%) were the most commonly used data source and analysis tool, respectively. Second, seven keywords were common to the top 10 of every centrality measure. They were Elderly, Geriatric Hospital, Depression, Care Workers, Long-Term Care Facilities, Experience, and Attitude. Conclusions: This study shows the need to diversify research topics, subjects, research methods, and analytical tools in future age-friendly studies. It also suggests promoting convergence research in this field linked to various industries and services.
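Finding keywords that sit in the top 10 of all centralities amounts to intersecting the top-k lists of several centrality measures. The following is a minimal sketch with these assumptions and limits:

* The keyword network is unweighted and given as an adjacency dictionary.
* Only degree and closeness centrality are used.
* NetMiner 4 may define or normalize these measures differently.
* The keywords in the example are hypothetical.

```python
from collections import deque


def degree_centrality(adjacency):
    """Normalized degree: neighbours / (n - 1)."""
    n = len(adjacency)
    return {node: len(nbs) / (n - 1) if n > 1 else 0.0 for node, nbs in adjacency.items()}


def closeness_centrality(adjacency):
    """Closeness = reachable nodes / sum of BFS distances (unweighted graph)."""
    scores = {}
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adjacency[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total = sum(dist.values())
        scores[source] = (len(dist) - 1) / total if total else 0.0
    return scores


def common_top_k(centralities, k):
    """Keywords ranked in the top k under every centrality measure (ties broken alphabetically)."""
    tops = [set(sorted(c, key=lambda node: (-c[node], node))[:k]) for c in centralities]
    return set.intersection(*tops)


if __name__ == "__main__":
    # Hypothetical keyword network (adjacency sets must be symmetric).
    network = {
        "elderly": {"depression", "care workers", "geriatric hospital"},
        "depression": {"elderly", "attitude"},
        "care workers": {"elderly"},
        "geriatric hospital": {"elderly"},
        "attitude": {"depression"},
    }
    measures = [degree_centrality(network), closeness_centrality(network)]
    print(sorted(common_top_k(measures, 2)))  # ['depression', 'elderly']
```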

A Study on the Quantitative Evaluation of Initial Coin Offering (ICO) Using Unstructured Data (비정형 데이터를 이용한 ICO(Initial Coin Offering) 정량적 평가 방법에 대한 연구)

  • Lee, Han Sol;Ahn, Sangho;Kang, Juyoung
    • Smart Media Journal / v.11 no.5 / pp.63-74 / 2022
  • Initial public offerings (IPOs) operate under a legal framework for investor protection and offer various quantitative evaluation factors. These make objective analysis possible, and accordingly many studies have been conducted on IPOs. Crowdfunding likewise has several safeguards against indiscriminate funding within its legal framework for investor protection. In contrast, blockchain-based initial coin offerings (ICOs) have recently drawn attention and are typically described in cryptocurrency white papers. ICOs have ambiguous legal means and standards for protecting investors and lack quantitative methods for evaluating them objectively. This study therefore collects ICO white papers published online and performs ICO fraud prediction based on BERT, a text embedding technique. It compares the results with an existing Random Forest machine learning technique and demonstrates the feasibility of fraud detection. The study shows that a quantitative approach based on unstructured data can be used to identify fraud in ICOs. It is therefore expected to contribute to research on quantitative ICO fraud detection.

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been growing interest in IPOs (Initial Public Offerings) because of the profitable returns that IPO stocks can offer investors. However, IPOs can also be speculative investments involving substantial risk. The shares tend to be volatile and the supply of IPO shares is often highly limited. It is therefore crucial that IPO investors be well informed about the issuing firms and the market before deciding whether to invest. Unlike institutional investors, individual investors are at a disadvantage because they have few opportunities to obtain information on IPOs. The purpose of this study is to provide individual investors with information they can consider when making IPO investment decisions. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price will move up or down after the first 5 trading days. The sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables are of two kinds. The first is three tone variables derived from IPO prospectuses. The second is quantitative variables that are firm-specific, issue-specific, or market-specific. The three prospectus tone variables are the percentages of positive, neutral, and negative sentences in a prospectus, respectively. Only sentences in the Risk Factors section of each prospectus were considered in the tone analysis. All sentences were classified as 'positive', 'neutral', or 'negative' via text analysis using TF-IDF (Term Frequency-Inverse Document Frequency). Sentence tone was measured with machine learning rather than a lexicon-based approach, because sentiment dictionaries suitable for Korean text analysis in finance are lacking.
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus. Each training sentence was then read and classified manually. Based on this training set, a Support Vector Machine model was used to predict the tone of sentences in the test set. Finally, the percentages of positive, neutral, and negative sentences in each prospectus were calculated from the model's predictions. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. Models that combine quantitative (technical-analysis) variables with prospectus tone variables showed higher accuracy than models that use only quantitative variables. Specifically, prediction accuracy improved by:

* 1.45 percentage points in the Random Forest model;
* 4.34 percentage points in the Artificial Neural Network model;
* 5.07 percentage points in the Support Vector Machine model.

Among the models tested, the Artificial Neural Network using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. These results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the differences between models were statistically significant. The model using only quantitative variables was compared with the model using both quantitative and prospectus tone variables. This comparison confirmed that predictive performance improved significantly at the 1% significance level.
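Three quantitative pieces of this pipeline can be sketched in a few lines:

* TF-IDF weighting of Risk Factors sentences, which produces the features a classifier such as an SVM would consume;
* conversion of predicted sentence labels into per-prospectus tone percentages;
* the McNemar test for comparing two models on the same test cases.

This is a minimal sketch, not the authors' code, and it simplifies in several ways:

* The SVM itself is omitted.
* English regex tokenization stands in for Korean preprocessing.
* The plain idf = log(N/df) variant is used, whereas scikit-learn's `TfidfVectorizer` applies a smoothed idf by default.
* The McNemar statistic uses the continuity-corrected form. The study does not state which form it used.

```python
import math
import re
from collections import Counter


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors: tf = raw term count, idf = log(N / df)."""
    docs = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: count * math.log(n / df[t]) for t, count in Counter(doc).items()}
            for doc in docs]


def tone_shares(labels):
    """Percentages of positive, neutral and negative sentences in one prospectus."""
    counts = Counter(labels)
    return {tone: 100.0 * counts[tone] / len(labels)
            for tone in ("positive", "neutral", "negative")}


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar test on paired predictions of two models.

    b = cases model A gets right and model B gets wrong; c = the reverse.
    Returns (chi-square statistic with 1 degree of freedom, p-value).
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, p in triples if a == t and p != t)
    c = sum(1 for t, a, p in triples if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0
    chi2 = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square(1) survival function: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    vecs = tfidf_vectors(["risk may increase", "risk may decrease"])
    print(vecs[0])  # 'risk' and 'may' get weight 0.0 because they occur in every sentence
    print(tone_shares(["positive", "negative", "negative", "neutral"]))
    # Hypothetical test labels: model A right on all 12 cases, model B wrong on 10 of them.
    print(mcnemar([1] * 12, [1] * 12, [0] * 10 + [1] * 2))
```

McNemar's test uses only the cases on which the two models disagree. This suits two classifiers evaluated on the same IPOs, because their errors are paired rather than independent.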