• Title/Summary/Keyword: Text frequency analysis


An Analysis of Keywords on 'School Space Innovation' Policies using Text Mining - Focused on News Articles - (텍스트 마이닝을 활용한 '학교 공간 혁신' 정책 키워드 분석 - 뉴스 기사를 중심으로 -)

  • Lee, Dongkuk
    • The Journal of Sustainable Design and Educational Environment Research / v.19 no.2 / pp.11-20 / 2020
  • The goal of this study was to investigate the implementation of school space innovation and related issues as reported by major Korean media outlets, using text mining. To this end, the study collected 519 news articles on school space innovation published by 54 Korean media companies and performed frequency analysis and network analysis on the keywords. Based on the findings, the characteristics of school space innovation are summarized as follows. First, school space innovation has progressed in response to future education. Second, users are actively participating in school space innovation. Third, experts are supporting the innovation of school space by establishing a cooperative system. Fourth, the community is actively considering the innovation of school space. Fifth, the main projects of the Ministry of Education and the Provincial Offices of Education are being carried out through a mix of top-down and bottom-up approaches. The findings of this study will help provide a clear direction for contemporary school space innovation and offer implications for future research and implementation.

Trends in FTA Research of Domestic and International Journal using Paper Abstract Data (초록데이터를 활용한 국내외 FTA 연구동향: 2000-2020)

  • Hee-Young Yoon;Il-Youp Kwak
    • Korea Trade Review / v.45 no.5 / pp.37-53 / 2020
  • This study aims to draw implications for research development by comparing domestic and international studies on the subject of FTAs (Free Trade Agreements). To this end, papers written between 2000 and July 23, 2020 with "FTA" in the title were selected as research data: 1,944 papers from the Korean Citation Index (KCI) for domestic research and 970 papers from the Web of Science and SCOPUS for international research. Research trends were analyzed through keywords and abstracts. Frequency analysis and word embedding (Word2vec) were used to analyze the data, and the results were visualized using t-SNE and Scattertext. The results of the analysis are as follows. First, 16 of the top 30 keywords were the same in domestic and international research. In domestic research, many studies analyzed the outcomes or expected effects for countries that have concluded or discussed FTAs with Korea, whereas international research covers a more diverse range of subjects. Second, in the word embedding analysis, t-SNE was used to visually represent the research connections among the top 60 keywords. Finally, Scattertext was used to show which keywords were frequently used in studies from 2000 to 2010 and from 2011 to 2020. This study is the first to draw implications for academic development through abstract and keyword analysis by applying various text mining approaches to FTA-related research papers. Further in-depth research is needed, including collecting a wider variety of FTA-related text data and comparing FTA studies across countries.
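
The top-keyword comparison described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `top_keywords` and the toy keyword lists are hypothetical, and the abstract does not say how keywords were normalized (lowercasing is an assumption here).

```python
from collections import Counter

def top_keywords(keyword_lists, k):
    """Return the k most frequent keywords across per-paper keyword lists."""
    # Lowercasing is an assumed normalization step, not stated in the abstract.
    counts = Counter(kw.lower() for kws in keyword_lists for kw in kws)
    return [kw for kw, _ in counts.most_common(k)]

# Hypothetical toy data standing in for the KCI and Web of Science/SCOPUS records.
domestic = [["FTA", "tariff", "Korea"], ["FTA", "export", "Korea"], ["tariff", "agriculture"]]
international = [["FTA", "tariff", "WTO"], ["FTA", "trade diversion"], ["tariff", "WTO"]]

top_dom = top_keywords(domestic, 3)
top_int = top_keywords(international, 3)

# Keywords that appear in both top-k lists (the study reports 16 of 30 shared).
shared = set(top_dom) & set(top_int)
print(sorted(shared))
```

With real data, `k` would be 30 and the two inputs would be the author keywords of the domestic and international papers.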

The prediction of the stock price movement after IPO using machine learning and text analysis based on TF-IDF (증권신고서의 TF-IDF 텍스트 분석과 기계학습을 이용한 공모주의 상장 이후 주가 등락 예측)

  • Yang, Suyeon;Lee, Chaerok;Won, Jonggwan;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.237-262 / 2022
  • There has been a growing interest in IPOs (Initial Public Offerings) due to the profitable returns that IPO stocks can offer to investors. However, IPOs can be speculative investments that may involve substantial risk as well because shares tend to be volatile, and the supply of IPO shares is often highly limited. Therefore, it is crucially important that IPO investors are well informed of the issuing firms and the market before deciding whether to invest or not. Unlike institutional investors, individual investors are at a disadvantage since there are few opportunities for individuals to obtain information on the IPOs. In this regard, the purpose of this study is to provide individual investors with the information they may consider when making an IPO investment decision. This study presents a model that uses machine learning and text analysis to predict whether an IPO stock price would move up or down after the first 5 trading days. Our sample includes 691 Korean IPOs from June 2009 to December 2020. The input variables for the prediction are three tone variables created from IPO prospectuses and quantitative variables that are either firm-specific, issue-specific, or market-specific. The three prospectus tone variables indicate the percentage of positive, neutral, and negative sentences in a prospectus, respectively. We considered only the sentences in the Risk Factors section of a prospectus for the tone analysis in this study. All sentences were classified into 'positive', 'neutral', and 'negative' via text analysis using TF-IDF (Term Frequency - Inverse Document Frequency). Measuring the tone of each sentence was conducted by machine learning instead of a lexicon-based approach due to the lack of sentiment dictionaries suitable for Korean text analysis in the context of finance. 
For this reason, the training set was created by randomly selecting 10% of the sentences from each prospectus, and each of these sentences was classified manually after being read in person. Then, based on the training set, a Support Vector Machine model was used to predict the tone of sentences in the test set. Finally, the machine learning model calculated the percentages of positive, neutral, and negative sentences in each prospectus. To predict the price movement of an IPO stock, four machine learning techniques were applied: Logistic Regression, Random Forest, Support Vector Machine, and Artificial Neural Network. According to the results, models that use quantitative variables using technical analysis and prospectus tone variables together show higher accuracy than models that use only quantitative variables. More specifically, the prediction accuracy improved by 1.45 percentage points in the Random Forest model, 4.34 percentage points in the Artificial Neural Network model, and 5.07 percentage points in the Support Vector Machine model. Among the techniques tested, the Artificial Neural Network model using both quantitative variables and prospectus tone variables achieved the highest prediction accuracy, 61.59%. The results indicate that the tone of a prospectus is a significant factor in predicting the price movement of an IPO stock. In addition, the McNemar test was used to check whether the difference between the models was statistically significant. The model using only quantitative variables and the model using both the quantitative variables and the prospectus tone variables were compared, and it was confirmed that the predictive performance improved significantly at the 1% significance level.
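
The model comparison mentioned at the end of this abstract can be illustrated with a small McNemar test. This is a sketch under stated assumptions: the abstract does not say which variant was used, so the exact (binomial) two-sided form is assumed here, and the function name `mcnemar_exact` and the toy predictions are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs of two classifiers.

    b = cases model A gets right and model B gets wrong
    c = cases model A gets wrong and model B gets right
    """
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # no discordant pairs: no evidence of a difference
    # Under H0 each discordant pair favors either model with probability 0.5.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical toy labels: 1 = price up, 0 = price down.
y_true = [1, 1, 1, 1, 1, 1, 0, 0]
pred_quant_only = [1, 0, 0, 0, 0, 0, 0, 0]   # model with quantitative variables only
pred_with_tone = [0, 1, 1, 1, 1, 1, 0, 0]    # model with tone variables added

b, c, p_value = mcnemar_exact(y_true, pred_quant_only, pred_with_tone)
print(b, c, p_value)
```

The test looks only at the cases where the two models disagree, which is why it suits a paired comparison on the same test set.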

Research Trends in Transformational Leadership: Focusing on Domestic Journals Published in 2007-2016 (변혁적 리더십의 연구동향 분석: 최근 10년(2007-2016)간 국내 학술지 중심으로)

  • Haam, ByungWoo;Ko, GeunYeong;Jun, JuSung
    • The Journal of the Korea Contents Association / v.17 no.8 / pp.490-505 / 2017
  • The purpose of this study was to analyze the research trends in transformational leadership in domestic journals over the last 10 years and to find implications for future research. For this purpose, 337 research papers on transformational leadership published from 2007 to 2016 were reviewed. This study used descriptive statistics (frequency and percentage) and a network text analysis method. The findings of the study are as follows. First, the annual average number of papers published was 33. Second, 'human resource management research' was the most common topic. Third, most of the research subjects were business employees. Fourth, the analysis of research methods showed that quantitative research accounted for the highest proportion. Fifth, 'transactional leadership' showed the highest frequency among the keywords presented in the abstracts of the papers. Sixth, the network text analysis showed that 'transactional leadership' had the strongest connection to transformational leadership and showed the closest relationship with 'role satisfaction'.

A study on the current status of DIY clothing products related to fabric using text mining (텍스트마이닝을 활용한 패브릭 관련 DIY 의류 상품 현황 연구)

  • Eun-Hye Lee;Ha-Eun Lee;Jeong-Wook Choi
    • Journal of the Korea Fashion and Costume Design Association / v.25 no.2 / pp.111-122 / 2023
  • This study aims to collect Big Data related to DIY clothing, analyze the results on a year-by-year basis, and understand consumers' perceptions and the current status of DIY clothing. The reference period for evaluating DIY clothing trends was set from 2012 to 2022. The data were collected and analyzed using Textom, a Big Data solution program certified as Good Software by the Telecommunications Technology Association (TTA). For the analysis of fabric-related DIY products, the keyword was set to "DIY clothing", and the "Espresso K" module was used for data cleansing after collection. Data were collected on a year-by-year basis, generating a total of 11 lists, and the collected data were analyzed by period. The findings are as follows. The total number of keywords collected from the search engines "Naver" and "Google" between January 1, 2012 and December 31, 2022 was 16,315, and data trends by period indicate a continuous upward trend. In addition, keyword analysis was conducted using TF-IDF (Term Frequency-Inverse Document Frequency), a statistical measure that reflects the importance of a word within the data, and N-gram analysis, which examines the relationships between words. Using these results, it was possible to evaluate the popularity and growth of DIY clothing products in conjunction with the evolving social environment, as well as consumers' desire to explore DIY trends. This study is therefore valuable in that it provides preliminary data for DIY clothing research by analyzing the current status of DIY products and contributes to the development and production of DIY clothing.
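
The two measures named in this abstract, TF-IDF and N-grams, can be sketched in a few lines. This is an illustration only: Textom's exact weighting is not given in the abstract, so the common variant "raw term count times log(N / document frequency)" is assumed, and the function names and toy documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score each term per document as tf * log(N / df) (one common variant)."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return scores

def bigrams(doc):
    """Return adjacent word pairs (N-grams with N = 2)."""
    return list(zip(doc, doc[1:]))

# Hypothetical tokenized documents.
docs = [
    ["diy", "clothing", "fabric", "kit"],
    ["diy", "clothing", "pattern"],
    ["diy", "fabric", "fabric", "dye"],
]

scores = tf_idf(docs)
bigram_counts = Counter(bg for doc in docs for bg in bigrams(doc))
print(scores[2])
print(bigram_counts.most_common(1))
```

A term that occurs in every document (here "diy") scores zero under this variant, which is how TF-IDF separates distinctive words from merely frequent ones.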

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.119-138 / 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and monetary damage occurs more frequently. In this study, we propose a method to analyze which sentences and documents posted to SNS are related to financial fraud. First, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process consisting of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among these, we focused on risk identification in this paper. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul (loan)' and 'sachae (private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers, who decided whether they were related to cybercriminality, particularly financial fraud. Then we selected some of the vocabulary as keywords if it was related to nominals and symbols. With the selected keywords, we searched web materials such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. Preprocessing is divided into three steps: morphological analysis, stop-word removal, and selection of valid parts of speech. In the morphological analysis step, a complex sentence is transformed into morpheme units to enable mechanical analysis. In the stop-word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. 
In the step of selecting valid parts of speech, only two kinds, nouns and symbols, are considered. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the preprocessed data into learning data, each item must be classified as legitimate or not, so each item is labeled 'legal' or 'illegal'. The processed data is then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set. In this study, we set the learning data at 70% and the test data at 30%. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is evidently superior to that of Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying SVM-like discrimination algorithms to text analysis. Moreover, the study will also be of use to practitioners in the fields of brand management and opinion mining.
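
Two of the steps described above, building a Document-Term Matrix and the random 70/30 split, can be sketched with the standard library alone. This is a hedged illustration: the function names, the toy token lists, and the fixed random seed are hypothetical, and the SVM training step (gamma 0.5, cost 10) is left out because the abstract does not name the kernel or the implementation.

```python
import random

def document_term_matrix(docs):
    """Build a count-based Document-Term Matrix from tokenized documents."""
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: j for j, t in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for t in doc:
            row[index[t]] += 1
        matrix.append(row)
    return vocab, matrix

def split_70_30(rows, labels, seed=0):
    """Shuffle the rows and split them 70% learning / 30% test."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # fixed seed is an assumption for repeatability
    cut = int(round(0.7 * len(order)))
    train, test = order[:cut], order[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])

# Hypothetical preprocessed documents: only nouns and symbols are kept.
docs = [
    ["loan", "$$$", "call", "loan"], ["loan", "bank", "rate"],
    ["private", "loan", "!!!", "call"], ["bank", "account", "rate"],
    ["loan", "$$$", "$$$", "fast"], ["news", "bank", "policy"],
    ["fast", "loan", "call"], ["policy", "rate", "news"],
    ["private", "loan", "$$$"], ["account", "bank", "news"],
]
labels = ["illegal", "legal"] * 5

vocab, matrix = document_term_matrix(docs)
train_x, train_y, test_x, test_y = split_70_30(matrix, labels)
print(len(train_x), len(test_x))
```

The resulting rows would be the feature vectors passed to the classifier, with `train_x`/`train_y` used for fitting and `test_x`/`test_y` for measuring overall accuracy.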

An exploratory study on consumers' responses to mobile payment service focused on Samsung Pay (텍스트 마이닝 기법을 이용한 모바일 간편결제 서비스에 대한 소비자 반응 분석: 삼성페이를 중심으로)

  • Jung, Minji;Lee, Yu Lim;Yoo, Chae Min;Kim, Ji Won;Chung, Jae-Eun
    • Journal of Digital Convergence / v.17 no.1 / pp.9-27 / 2019
  • The purpose of this study is to examine consumers' responses to mobile payment services using text-mining techniques, focusing on Samsung Pay because it is used in both online and offline transactions. We conducted text frequency analysis, text clustering analysis, and text network analysis using R. The major findings are as follows. First, the most frequently used keywords referred to the brand names of mobile devices, the replacement of traditional wallets, and the unique functions of Samsung Pay. Second, there was a clear split between positive and negative responses at the macro level. Third, the replacement of traditional wallets played a major role in positive responses and in the continued use of mobile payment services. This study provides an in-depth understanding of consumer responses toward mobile payment services. It also offers practical implications that may help mobile payment marketers respond to consumer values and expectations, thus increasing consumer satisfaction.

Big Data Analysis of News on Purchasing Second-hand Clothing and Second-hand Luxury Goods: Identification of Social Perception and Current Situation Using Text Mining (중고의류와 중고명품 구매 관련 언론 보도 빅데이터 분석: 텍스트마이닝을 활용한 사회적 인식과 현황 파악)

  • Hwa-Sook Yoo
    • Human Ecology Research / v.61 no.4 / pp.687-707 / 2023
  • This study was conducted to obtain information useful for the development of the future second-hand fashion market by examining the current situation through unstructured text data from news articles related to 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods'. Text-based unstructured data were collected on a daily basis from Naver news from January 1 to December 31, 2022, using 'purchase of second-hand clothing' and 'purchase of second-hand luxury goods' as collection keywords. The data were analyzed using text mining, and the results are as follows. First, in terms of frequency, the data collected on the purchase of second-hand luxury goods were almost four times the data collected on the purchase of second-hand clothing, indicating that the purchase of second-hand luxury goods is receiving more social attention. Second, the data obtained with the two collection keywords shared some words, but each also contained distinct words. Regarding second-hand clothing, words related to donations, sharing, and compensation sales were mainly mentioned, indicating that the purchase of second-hand clothing tends to be recognized as an eco-friendly transaction. For second-hand luxury goods, resale and authenticity controversies related to transactions, second-hand trading platforms, and luxury brands were frequently mentioned. Third, as a result of clustering, data related to the purchase of second-hand clothing were divided into five groups, and data related to the purchase of second-hand luxury goods were divided into six groups.

Analysis of Keywords and Language Networks of Pedagogical Problems in the Secondary-School Teacher's Employment Exam: Focusing on the 2019~2022 School Year Exam

  • Kwon, Choong-Hoon
    • Journal of the Korea Society of Computer and Information / v.27 no.7 / pp.115-124 / 2022
  • The purpose of this study is to analyze and present the keywords, trends, and keyword language networks for each year of the pedagogy section of the secondary-school teacher's employment exam for the 2019~2022 school years. The main research methods were text mining and language network analysis, and the analysis programs were KrKwic, Wordcloud Maker, Ucinet6, NetDraw, etc. The research results are as follows. First, keywords such as teacher, student, curriculum, class, and evaluation appeared in the top rankings, and keywords that reflect the recent move to online classes during the COVID-19 situation (online, wiki, discussion method, information, etc.) also tended to appear. The keywords with a high frequency of occurrence in the four-year integrated text were student(44), teacher(39), class(27), school(18), curriculum(16), online(10), and discussion method(8). Second, the overall language network of the high-frequency keywords over the four years showed a significant level of density(0.566), total number of links(492), and average degree of links(16.4). Degree centrality was highest for teacher(199.0), followed by class(197.0), student(185.0), and school(150.0). Betweenness centrality was highest for teacher(30.859), followed by class(18.956), student(16.054), and school(15.745). It is expected that the results of this study will serve as reference data for prospective teachers, institutions and related persons, and teachers and administrators of secondary-school teacher training institutions.
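
The reported network statistics can be cross-checked with simple arithmetic. The sketch below rests on two assumptions that the abstract does not state: that the 492 links are counted once per direction, and that the network has 30 keyword nodes, which is the node count for which the reported density and average degree agree.

```python
def density_directed(n_nodes, n_ties):
    """Density = observed ties / possible ordered pairs of distinct nodes."""
    return n_ties / (n_nodes * (n_nodes - 1))

def average_degree(n_nodes, n_ties):
    """Average number of ties per node when ties are counted per direction."""
    return n_ties / n_nodes

n_nodes = 30   # assumption: inferred from the reported figures, not given in the abstract
n_ties = 492   # total number of links reported in the abstract

print(round(density_directed(n_nodes, n_ties), 3))   # compare with reported 0.566
print(round(average_degree(n_nodes, n_ties), 1))     # compare with reported 16.4
```

Under these assumptions both values reproduce the figures in the abstract, which suggests the network was built from the top 30 keywords.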

A Suggestion for Spatiotemporal Analysis Model of Complaints on Officially Assessed Land Price by Big Data Mining (빅데이터 마이닝에 의한 공시지가 민원의 시공간적 분석모델 제시)

  • Cho, Tae In;Choi, Byoung Gil;Na, Young Woo;Moon, Young Seob;Kim, Se Hun
    • Journal of Cadastre & Land InformatiX / v.48 no.2 / pp.79-98 / 2018
  • The purpose of this study is to suggest a model for analyzing the spatio-temporal characteristics of civil complaints about the officially assessed land price, based on big data mining. Specifically, the underlying reasons for the civil complaints were examined from spatio-temporal perspectives rather than institutional factors, and a model was suggested for monitoring trends in the occurrence of such complaints. The official documents of 6,481 civil complaints about the officially assessed land price in the Jung-gu district of Incheon Metropolitan City from 2006 to 2015, along with their temporal and spatial properties, were collected and used for the analysis. Frequencies of major keywords were examined using a text mining method. Correlations among major keywords were studied through social network analysis. By calculating term frequency (TF) and term frequency-inverse document frequency (TF-IDF), which correspond to the weighted values of keywords, we identified the major keywords associated with the occurrence of civil complaints about the officially assessed land price. Then the spatio-temporal characteristics of the civil complaints were examined through hot spot analysis based on the Getis-Ord $G_i^*$ statistic. It was found that the characteristics of civil complaints about the officially assessed land price were changing, forming clusters that are linked spatio-temporally. Using text mining and social network analysis, the reasons for the occurrence of civil complaints about the officially assessed land price could be identified quantitatively from natural language. TF and TF-IDF, the weighted values of keywords, can be used as main explanatory variables for analyzing the spatio-temporal characteristics of these civil complaints, since these statistics differ over time and across regions.
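
The hot spot step can be illustrated with the standard Getis-Ord $G_i^*$ z-score. This is a minimal sketch, not the authors' implementation: the function name, the binary neighbor weights, and the toy complaint counts are hypothetical, and the abstract does not say how the spatial weights were defined.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Getis-Ord Gi* z-score for each location.

    values  : observed value x_j at each location (e.g. complaint counts)
    weights : weights[i][j] = spatial weight between i and j, with
              weights[i][i] > 0 because Gi* includes the location itself
    Note: undefined if all values are equal or a row weights every location equally.
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    result = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)                      # sum of weights for location i
        sw2 = sum(x * x for x in w)      # sum of squared weights
        num = sum(wj * xj for wj, xj in zip(w, values)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        result.append(num / den)
    return result

# Hypothetical toy data: five areas on a line; neighbors are adjacent areas plus self.
counts = [10, 9, 1, 1, 1]
weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]

z = getis_ord_gi_star(counts, weights)
print([round(v, 3) for v in z])
```

Large positive scores mark clusters of high values (hot spots) and large negative scores mark clusters of low values (cold spots); in the toy data the first area scores highest and the last lowest.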