• Title/Summary/Keyword: TextMining

Search Result 1,563, Processing Time 0.031 seconds

Group-wise Keyword Extraction of the External Audit using Text Mining and Association Rules (텍스트마이닝과 연관규칙을 이용한 외부감사 실시내용의 그룹별 핵심어 추출)

  • Seong, Yoonseok;Lee, Donghee;Jung, Uk
    • Journal of Korean Society for Quality Management / v.50 no.1 / pp.77-89 / 2022
  • Purpose: Improving a company's audit quality requires in-depth analysis that categorizes audit reports, which are text documents detailing the external audit. This study introduces a systematic methodology for extracting, from audit reports collected as text documents, the keywords that distinguish groups such as 'audit plan' and 'interim audit'. Methods: The first step of the proposed methodology preprocesses the documents through text mining. In the second step, the documents are classified into groups using machine learning techniques, and the vocabulary that most strongly influences classification performance is extracted. In the third step, association rules are found for each group's documents. In the last step, the final keywords characterizing each group are extracted by comparing the vocabulary important for classification with the vocabulary that appears in each group's association rules. Results: This study quantitatively calculates the importance of the vocabulary used in audit reports using machine learning, rather than relying on qualitative research methods such as literature search, expert evaluation, and the Delphi technique. The case study found that the extracted keywords describe the characteristics of each group well. Conclusion: This study is meaningful in that it lays a foundation for quantitative follow-up studies of key vocabulary at each stage of auditing.
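
The third and fourth steps of this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy documents, the thresholds, the restriction to word-pair rules, and the hypothetical `important_for_classification` set standing in for the classifier output of step two. It is not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) word-pair rules."""
    n = len(docs)
    sets = [set(d) for d in docs]
    # Number of documents containing each word and each unordered word pair.
    item_count = Counter(w for s in sets for w in s)
    pair_count = Counter(p for s in sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = count / item_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)


# Toy tokenized documents for one group (hypothetical data).
docs = [
    ["audit", "plan", "risk"],
    ["audit", "plan", "schedule"],
    ["audit", "interim", "test"],
    ["plan", "risk"],
]
rules = pair_rules(docs)

# Step four, sketched: keep words that are both important for classification
# (hypothetical set) and present in the group's association rules.
important_for_classification = {"plan", "schedule"}
rule_vocabulary = {w for rule in rules for w in rule[:2]}
final_keywords = important_for_classification & rule_vocabulary
```

With these toy inputs, `final_keywords` contains only `plan`, because `schedule` never appears in a rule that passes the support threshold.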

Arabic Text Clustering Methods and Suggested Solutions for Theme-Based Quran Clustering: Analysis of Literature

  • Bsoul, Qusay;Abdul Salam, Rosalina;Atwan, Jaffar;Jawarneh, Malik
    • Journal of Information Science Theory and Practice / v.9 no.4 / pp.15-34 / 2021
  • Text clustering is one of the most commonly used methods for detecting themes or types of documents. Text clustering is used in many fields, but its effectiveness is still not sufficient to be used for the understanding of Arabic text, especially with respect to terms extraction, unsupervised feature selection, and clustering algorithms. In most cases, terms extraction focuses on nouns. Clustering simplifies the understanding of an Arabic text like the text of the Quran; it is important not only for Muslims but for all people who want to know more about Islam. This paper discusses the complexity and limitations of Arabic text clustering in the Quran based on their themes. Unsupervised feature selection does not consider the relationships between the selected features. One weakness of clustering algorithms is that the selection of the optimal initial centroid still depends on chances and manual settings. Consequently, this paper reviews literature about the three major stages of Arabic clustering: terms extraction, unsupervised feature selection, and clustering. Six experiments were conducted to demonstrate previously un-discussed problems related to the metrics used for feature selection and clustering. Suggestions to improve clustering of the Quran based on themes are presented and discussed.

Methodology for Issue-related R&D Keywords Packaging Using Text Mining (텍스트 마이닝 기반의 이슈 관련 R&D 키워드 패키징 방법론)

  • Hyun, Yoonjin;Shun, William Wong Xiu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.2 / pp.57-66 / 2015
  • Considerable research efforts are being directed towards analyzing unstructured data such as text files and log files using commercial and noncommercial analytical tools. In particular, researchers are trying to extract meaningful knowledge through text mining not only in business but also in many other areas such as politics, economics, and cultural studies. For instance, several studies have examined national pending issues by analyzing large volumes of text on various social issues. However, it is difficult to provide information services that successfully identify R&D documents on specific national pending issues. While users may specify certain keywords relating to national pending issues, they usually fail to retrieve appropriate R&D information, primarily because of discrepancies between these terms and the corresponding terms actually used in the R&D documents. Thus, an intermediate logic is needed to overcome these discrepancies and to identify and package appropriate R&D information on specific national pending issues. To address this requirement, three methodologies are proposed in this study: a hybrid methodology for extracting and integrating keywords pertaining to national pending issues, a methodology for packaging R&D information that corresponds to national pending issues, and a methodology for constructing an associative issue network based on relevant R&D information. Data analysis techniques such as text mining, social network analysis, and association rule mining are used to establish these methodologies. In the experiment, the keyword enhancement rate achieved by the proposed integration methodology was about 42.8%. For the second objective, three key analyses were conducted and a number of association rules between national pending issue keywords and R&D keywords were derived. The experiment for the third objective, issue clustering based on R&D keywords, is still in progress and is expected to give tangible results in the future.

Analysis of Research Trends Related to Start-Up Using Text Mining (텍스트마이닝을 이용한 창업 관련 연구 동향 분석)

  • Han, Sung-Soo;Yang, Dong-Woo
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.12 no.5 / pp.1-12 / 2017
  • The purpose of this study is to investigate trends in start-up research in Korea. To this end, a meta-analysis was carried out using text mining on the entrepreneurship-related master's and doctoral theses registered in RISS, divided into a first period of entrepreneurship research (up to 2009) and a second period (from 2010). The three analyses show that entrepreneurship education and government policy and support were continuous research topics over the whole period, and that small business start-ups were studied continuously and more heavily in the second period. In addition, empirical analysis was strengthened in the latter period of entrepreneurship research. The TF-IDF analysis reveals that much research on veterans has been carried out in the field of entrepreneurship, and that in the latter period many studies related to the elderly were conducted, alongside topics such as cultural content and the aging society. Brand-related research was carried out throughout the entire period, and venture-related research, the characteristics of entrepreneurs, entrepreneurship motivation, and start-up strategy were studied extensively; female entrepreneurship was also studied. In the latter period, research emphasized entrepreneurial achievements, and start-up research diversified into areas such as industry-academia cooperation, start-up investment, and social enterprise. This study is meaningful in that it applies recently prominent methods such as text mining and topic analysis to a meta-analysis of start-up research. Future research will need to address a wider variety of more detailed topics related to entrepreneurship.
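
A TF-IDF comparison between two periods, as mentioned in this abstract, can be sketched as follows. The abstract does not state which TF-IDF variant was used, so the weighting here (relative term frequency times the natural log of inverse document frequency) is an assumption, and the two token lists are hypothetical stand-ins for the two periods.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (tf = count/len, idf = ln(N/df))."""
    n = len(docs)
    # Document frequency: number of documents in which each term occurs.
    df = Counter(w for d in docs for w in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        scores.append(
            {w: (c / len(d)) * math.log(n / df[w]) for w, c in tf.items()}
        )
    return scores


# Hypothetical tokens standing in for the first and second research periods.
first_period = ["education", "policy", "venture", "venture"]
second_period = ["education", "policy", "senior", "investment"]
scores = tf_idf([first_period, second_period])
top_first = max(scores[0], key=scores[0].get)
```

Terms shared by both periods, such as `education`, score zero under this weighting, so the highest-scoring terms are the ones that distinguish a period.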

Text Mining-Based Analysis for Research Trends in Vocational Studies (텍스트 마이닝을 활용한 직업학 연구동향 분석)

  • Yook, Dong-In
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.3 / pp.586-599 / 2017
  • This study attempts to understand the overall research trends in Vocational Studies using a text mining method, which is a means to analyze big data. The findings of the research show that Vocational Studies in Korea has been directly influenced by global economic crises, as evidenced by its exponential growth after the 1997 foreign exchange crisis that resulted in a bailout from the IMF. In addition, the topics of research have been shifting from such macro subjects as government policies and systems to such micro topics as individual career development. Moreover, the perspective of research is being moved from the socially vulnerable, including women and the disabled, to the economically marginalized, including retirees and the unemployed. As for the research targets, college students overwhelmingly outnumbered primary and secondary school students. However, few cases analyzed the clinical outcomes of career counseling or attempted to process job information and study the history of jobs. This research is limited in that it only analyzed journal abstracts. Nonetheless, it is meaningful because it used topic analysis, one of the text mining methods, to give a complete enumeration of all articles available for search, thereby crafting a framework of quantitative analysis methodology for Vocational Studies. It is also significant in that it is the first attempt to analyze themes in every stage of the development of Vocational Studies.

A Study on Environmental research Trends by Information and Communications Technologies using Text-mining Technology (텍스트 마이닝 기법을 이용한 환경 분야의 ICT 활용 연구 동향 분석)

  • Park, Boyoung;Oh, Kwan-Young;Lee, Jung-Ho;Yoon, Jung-Ho;Lee, Seung Kuk;Lee, Moung-Jin
    • Korean Journal of Remote Sensing / v.33 no.2 / pp.189-199 / 2017
  • This study quantitatively analyzed research trends in the use of ICT in the environmental field using text mining. To that end, the study collected 359 papers published over the past two decades (1996-2015) from the National Digital Science Library (NDSL) using 38 environment-related keywords and 16 ICT-related keywords. It processed the natural language of the environment and ICT fields in the papers and reorganized the classification system into corpus units. Based on the keywords of this classification system, it applied the text mining techniques of frequency analysis, keyword analysis, and association rule analysis of keywords. As a result, the keywords 'general environment' and 'climate' accounted for 77% of the total, and the keywords 'public convergence service' and 'industrial convergence service' in the ICT field accounted for approximately 30% of the total. According to the time series analysis, research using ICT in the environmental field increased rapidly over the most recent 5 years (2011-2015), and the number of such studies more than doubled compared to the earlier period (1996-2010). From the association rules generated among the keywords, the study identified that the keyword 'general environment' was associated with 16 ICT-based technologies and 'climate' with 14 ICT-based technologies.

Analyzing Disaster Response Terminologies by Text Mining and Social Network Analysis (텍스트 마이닝과 소셜 네트워크 분석을 이용한 재난대응 용어분석)

  • Kang, Seong Kyung;Yu, Hwan;Lee, Young Jai
    • Information Systems Review / v.18 no.1 / pp.141-155 / 2016
  • This study identified terminologies related to the proximity and frequency of disaster using social network analysis (SNA) and text mining, and then expressed the outcome as a mind map. The term-document matrix from text mining was used for the terminology proximity analysis, and SNA closeness centrality was calculated to express the relationships among the terminologies organically through a mind map. By analyzing terminology proximity and selecting disaster response-related terminologies, this study identified, among all the disaster response fields, the field closest to disaster response and the core terms in each disaster response field. This disaster response terminology analysis could be used in future core term-based terminology standardization, disaster-related knowledge accumulation and research, and the composition of various response scenarios, among others.
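
The combination of a term-document matrix with closeness centrality can be sketched as below. The abstract does not say how proximity was derived from the matrix, so linking two terms whenever they share a document is an assumption, as are the toy documents and all function names.

```python
from collections import deque


def term_document_matrix(docs):
    """Return (sorted terms, {term: [count in each document]})."""
    terms = sorted({w for d in docs for w in d})
    return terms, {t: [d.count(t) for d in docs] for t in terms}


def cooccurrence_graph(terms, matrix):
    """Link two terms when they occur together in at least one document."""
    graph = {t: set() for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            if any(x and y for x, y in zip(matrix[a], matrix[b])):
                graph[a].add(b)
                graph[b].add(a)
    return graph


def closeness(graph, source):
    """Closeness centrality: reachable nodes divided by summed BFS distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical tokenized disaster-response documents.
docs = [
    ["flood", "evacuation", "shelter"],
    ["evacuation", "rescue"],
    ["rescue", "medical"],
]
terms, matrix = term_document_matrix(docs)
graph = cooccurrence_graph(terms, matrix)
centrality = {t: closeness(graph, t) for t in terms}
core_term = max(centrality, key=centrality.get)
```

In this toy graph `evacuation` has the highest closeness (0.8), which is the sense in which a term would sit at the center of a mind map.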

A Trend Analysis of Agricultural and Food Marketing Studies Using Text-mining Technique (텍스트마이닝 기법을 이용한 국내 농식품유통 연구동향 분석)

  • Yoo, Li-Na;Hwang, Su-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.10 / pp.215-226 / 2017
  • This study analyzed trends in agricultural and food marketing studies from 1984 to 2015 using text-mining techniques. Text mining is a part of big-data analysis and an effective tool for objectively processing large amounts of information through categorization and trend analysis. In the present study, frequency analysis, topic analysis, and association rule analysis were conducted. Titles of agricultural and food marketing studies in four journals and reports were used for the analysis. The results showed that a total of 1,126 studies related to agricultural and food marketing could be categorized into six subjects. There were significant changes in research trends before and after the 2000s. While research before the 2000s focused on farm and wholesale level marketing, research after the 2000s mainly covered consumption, (processed) food, exports, and imports. Local food and school meals are new subjects that are increasingly being studied. Issues regarding agricultural supply and demand were the only subjects investigated in policy research studies. Interest in agricultural supply and demand was lost after the 2000s. A number of studies after the 2010s analyzed consumption, primarily consumption trends and consumer behavior.

Text-mining Techniques for Metabolic Pathway Reconstruction (대사경로 재구축을 위한 텍스트 마이닝 기법)

  • Kwon, Hyuk-Ryul;Na, Jong-Hwa;Yoo, Jae-Soo;Cho, Wan-Sup
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.138-147 / 2007
  • A metabolic pathway is a series of chemical reactions occurring within a cell and can be used for drug development and for understanding life phenomena. Many biologists are trying to extract metabolic pathway information from the vast literature for their studies of metabolic-circuit regulation. We propose a text-mining technique based on keywords and patterns. The proposed technique uses a web robot to collect a large number of papers and stores them in a local database. We use gene ontology to increase the compound recognition rate and the NCBI Tokenizer library to recognize useful information without breaking up compound names. Furthermore, we obtain useful sentence patterns representing metabolic pathways from papers and the KEGG database. We have extracted 66 patterns from 20,000 documents for Glycosphingolipid species from KEGG, a representative metabolic database. We verify our system on nineteen compounds in Glycosphingolipid species. The results show a recall of 95.1%, a precision of 96.3%, and a processing time of 15 seconds. The proposed text mining system is expected to be used for metabolic pathway reconstruction.
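
Pattern-based extraction and its evaluation by precision and recall can be sketched as follows. The single regular expression, the example sentences, and the gold set are hypothetical; they do not reproduce the 66 patterns, the NCBI Tokenizer, or the results reported in the abstract.

```python
import re

# One hypothetical sentence pattern: "<substrate> is converted to/into <product>".
PATTERN = re.compile(r"(\w[\w-]*) is converted (?:in)?to (\w[\w-]*)")


def extract_reactions(sentences):
    """Return the set of (substrate, product) pairs matched by PATTERN."""
    found = set()
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            found.add((match.group(1), match.group(2)))
    return found


def precision_recall(predicted, gold):
    """Precision = TP / predicted, recall = TP / gold (0.0 for empty sets)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy sentences; the last one is a deliberate false positive.
sentences = [
    "Ceramide is converted to glucosylceramide by the enzyme.",
    "Lactosylceramide is converted into GM3.",
    "The sample is converted to powder.",
]
# Toy gold standard; the GM3 -> GM2 step is not mentioned in any sentence.
gold = {
    ("Ceramide", "glucosylceramide"),
    ("Lactosylceramide", "GM3"),
    ("GM3", "GM2"),
}
predicted = extract_reactions(sentences)
precision, recall = precision_recall(predicted, gold)
```

The false positive lowers precision and the unmentioned reaction lowers recall, which is why the abstract reports the two measures separately.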

Predicting Success of Government Policy in the Future with Futures Wheel and Text Mining : Predicting the Future Policy of Wage Peak System (텍스트 마이닝과 퓨쳐스 휠 기법을 활용한 정부정책의 미래 성공 예측 : 임금피크제의 미래 정책예측)

  • Kim, Hyong-Jung;Kim, Jin-Hwa
    • Journal of Digital Convergence / v.14 no.12 / pp.141-153 / 2016
  • The purpose of this study is to predict the future of the wage-peak system using text mining, the futures wheel, and polarity voting (+, -) techniques after reviewing a variety of documents. For this study, we collected articles, news articles, SNS posts (Twitter, blogs), and research reports. First, we extracted keywords for the main subject words using text mining techniques. Next, we drew a final conclusion about the future of the wage-peak system using the futures wheel and polarity voting techniques. The results showed that the future of the wage-peak system is positive. Two of the five main topics were predicted negatively (favor/oppose of the wage-peak system, solving task of the wage-peak system), while three were predicted positively (background of the wage-peak system, purpose/reason of the wage-peak system, alternative wage-peak system). Because three of the five main topics were predicted positively, the future of the wage-peak system is judged to be positive.
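
The final aggregation described in this abstract can be sketched as a simple majority vote over topic polarities. The topic names and signs are taken from the abstract; the function name, the majority rule, and the tie handling are assumptions, since the abstract does not detail the voting procedure.

```python
from collections import Counter


def majority_polarity(votes):
    """Return 'positive', 'negative', or 'neutral' from {topic: '+' or '-'}."""
    counts = Counter(votes.values())
    if counts["+"] > counts["-"]:
        return "positive"
    if counts["-"] > counts["+"]:
        return "negative"
    return "neutral"  # assumed handling of ties


# Polarities of the five main topics as reported in the abstract.
topic_votes = {
    "favor/oppose of wage-peak system": "-",
    "solving task of wage-peak system": "-",
    "background of wage-peak system": "+",
    "purpose/reason of wage-peak system": "+",
    "alternative wage-peak system": "+",
}
overall = majority_polarity(topic_votes)
```

With three positive and two negative topics, `overall` evaluates to `positive`, matching the conclusion stated in the abstract.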