• Title/Summary/Keyword: Text analysis

Establishment of ITS Policy Issues Investigation Method in the Road Section applied Textmining (텍스트마이닝을 활용한 도로분야 ITS 정책이슈 탐색기법 정립)

  • Oh, Chang-Seok;Lee, Yong-taeck;Ko, Minsu
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.15 no.6 / pp.10-23 / 2016
  • Because the use of big data calls for circumspection, this study develops and applies a search method for audit issues relating to ITS policies and programs. To this end, the auditing process of the Board of Audit and Inspection was combined with the theoretical framework of boundary analysis proposed by William Dunn as an analysis tool for audit issues. To computerize the analysis tool, we apply text mining, which resembles boundary analysis in how it approaches meta-problems. Specifically, we applied an antisymmetry-symmetry compound lexeme-based LDA model, built on the Latent Dirichlet Allocation (LDA) methodology proposed by David Blei. A case analysis identified several prime issues: insufficient collection of traffic information by the Urban Traffic Information System operated by the National Police Agency; overlap between the Ministry of Land, Infrastructure and Transport and the Advanced Traffic Management System; and fabrication of mileage on digital tachographs.

An Analysis of IT Proposal Evaluation Results using Big Data-based Opinion Mining (빅데이터 분석 기반의 오피니언 마이닝을 이용한 정보화 사업 평가 분석)

  • Kim, Hong Sam;Kim, Chong Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.1 / pp.1-10 / 2018
  • Current evaluation practices for IT projects suffer from several problems, including evaluation results that are difficult to explain and an improperly scaled scoring system. This study aims to develop an opinion mining methodology that extracts key factors for causal relationship analysis, and to assess the feasibility of quantifying evaluation scores from text comments using opinion mining based on big data analysis. The research was performed in the domain of publicly procured IT proposal evaluations, which are managed by the National Procurement Service. Around 10,000 sets of comments and evaluation scores were gathered, most in digital form but some as paper documents, so a more refined form of the text was prepared using various tools. From this text, keywords for factors and polarity indicators were extracted, and domain experts selected some of them as the key factors and indicators. The keywords were also grouped into dimensions. Causal relationships between keyword or dimension factors and evaluation scores were analyzed with two research models, a keyword-based model and a dimension-based model, using correlation analysis and regression analysis. The results show that keyword factors such as planning, strategy, technology, and PM affect the evaluation result the most, and that keywords are more appropriate than dimensions as factors for causal relationship analysis. The analysis also indicates that evaluation scores can be composed or calculated from unstructured text comments using opinion mining, provided that a comprehensive polarity dictionary for the Korean language is available. This study may contribute to the area of big data-based evaluation methodology and opinion mining for IT proposal evaluation, leading to a more reliable and effective IT proposal evaluation method.

An Investigation of a Sensibility Evaluation Method Using Big Data in the Field of Design -Focusing on Hanbok Related Design Factors, Sensibility Responses, and Evaluation Terms- (디자인 분야에서 빅데이터를 활용한 감성평가방법 모색 -한복 연관 디자인 요소, 감성적 반응, 평가어휘를 중심으로-)

  • An, Hyosun;Lee, Inseong
    • Journal of the Korean Society of Clothing and Textiles / v.40 no.6 / pp.1034-1044 / 2016
  • This study seeks a method to objectively evaluate sensibility based on big data in the field of design. To do so, it examined the public's sensibility responses to design factors through a network analysis of texts posted on social media. 'Hanbok', the formal clothing that represents Korea, was selected as the subject for the research methodology. We collected 47,677 keywords related to Hanbok from 12,000 Naver blog posts dated January 1 to December 31, 2015, and analyzed them using social matrix (a big data analysis software package) rather than previous survey methods. We also derived 56 keywords related to the design elements of, and sensibility responses to, Hanbok. Centrality analysis and CONCOR analysis were conducted using Ucinet6. Visualizing the network text analysis allowed the main design factors of Hanbok to be categorized together with evaluation terms that express positive, negative, and neutral sensibility responses. We also derived the key evaluation factors for Hanbok: fitting, rationality, trend, and uniqueness. The evaluation terms, extracted using natural language processing of atypical data, are valid as an evaluation scale and are expected to be suitable for use in a sensibility evaluation index that supplements the limits of previous surveys and statistical analysis methods. The network text analysis method used in this study provides new guidelines for the use of big data in sensibility evaluation methods in the field of design.

The Study on the patient safety culture convergence research topics through text mining and CONCOR analysis (텍스트마이닝 및 CONCOR 분석을 활용한 환자안전문화 융복합 연구주제 분석)

  • Baek, Su Mi;Moon, Inn Oh
    • Journal of Digital Convergence / v.19 no.12 / pp.359-367 / 2021
  • The purpose of this study is to analyze domestic patient safety culture research topics using text mining and CONCOR analysis. The research proceeded through the stages of data collection, data preprocessing, text mining and social network analysis, and CONCOR analysis. A total of 136 articles were analyzed, excluding papers that were not published. Data analysis was performed using the Textom and UCINET programs. Among the patient safety culture studies, 'patient safety' had the highest TF (frequency), while 'nursing' had the highest TF-IDF (importance in documents). The CONCOR analysis derived a total of seven clusters that constitute the patient safety culture: knowledge and attitude, communication, medical service, team, work environment, structure, and organization and management. Future research should examine the relationship between the establishment of a patient safety culture and patient outcomes.
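
The abstract above reports both TF and TF-IDF rankings. The following is a minimal sketch of how the two measures can rank terms differently. It assumes the common `tf * log(N / df)` weighting summed over documents; the exact formula used by Textom is not stated in the abstract, and the three tokenized documents are hypothetical.

```python
import math
from collections import Counter


def term_frequency(docs):
    """Total count of each term across all tokenized documents."""
    counts = Counter()
    for tokens in docs:
        counts.update(tokens)
    return counts


def tf_idf(docs):
    """Corpus-level TF-IDF: sum over documents of tf * log(N / df).

    Assumed weighting scheme, chosen for illustration only.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    scores = Counter()
    for tokens in docs:
        for term, tf in Counter(tokens).items():
            scores[term] += tf * math.log(n_docs / doc_freq[term])
    return scores


# Hypothetical tokenized abstracts, not data from the study.
docs = [
    ["patient", "safety", "culture", "nursing"],
    ["patient", "safety", "nursing", "nursing"],
    ["patient", "safety", "communication"],
]
tf = term_frequency(docs)
weights = tf_idf(docs)
print(tf.most_common(3))
print(max(weights, key=weights.get))
```

In this toy corpus "patient" appears in every document, so its IDF is zero and it drops out of the TF-IDF ranking even though its raw frequency is among the highest. That is the general reason a frequent term and a high-TF-IDF term can differ.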

Analysis of CSR·CSV·ESG Research Trends - Based on Big Data Analysis - (CSR·CSV·ESG 연구 동향 분석 - 빅데이터 분석을 중심으로 -)

  • Lee, Eun Ji;Moon, Jaeyoung
    • Journal of Korean Society for Quality Management / v.50 no.4 / pp.751-776 / 2022
  • Purpose: The purpose of this paper is to present implications by analyzing research trends on CSR, CSV, and ESG through text analysis and visual analysis (comprehensive, by field, and by year), both of which are big data analyses, using data collected from previous studies on CSR, CSV, and ESG. Methods: To collect the analysis data, deep learning was used in the integrated search on the Academic Research Information Service (www.riss.kr) with "CSR", "CSV", and "ESG" as search terms, and the Korean abstracts and keywords were scraped from the retrieved papers and organized in Excel. In the final step, the resulting 2,847 CSR papers, 395 CSV papers, and 555 ESG papers were analyzed in R x64 4.0.2 and RStudio using text mining, one of the big data analysis techniques, with word clouds for visualization. Results: Research on CSR, CSV, and ESG was somewhat slow before 2010 but increased rapidly up to 2019. Research was concentrated in the fields of social science, art and physical education, and engineering. The keywords 'corporate', 'social', and 'responsibility' were frequent, and the word cloud analysis gave similar results. In the frequent keyword and word cloud analyses by field and by year, the keywords for each year were similar to the overall keywords, whereas some differences appeared between fields. Conclusion: Government and expert support for CSR, CSV, and ESG should be activated, and research on technology-based strategies is needed. Future work should take a variety of approaches to these topics. Research that considers the environment or energy is expected to yield greater implications.

A Study on the Network Text Analysis about Oral Health in Aging-Well

  • Seol-Hee Kim
    • Journal of dental hygiene science / v.23 no.4 / pp.302-311 / 2023
  • Background: Oral health is an important element of aging well, and it also affects overall health, mental health, and quality of life. In this study, we sought to identify factors influencing oral health and research trends for well-aging through text analysis of research on well-aging and oral health over the past 12 years. Methods: The research data consisted of English-language literature published in PubMed from 2012 to 2023. "Aging well" and "oral health" were used as search terms, and 115 papers were finally selected. Network text analysis included keyword frequency analysis, centrality analysis, and cohesion structure analysis using the Net-Miner 4.0 program. Results: Excluding general characteristics, the most frequent of the 520 keywords (MeSH terms) in the 115 articles were psychology, dental prosthesis, Alzheimer's disease, dental caries, cognition, cognitive dysfunction, and bacteria. The keywords with high degree centrality were dental caries (0.864), quality of life (0.833), tooth loss (0.818), health status (0.727), and life expectancy (0.712). Community analysis yielded four groups: Group 1 consisted of chewing and nutrition; Group 2 of oral diseases, systemic diseases, and management; Group 3 of oral health and mental health; and Group 4 of oral frailty symptoms and quality of life. Conclusion: In an aging society, oral dysfunction affects mental health and quality of life. Preventing oral diseases for well-aging can have a positive impact on mental health and quality of life. Therefore, efforts are needed to prevent oral frailty in a super-aging society by developing systematic oral care programs for each stage of the life cycle and providing education on them.
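
The degree centrality values quoted in the abstract above lie between 0 and 1, which is consistent with the usual normalized definition (a keyword's number of neighbors divided by n - 1). The sketch below illustrates that definition on a keyword co-occurrence network. It assumes that two keywords are linked when they appear in the same article; how Net-Miner builds and normalizes the network is not stated in the abstract, and the three article keyword lists are hypothetical.

```python
from itertools import combinations


def degree_centrality(keyword_lists):
    """Normalized degree centrality on a keyword co-occurrence network.

    Two keywords are neighbors if they share at least one article
    (an assumed linking rule, for illustration only).
    """
    neighbors = {}
    for keywords in keyword_lists:
        for kw in keywords:
            neighbors.setdefault(kw, set())
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {kw: 0.0 for kw in neighbors}
    return {kw: len(adj) / (n - 1) for kw, adj in neighbors.items()}


# Hypothetical keyword lists; the terms echo the abstract, the links do not.
articles = [
    ["dental caries", "quality of life", "tooth loss"],
    ["dental caries", "cognition"],
    ["quality of life", "health status"],
]
centrality = degree_centrality(articles)
for keyword, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{keyword}: {score:.2f}")
```

A score of 0.75 here means the keyword co-occurs with three of the four other keywords in the network.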

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to acquire insights through data analysis have been continuously increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis is generally performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the requesters of analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and particular attention is focused on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Of the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire set must be analyzed at once to identify the topic of each document. This makes the analysis take a long time when topic modeling is applied to a large number of documents. In addition, it has a scalability problem: processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeating topic modeling on each unit. This method can be used for topic modeling on a large number of documents with limited system resources, and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without first being combined. However, despite its many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified in each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other work on topic modeling. In this paper, we propose a topic modeling approach that solves the above two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by checking whether documents are assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. In addition, through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of both methods.
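
The abstract above describes mapping topics from the reduced global set (RGS) to local topics but does not say how topics are compared. As an illustration only, the sketch below assigns each local topic to the RGS topic whose word distribution is most similar by cosine similarity. The similarity measure, the function names, and the toy topic-word weights are all assumptions, not the authors' method.

```python
import math


def cosine(p, q):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * q.get(term, 0.0) for term, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


def map_local_to_global(local_topics, global_topics):
    """Assign each local topic the most similar global (RGS) topic."""
    mapping = {}
    for local_id, local_dist in local_topics.items():
        mapping[local_id] = max(
            global_topics,
            key=lambda gid: cosine(local_dist, global_topics[gid]),
        )
    return mapping


# Hypothetical topic-word weights, e.g. as produced by separate LDA runs.
rgs_topics = {
    "G0": {"election": 0.6, "vote": 0.4},
    "G1": {"game": 0.7, "team": 0.3},
}
local_topics = {
    "L0": {"team": 0.5, "game": 0.4, "coach": 0.1},
    "L1": {"vote": 0.7, "election": 0.2, "party": 0.1},
}
mapping = map_local_to_global(local_topics, rgs_topics)
print(mapping)
```

With a mapping like this, a document's local topic can be translated into a global topic label and compared against the label obtained from modeling the full set, which is the kind of agreement check the abstract describes.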

A Study on the Improvement of Retrieval Efficiency Based on the CRFMD (공통기술표현포맷에 기반한 다매체자료의 검색효율 향상에 관한 연구)

  • Park, Il-Jong;Jeong, Ki-Tai
    • Journal of the Korean Society for Information Management / v.23 no.3 s.61 / pp.5-21 / 2006
  • In recent years, theories of image and sound analysis have been proposed to work with text retrieval systems and have progressed quickly with the rapid progress in data processing speeds. This study proposes a common representation format for multimedia documents (CRFMD) composed of both images and text to form a single data structure. It also shows that image classification of a given test set is dramatically improved when text features are encoded together with image features. CRFMD might be applicable to other areas of multimedia document retrieval and processing, such as medical image retrieval, World Wide Web searching, and museum collection retrieval.

A comparative study of Entity-Grid and LSA models on Korean sentence ordering (한국어 텍스트 문장정렬을 위한 개체격자 접근법과 LSA 기반 접근법의 활용연구)

  • Kim, Youngsam;Kim, Hong-Gee;Shin, Hyopil
    • Korean Journal of Cognitive Science / v.24 no.4 / pp.301-321 / 2013
  • For the task of sentence ordering, this paper uses the Entity-Grid model, a type of entity-based modeling approach, as well as Latent Semantic Analysis (LSA), which is based on vector space modeling. Sentence ordering is well known as one of the fundamental tasks used to measure text coherence and to enhance text generation processes. To implement the Entity-Grid model, we use the syntactic roles of the nouns in Korean text for the ordering task and measure their impact on the result, since their contribution has been discussed in previous research. Contrary to the case of German, the result is positive. To obtain information on the syntactic roles, we use the Korean case markers attached to the nouns. The results show that these cues can help measure text coherence. In addition, we compare the results with those of the LSA-based model and discuss the advantages and disadvantages of the models and options for future studies.

Current Research Trends and Present Conditions on Visual Transformation of Digital Text (디지털텍스트의 시각적 변형에 관한 연구 동향 및 실태 분석)

  • Jin, Sung-Hee
    • The Journal of the Korea Contents Association / v.10 no.1 / pp.486-497 / 2010
  • The purpose of this study is to investigate research trends on the "visual transformation" of digital text and the present state of real digital texts in this respect. To this end, the study adopted two methods: meta-analysis and case study. Research trends on the visual transformation of digital text were investigated by analyzing a total of 167 publications through synthetic meta-analysis. The relevant literature was categorized into three types of research: functional, dynamic, and interactional transformation. The type of literature and the research methods used in each publication were analyzed. The present state of real digital texts with respect to visual transformation was investigated by case study. Twelve well-designed e-learning contents were selected and analyzed using an analysis framework derived from the research trends. The results pointed to the following problems in designing e-learning contents. First, some cases did not follow basic design principles related to typography. Second, in many cases the content was simply presented at each learning step without design consideration for enhancing text comprehension. Third, web technology was not adequately applied to the design of e-learning contents.