• Title/Summary/Keyword: text occurrence frequency (텍스트 출현 빈도)


Analysis on the Characteristics of Construction Practice Information Using Text Mining: Focusing on Information Such as Construction Technology, Cases, and Cost Reduction (텍스트마이닝을 활용한 건설실무정보의 특성 분석 - 건설기술, 사례, 원가절감 등 정보를 중심으로 -)

  • Seong-Yun, Jeong;Jin-Uk, Kim
    • Journal of the Korean Society for Library and Information Science / v.56 no.4 / pp.205-222 / 2022
  • This study aims to improve the information service so that construction engineers and construction project participants without specialized knowledge can easily understand the important words in construction practice and the interrelationships between them. To this end, word occurrence frequency, topic modeling, and network centrality were analyzed with text mining for the construction practice information most used in the Construction Technology Digital Library: technical information, case information, and cost reduction information. This analysis identified design, construction, project management, specifications, standards, and maintenance related to road construction (roads, pavements, bridges, and tunnels) as important in construction practice. In addition, correlations among highly important words were analyzed by measuring Degree Centrality and Eigenvector Centrality. The results indicated that more useful information could be provided if the technical information was expanded. Finally, we presented the limitations of the study results and proposed additional studies to address them.
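
The two centrality measures named in this abstract can be sketched on a small word co-occurrence network. This is a minimal illustration, not the authors' implementation: the edge list is hypothetical, and eigenvector centrality is approximated by power iteration on A + I (an assumption made here so that the iteration also settles on bipartite graphs; A + I has the same eigenvectors as A).

```python
from collections import defaultdict
from math import sqrt


def build_neighbours(edges):
    """Undirected adjacency sets from a list of (word, word) pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def degree_centrality(edges):
    """Number of neighbours divided by (n - 1), for each node."""
    neighbours = build_neighbours(edges)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


def eigenvector_centrality(edges, iterations=200):
    """Power iteration on A + I, normalised to unit length each round."""
    neighbours = build_neighbours(edges)
    scores = {node: 1.0 for node in neighbours}
    for _ in range(iterations):
        new = {
            node: scores[node] + sum(scores[m] for m in adj)
            for node, adj in neighbours.items()
        }
        norm = sqrt(sum(v * v for v in new.values()))
        scores = {node: v / norm for node, v in new.items()}
    return scores


# Hypothetical co-occurrence edges, for illustration only.
edges = [("road", "pavement"), ("road", "bridge"),
         ("road", "tunnel"), ("bridge", "tunnel")]
degree = degree_centrality(edges)
eigen = eigenvector_centrality(edges)
```

In this toy network "road" touches every other word, so it ranks first on both measures, while "pavement", which connects only to "road", ranks last.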

Analysis of Work-Related Musculoskeletal Disorders Research Trends Using Keyword Frequency Analysis and CONCOR Technique

  • Geon-Hui Lee;Seo-Yeon Choi
    • Journal of the Korea Society of Computer and Information / v.28 no.8 / pp.137-144 / 2023
  • Big data analysis techniques are among the methods being suggested for addressing social issues. In this study, we used keyword network analysis and CONCOR analysis to examine research trends on work-related musculoskeletal disorders. The findings are as follows. First, the number of papers on work-related musculoskeletal disorders has increased consistently, with an average of over 33 articles published per year since the investigation of musculoskeletal risk factors in 2003. The publication rate increased from 2007 to 2009. Second, the frequencies of the top keywords identified through text mining were as follows: work (4,940), musculoskeletal disorders (2,197), symptoms (1,836), related (1,769), musculoskeletal system (1,421). Third, the CONCOR analysis produced four clusters: 'Musculoskeletal disorder treatment', 'Occupational health and safety management', 'Work environment assessment', and 'Workplace environment measurement'. This study is expected to contribute to the development of research on musculoskeletal disorders and to provide various directions for future studies.
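
CONCOR (convergence of iterated correlations) repeatedly correlates the rows of a matrix until every entry approaches +1 or -1, then splits the items by sign. The sketch below performs a single split on a small keyword co-occurrence matrix. The keywords, counts, and the choice to put each keyword's own frequency on the diagonal are illustrative assumptions, not data or settings from the study; obtaining four clusters would mean applying the split again to each block.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def concor_split(matrix, labels, max_iter=100):
    """One CONCOR split: iterate row correlations, then divide by sign."""
    rows = [list(r) for r in matrix]
    for _ in range(max_iter):
        rows = [[pearson(a, b) for b in rows] for a in rows]
        if all(abs(abs(v) - 1.0) < 1e-9 for r in rows for v in r):
            break
    # Items positively correlated with the first item form one block.
    first = [lab for lab, v in zip(labels, rows[0]) if v > 0]
    second = [lab for lab, v in zip(labels, rows[0]) if v <= 0]
    return first, second


# Hypothetical co-occurrence counts (diagonal = own frequency, an assumption).
labels = ["work", "symptoms", "treatment", "clinic"]
matrix = [
    [10, 9, 1, 0],
    [9, 10, 0, 1],
    [1, 0, 10, 8],
    [0, 1, 8, 10],
]
block_a, block_b = concor_split(matrix, labels)
```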

A Topic Related Word Extraction Method Using Deep Learning Based News Analysis (딥러닝 기반의 뉴스 분석을 활용한 주제별 최신 연관단어 추출 기법)

  • Kim, Sung-Jin;Kim, Gun-Woo;Lee, Dong-Ho
    • Proceedings of the Korea Information Processing Society Conference / 2017.04a / pp.873-876 / 2017
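  • Recently, to make information retrieval more efficient, research on analyzing data to extract and recommend the related words that best represent it has been active. Existing studies propose generating related words either from occurrence frequency or by analyzing the data with machine learning techniques such as LDA. Machine learning techniques require experts to hand-design the features used to find the result, and finding suitable features that yield good results takes a long time. They also have the drawback that parameters must be set manually, which demands considerable time and effort. To overcome these drawbacks, deep learning, which analyzes data by arranging artificial neural networks in multiple layers, has recently attracted attention. This paper uses deep learning to overcome the limitations of related-word extraction studies that rely on conventional machine learning techniques. First, Word2Vec, a neural-network-based word vector generator, is trained on a variety of text data to produce a lookup table. Then, based on that lookup table, we propose a system that uses a convolutional neural network, a type of artificial neural network, to analyze recent news data related to a topic word entered by the user and extract the latest related words for that topic. We also evaluated performance by measuring the precision of the related words generated by the proposed system.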

Semi-automatic Event Structure Frame tagging of WordNet Synset (워드넷 신셋에 대한 사건구조 프레임 반자동 태깅)

  • Im, Seohyun
    • Annual Conference on Human and Language Technology / 2018.10a / pp.101-105 / 2018
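  • This paper concerns annotating WordNet synsets with "Event Structure Frames" in order to extend the range of uses of WordNet, one of the best-known lexicons. Lexicons in current use, including WordNet, contain richly structured lexical-semantic information but do not include information about event structure. The main contribution of this work is that adding event structure frames to WordNet makes it possible to extract all the core lexical-semantic information through a link to WordNet alone. For example, textual inference, natural language processing, and multimodal tasks rely on lexical-semantic information and background (commonsense) knowledge. The event structure annotation of WordNet is semi-automatic: GESL, an automatic event structure annotation system, first annotates the example sentences in WordNet synsets automatically, and errors are then corrected manually. Target verbs appearing in the example sentences are classified according to 23 predefined event structure frames and mapped to the corresponding frame. The work is at an early stage; in this paper, experiments were run on a total of 106 verb lemmas, comprising the 100 highest-frequency verbs and a representative verb for each event structure frame. These verbs have 1,337 WordNet synsets in total; excluding synsets that have no example sentence and to which GESL therefore cannot be applied, 1,112 synsets remain. Applying GESL to these synsets gave an F-measure of 73.5%. Future work will keep updating the WordNet-event structure links while exploring ways to improve GESL's performance with deep learning.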
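
For readers unfamiliar with the F-measure reported here, the sketch below shows the standard computation from true positives, false positives, and false negatives. The abstract does not say how GESL's output was scored against the manual corrections, so the counts are hypothetical and the function is only the textbook definition.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score from raw counts; beta=1 gives the usual F1."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 6 correct frame labels, 2 wrong labels, 6 missed.
score = f_measure(tp=6, fp=2, fn=6)
```

With these counts precision is 0.75 and recall is 0.5, so the harmonic mean is pulled toward the weaker of the two.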


An Experimental Study on Feature Selection Using Wikipedia for Text Categorization (위키피디아를 이용한 분류자질 선정에 관한 연구)

  • Kim, Yong-Hwan;Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.155-171 / 2012
  • In text categorization, core terms of an input document are rarely selected as classification features if they do not occur in the training document set. In addition, synonymous terms with the same concept are usually treated as different features. This study aims to improve text categorization performance by integrating synonyms into a single feature and by using Wikipedia to replace input terms not in the training document set with the most similar term occurring in the training documents. For the selection of classification features, experiments were performed in various settings composed of three different conditions: the use of category information of non-training terms, the part of Wikipedia used for measuring term-term similarity, and the type of similarity measure. The categorization performance of a kNN classifier improved by 0.35~1.85% in $F_1$ value in all the experimental settings when non-training terms were replaced by the training term with the highest similarity above the threshold value. Although the improvement ratio is not as high as expected, several semantic as well as structural devices of Wikipedia could be used for selecting more effective classification features.
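
The feature selection idea in this abstract (merge synonyms, then replace each unseen input term with its most similar training term if the similarity exceeds a threshold) can be sketched as follows. The function name, the dictionaries, and the scores are hypothetical; in the study the similarities are derived from Wikipedia, whereas here they are simply supplied as a lookup table. Dropping unseen terms that have no match above the threshold is also an assumption.

```python
def normalise_terms(doc_terms, training_vocab, synonyms, similarity, threshold=0.5):
    """Map an input document's terms onto features seen in training."""
    result = []
    for term in doc_terms:
        term = synonyms.get(term, term)  # merge synonyms into one feature
        if term in training_vocab:
            result.append(term)
            continue
        # Unseen term: look for the most similar training term.
        candidates = [(similarity.get((term, t), 0.0), t) for t in training_vocab]
        best_score, best_term = max(candidates, default=(0.0, None))
        if best_term is not None and best_score > threshold:
            result.append(best_term)
        # Otherwise the unseen term is dropped (assumption).
    return result


# Hypothetical vocabulary, synonym table, and similarity scores.
training_vocab = {"car", "engine", "road"}
synonyms = {"automobile": "car"}
similarity = {("lorry", "car"): 0.7, ("lorry", "road"): 0.2, ("banana", "car"): 0.05}

features = normalise_terms(
    ["automobile", "lorry", "banana", "road"], training_vocab, synonyms, similarity
)
```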

A Study on the Research Trends of 『Journal of Elementary Mathematics Education in Korea』 through a Keyword Network Analysis (키워드 네트워크 분석을 통한 『한국초등수학교육학회지』 연구의 동향 분석)

  • Moon, So Young;Cho, Jinseok
    • Journal of Elementary Mathematics Education in Korea / v.23 no.4 / pp.459-479 / 2019
  • The purpose of this study is to explore the research trends and knowledge structures of 『Journal of Elementary Mathematics Education in Korea』 by applying keyword network analysis. To do this, we analyzed the frequency of occurrence of keywords in the journal and conducted keyword network analysis using the KrKwic and NodeXL programs. The results of the analysis are as follows. First, 749 keywords were extracted through the keyword cleansing process, and 48 keywords, including mathematics curriculum, mathematics textbooks, school mathematics, mathematical problem solving, and mathematically gifted student, appeared more than five times. Second, the keyword network analysis showed that four keywords (mathematics textbooks, school mathematics, mathematical problem solving, and mathematical communications) have high connection centrality. Finally, we described the limitations of this study and suggested future research.
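
The two inputs to a keyword network analysis of this kind, keyword frequencies and keyword co-occurrence counts, can be sketched with the standard library. This is not how KrKwic or NodeXL work internally; the paper keyword lists are hypothetical, and whether the frequency cut-off is inclusive is left as a parameter.

```python
from collections import Counter
from itertools import combinations


def keyword_counts(papers):
    """Number of papers in which each keyword appears."""
    return Counter(kw for keywords in papers for kw in set(keywords))


def frequent_keywords(papers, min_count):
    """Keywords appearing in at least min_count papers."""
    return {kw for kw, c in keyword_counts(papers).items() if c >= min_count}


def cooccurrence_counts(papers):
    """Number of papers in which each unordered keyword pair appears together."""
    pairs = Counter()
    for keywords in papers:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical author keywords for three papers.
papers = [
    ["mathematics textbooks", "school mathematics"],
    ["mathematics textbooks", "mathematical problem solving"],
    ["mathematics textbooks", "school mathematics", "mathematical communications"],
]
counts = keyword_counts(papers)
pairs = cooccurrence_counts(papers)
```

The pair counts can serve as edge weights of the keyword network, on which centrality measures are then computed.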


The Analysis of Fashion Trend Cycle using Big Data (패션 트렌드의 주기적 순환성에 관한 빅데이터 융합 분석)

  • Kim, Ki-Hyun;Byun, Hae-Won
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.113-123 / 2020
  • In this paper, big data analysis was conducted on past and present fashion trends and the fashion cycle. We focused on the daily look of ordinary people rather than on fashion professionals and fashion shows. Using the social matrix tool Textom, we performed frequency analysis, N-gram analysis, network analysis, and structural equivalence analysis on big data containing fashion trends and cycles. The results are as follows. First, this study extracted the major keywords related to fashion trends for the daily look from the past (1980s, 1990s) and the present (2019 and 2020). Second, the frequency analysis and N-gram analysis showed that the fashion cycle has shortened to 30-40 years. Third, the structural equivalence analysis found four representative clusters for each period. The four past clusters are jean, retro codi, athleisure look, and celebrity retro, and the present clusters are retro, newtro, lady chic, and retro futurism. Fourth, the network analysis and N-gram analysis showed that past fashion is reproduced and evolves into current fashion with certain reasoning.
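
As a rough illustration of the N-gram analysis mentioned here, the sketch below counts adjacent word sequences in a token list. It is a generic technique written with the standard library, not Textom's implementation, and the token list is hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n=2):
    """Count every run of n adjacent tokens."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


# Hypothetical tokens from social media posts about daily looks.
tokens = ["retro", "look", "daily", "look", "retro", "look"]
bigrams = ngram_counts(tokens, n=2)
```

Frequent pairs such as ("retro", "look") show which words tend to appear next to each other, which plain frequency counts cannot reveal.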

Global Citizenship Education in the Primary Geography Curriculum of the Republic of Korea: Content Analysis Focusing on the Semantic Structure of 2009 Revised School Curriculum (초등지리 교육과정에 반영된 세계시민교육 관련 요소의 구조적 특성에 관한 연구: 2009 개정 교육과정 성취기준에 대한 내용분석을 중심으로)

  • Lee, Dong-Min
    • Journal of the Korean Geographical Society / v.49 no.6 / pp.949-969 / 2014
  • The purpose of this study is to analyze the share of global citizenship education in the 2009 Revised Social Studies (geography area) School Curriculum of the Republic of Korea. I selected the achievement standards of the geography domain in the fifth and sixth grades as the subjects of analysis. The chosen subjects were examined using content analysis: I used KrKwic, a Korean language content analysis tool, to analyze the content and drew a semantic network of the analysis results using UciNet/NetDraw. I found that the geography domain of the 2009 Revised Primary School Curriculum included the concepts and factors of global citizenship education. However, global citizenship education did not account for a major portion of the curriculum, and the curriculum achievement standards were noticeably nation-state centered. Global citizenship education factors were not closely associated with other related factors; in fact, they even showed an isolated pattern. These findings suggest that the inclusion of global citizenship education in primary geography education is limited, because the connections between global citizenship education and related contents, such as the environment, sustainable development, conflict, and cooperation, are probably impeded. Globalization accompanies the transformation of territories, identities, and the relations between nation-states and the world, although nation-states continue to play a significant role in the globalized world. Therefore global citizenship education, an educational trend focusing on the global community, is particularly important and is required in the geography curriculum of the global era. I expect the examination undertaken in this study to contribute to future curriculum revisions regarding globalization and global citizenship.


A Study on the Perception of Pit and Fissure Sealant using Unstructured Big Data (비정형 빅데이터를 이용한 치면열구전색(치아홈메우기)에 대한 인식분석)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science / v.6 no.2 / pp.101-114 / 2023
  • Background: This study aimed to explore the overall perception of pit and fissure sealants and suggest methods to revitalize their current stagnation. Methods: To determine the social perception of the change in coverage policy for pit and fissure sealants, we divided the data into five time periods: the first period (December 1, 2009 to November 30, 2010), the second period (December 1, 2010 to September 30, 2012), the third period (October 1, 2012 to May 5, 2013), the fourth period (May 6, 2013 to September 30, 2017), and the fifth period (October 1, 2017 to December 31, 2022). We used text mining, an unstructured big data analysis method. Keywords were collected and analyzed using Textom, and frequency analysis of the top 30 keywords, analysis of the structural features of the semantic network, centrality analysis, QAP correlation analysis, and co-occurrence analysis were conducted. Results: The frequency analysis showed that the top keywords for each time period were 'Cavities', 'Treatment', and 'Children'. In the structural features of the semantic network of pit and fissure sealants by time period, the density index was found to be around 1.00 for all time periods. The QAP correlation analysis showed the highest correlation between the first and second periods and between the fourth and fifth periods, with a correlation coefficient of 0.834. The co-occurrence analysis showed that 'cavities' and 'prevention' were the top two words across all time periods. Conclusion: This study showed that pit and fissure sealants are well accepted by society as a preventive treatment for caries. However, awareness of health education related to these sealants was found to be low. Efforts to revitalize stagnant pit and fissure sealants need to be strengthened with effective education.
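
A density index near 1.00 means that almost every pair of keywords in the network is connected. The sketch below uses the common definition for an undirected, unweighted network (edges present divided by edges possible). The tool used in the study may compute density differently, for example on weighted ties, so this is an assumption, and the keyword network is hypothetical.

```python
def network_density(nodes, edges):
    """Share of possible undirected node pairs that are actually linked."""
    n = len(nodes)
    if n < 2:
        return 0.0
    # Ignore self-loops and count each unordered pair once.
    unique = {frozenset(e) for e in edges if e[0] != e[1]}
    return 2 * len(unique) / (n * (n - 1))


# Hypothetical keyword network in which every pair co-occurs.
nodes = ["cavities", "prevention", "children"]
edges = [("cavities", "prevention"), ("cavities", "children"),
         ("prevention", "children")]
density = network_density(nodes, edges)
```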

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared with other unstructured and structured data. Among the various text analysis applications, the following have been actively studied: document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on both the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on the quality evaluation of summaries manually summarized documents, used these as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, an automatic summary is produced from the full text through various techniques, and its quality is measured by comparing it with the reference document, which is an ideal summary.
Reference documents are provided in two major ways. The most common is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention, preparing the summary takes a lot of time and cost, and the evaluation result may differ depending on the subjectivity of the summarizer. Therefore, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary is judged to be. However, since summarization essentially means reducing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in its essential meaning. To overcome the limitations of these previous studies, this study proposes a method for automatic quality evaluation of text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little content is duplicated among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary.
To evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The study also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-score, and proposes a method for performing optimal summarization by changing the sentence similarity threshold.
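
One way to picture the completeness and succinctness trade-off is the sketch below. The abstract does not give the paper's formulas, so everything here is an assumption: sentence similarity is word-set Jaccard overlap, completeness is the share of source sentences that some summary sentence matches above a threshold, succinctness is the share of summary sentence pairs that do not match each other, and the two are combined as a harmonic mean. The review sentences are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Word-set overlap between two sentences (assumed similarity measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def completeness(source, summary, threshold):
    """Share of source sentences covered by at least one summary sentence."""
    if not source:
        return 0.0
    covered = sum(
        1 for s in source if any(jaccard(s, t) >= threshold for t in summary)
    )
    return covered / len(source)


def succinctness(summary, threshold):
    """Share of summary sentence pairs that are not near-duplicates."""
    pairs = list(combinations(summary, 2))
    if not pairs:
        return 1.0
    duplicated = sum(1 for a, b in pairs if jaccard(a, b) >= threshold)
    return 1 - duplicated / len(pairs)


def f_score(c, s):
    """Harmonic mean of completeness and succinctness."""
    return 2 * c * s / (c + s) if c + s else 0.0


# Hypothetical hotel review sentences and a two-sentence summary.
source = ["the room was clean", "the room was very clean",
          "breakfast was great", "staff were rude"]
summary = ["the room was clean", "breakfast was great"]

c = completeness(source, summary, threshold=0.5)
s = succinctness(summary, threshold=0.5)
f = f_score(c, s)
```

Adding more sentences to the summary tends to raise completeness but risks lowering succinctness, which is why a single combined score is useful when tuning the similarity threshold.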