• Title/Summary/Keyword: 텍스트 출현 빈도 (text occurrence frequency)

A Study on the Semantic Network Structure of the Regime in the Image Contents (영상콘텐츠분야의 정권별 의미연결망 연구)

  • Hwang, Go-Eun;Moon, Shin-Jung
    • Journal of the Korean BIBLIA Society for library and Information Science / v.28 no.3 / pp.217-240 / 2017
  • The purpose of this study was to apply semantic network analysis to research on image contents and to examine the degree to which words and word clusters contributed to the formation of the semantic map within the field. For this research, a total of 2,624 papers in the field of image contents published from 1993 to 2016 were collected. The words appearing in the titles were analyzed as a social network using the R program for big data analysis. The results were as follows. First, the field of image contents is based on research related to 'image', 'media', and 'contents'. Second, there is a three-step flow ('education' -> 'media' -> 'contents') of research in the field of image contents. Third, research related to 'broadcasting', 'digital', 'technology', and 'production' was carried out continuously. Finally, new research subjects emerged under each regime.
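
The title-word network described in this abstract can be sketched as a co-occurrence count. The study itself used R; the Python below is only an illustrative sketch, and the function names, the whitespace tokenization, and the three toy titles are assumptions, not the authors' procedure or data.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(titles, stopwords=frozenset()):
    """Count how often two distinct words appear together in one title."""
    edges = Counter()
    for title in titles:
        # Deduplicate and sort so each pair has one canonical (a, b) key.
        words = sorted({w for w in title.lower().split() if w not in stopwords})
        for a, b in combinations(words, 2):
            edges[(a, b)] += 1
    return edges

def weighted_degree(edges):
    """Sum of edge weights per word, a simple centrality measure."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree

# Toy titles (hypothetical, for illustration only).
titles = [
    "media education contents",
    "digital media contents",
    "broadcasting technology",
]
edges = cooccurrence_edges(titles)
degree = weighted_degree(edges)
```

Words with a high weighted degree sit at the center of the network, which is one way a study could identify core terms such as 'media' or 'contents'.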

Extracting Technical Vocabulary List for Early Childhood Education Using EAP Specialized Corpus (EAP 전문 코퍼스를 활용한 유아교육 전문 어휘 추출)

  • Lee, Je-Young;Ahn, Jongki;Lee, Jee Eun
    • The Journal of the Korea Contents Association / v.17 no.1 / pp.475-484 / 2017
  • The aim of this research is the development and evaluation of a technical vocabulary list for early childhood education. The list was compiled from a corpus of 500,000 running words of written academic texts from 7 books about early childhood education. The coverage of GSL[1] and AWL[2] was 81.86% and 9.78% respectively, which means that academic texts related to early childhood education are very similar to those in other disciplines. The technical vocabulary list for early childhood education (TV4ECE), extracted in terms of frequency and range, contains 224 types. This word list can be used to teach early childhood education in English, especially in preparation for reading English texts in the field of early childhood education.
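
Selecting words "in terms of frequency and range" can be illustrated as follows, where frequency is the total number of occurrences and range is the number of texts a word appears in. This is a minimal sketch: the thresholds, the function names, the whitespace tokenization, and the toy texts are assumptions, since the abstract does not state the actual criteria.

```python
from collections import Counter

def select_by_frequency_and_range(texts, min_freq, min_range, known_words=frozenset()):
    """Keep words that are frequent overall and spread across enough texts."""
    freq = Counter()   # total occurrences across the corpus
    rng = Counter()    # number of texts containing the word
    for text in texts:
        tokens = text.lower().split()
        freq.update(tokens)
        rng.update(set(tokens))
    return sorted(
        w for w in freq
        if freq[w] >= min_freq and rng[w] >= min_range and w not in known_words
    )

def coverage(texts, word_list):
    """Share of running words in the corpus that belong to word_list."""
    tokens = [t for text in texts for t in text.lower().split()]
    return sum(t in word_list for t in tokens) / len(tokens)

# Toy corpus (hypothetical); "the" stands in for a general word list.
books = [
    "the child plays scaffolding scaffolding",
    "the teacher uses scaffolding",
    "the curriculum the child",
]
general_words = {"the"}
technical = select_by_frequency_and_range(books, 2, 2, general_words)
general_coverage = coverage(books, general_words)
```

The `coverage` helper corresponds to the kind of percentage reported for general and academic word lists, computed here over a toy corpus only.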

Collection and Extraction Algorithm of Field-Associated Terms (분야연상어의 수집과 추출 알고리즘)

  • Lee, Sang-Kon;Lee, Wan-Kwon
    • The KIPS Transactions:PartB / v.10B no.3 / pp.347-358 / 2003
  • A field-associated term is a single or compound word that occurs in documents and makes it possible to recognize the field of a text using common human knowledge. For example, a reader recognizes the field of a document, that is, the field name of the text, on encountering a word such as 'Pitcher' or 'election'. We propose an efficient construction method of field-associated terms (FTs) for specialized fields in order to decide the field of a text. The document classification scheme can be fixed from a well-classified document database or corpus. Considering the focus field, we discuss the levels and stability ranks of field-associated terms. To construct a balanced FT collection, we construct single FTs. From the collections we can automatically construct the FTs' levels and stability ranks. We propose a new extraction algorithm for FTs for document classification that uses each FT's concentration rate and occurrence frequency.
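
The abstract does not define the concentration rate. One plausible reading, assumed here purely for illustration, is the share of a word's occurrences that fall in its most frequent field. The sketch below uses that assumed definition together with hypothetical thresholds and toy documents; it is not the authors' algorithm.

```python
from collections import Counter

def field_associated_terms(field_docs, min_rate=0.8, min_freq=2):
    """Map each qualifying word to (field, concentration rate).

    Assumed definition: rate = occurrences in the top field / total occurrences.
    """
    per_field = {
        field: Counter(w for doc in docs for w in doc.lower().split())
        for field, docs in field_docs.items()
    }
    total = Counter()
    for counts in per_field.values():
        total.update(counts)

    terms = {}
    for word, n in total.items():
        if n < min_freq:
            continue  # too rare to judge
        field, top = max(
            ((f, c[word]) for f, c in per_field.items()),
            key=lambda pair: pair[1],
        )
        rate = top / n
        if rate >= min_rate:
            terms[word] = (field, rate)
    return terms

# Toy classified corpus (hypothetical).
corpus = {
    "sports": ["pitcher throws ball", "pitcher wins game"],
    "politics": ["election wins vote", "election campaign vote"],
}
fts = field_associated_terms(corpus)
```

Under these assumptions 'pitcher' is tied to sports and 'election' to politics, while 'wins' is rejected because its occurrences are split evenly between the two fields.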

Concept Extraction Technique from Documents Using Domain Ontology (지식 문서에서 도메인 온톨로지를 이용한 개념 추출 기법)

  • Mun Hyeon-Jeong;Woo Yong-Tae
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.309-316 / 2006
  • We propose a novel technique to categorize XML documents and extract concepts efficiently using a domain ontology. First, we create a domain ontology using text mining and statistical techniques. We propose a DScore technique to classify XML documents using the structural characteristics of an XML document. We also present a TScore technique to extract a concept by comparing the association term set of the domain ontology with the terms in the XML document. To verify the efficiency of the proposed technique, we perform an experiment on 295 papers in the computer science area. The experimental results show that the proposed technique, which uses the structural information in XML documents, is more efficient than the existing technique. In particular, the TScore technique effectively extracts the concept of a document even when term frequency is low. Hence, the proposed concept-based retrieval techniques can be expected to contribute to the development of an efficient ontology-based knowledge management system.

Analysis of drama viewership related words through unstructured data collection (비정형데이터 수집을 통한 드라마 시청률 연관어 분석)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1567-1574 / 2017
  • In this paper, we analyzed structured and unstructured data in order to analyze drama ratings. For the structured data, 19 items were collected from four areas for each broadcasting company: drama information, person information, broadcasting information, and audience rating information. Unstructured data were collected from the bulletin boards, pre-broadcast blogs, and post-broadcast blogs operated by each broadcasting company using a crawling technique. Comparing the collected structured data across the four areas for each broadcaster showed that the results were similar to each other. We also derived seven related words by analyzing the correlation of occurrence frequencies in the unstructured data collected from the bulletin boards and blogs of each broadcasting company. The derived related words were obtained through reliability analysis.

A Study on Domestic Research Trends (2001-2020) of Forest Ecology Using Text Mining (텍스트마이닝을 활용한 국내 산림생태 분야 연구동향(2001-2020) 분석)

  • Lee, Jinkyu;Lee, Chang-Bae
    • Journal of Korean Society of Forest Science / v.110 no.3 / pp.308-321 / 2021
  • The purpose of this study was to analyze domestic research trends over the past 20 years and the future direction of forest ecology using text mining. A total of 1,015 academic papers and keyword data related to forest ecology were collected through the "Research and Information Service Section" and analyzed using big data analysis programs such as Textom and UCINET. From the results of word frequency and N-gram analyses, we found that domestic studies on forest ecology have increased rapidly since 2011. The most common research topic over the past 20 years was "species diversity," and "climate change" has become a major topic since 2011. Based on CONCOR analysis, study subjects were grouped into eight categories: "species diversity," "environmental policy," "climate change," "management," "plant taxonomy," "habitat suitability index," "vascular plants," and "recreation and welfare." Consequently, species diversity and climate change will remain important topics in the future, and it is necessary to diversify and expand domestic research topics in line with global research trends.
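
Word frequency and N-gram analysis of the kind mentioned above can be sketched with plain counting. The study used Textom and UCINET; the Python below is an independent illustration, and the function names, whitespace tokenization, and toy keyword strings are assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n consecutive tokens, as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))

def ngram_counts(docs, n):
    """Count n-grams over a collection of whitespace-tokenized documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n))
    return counts

# Toy keyword strings (hypothetical).
keywords = [
    "species diversity forest",
    "climate change species diversity",
    "climate change forest management",
]
unigrams = ngram_counts(keywords, 1)
bigrams = ngram_counts(keywords, 2)
top_bigrams = dict(bigrams.most_common(2))
```

With `n=1` the same function gives simple word frequencies, so one helper covers both analyses.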

Knowledge Trend Analysis of Uncertainty in Biomedical Scientific Literature (생의학 학술 문헌의 불확실성 기반 지식 동향 분석에 관한 연구)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for information Management / v.36 no.2 / pp.175-199 / 2019
  • Uncertainty means incomplete stages of knowledge about propositions due to a lack of consensus between information and existing knowledge. As the amount of academic literature increases exponentially over time, new knowledge is discovered as research develops. Although the flow of time may be an important factor in identifying patterns of uncertainty in scientific knowledge, existing studies have only identified the nature of uncertainty based on frequency in a particular discipline and did not take the flow of time into consideration. Therefore, in this study, we identify and analyze the words that indicate uncertainty in the scientific literature and investigate the stream of knowledge. We examine patterns of biomedical knowledge, such as representative entity pairs, predicate types, and entities, over time. We also perform significance testing using linear regression analysis. Seven of the 17 entity pairs show a statistically significant decreasing pattern, and all 10 representative predicates decrease significantly over time. We analyze the relative importance of representative entities by year and identify entities that display a significant rising or falling pattern.
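
Testing whether a yearly measure decreases significantly can be sketched as an ordinary least-squares fit of the measure on the year, followed by a t test on the slope. The code below is a minimal sketch with invented yearly values; the function name and the data are assumptions, and the abstract does not specify the exact regression setup used.

```python
import math

def slope_t_statistic(xs, ys):
    """Least-squares slope of ys on xs and its t statistic (df = n - 2).

    Assumes at least three points and residuals that are not all zero.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # Residual sum of squares around the fitted line.
    sse = sum(
        (y - mean_y - slope * (x - mean_x)) ** 2 for x, y in zip(xs, ys)
    )
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

# Invented yearly counts of uncertainty expressions (illustration only).
years = [2010, 2011, 2012, 2013, 2014]
counts = [50, 44, 41, 33, 30]
slope, t_stat = slope_t_statistic(years, counts)

# Two-sided 5% critical value of Student's t with 3 degrees of freedom.
T_CRITICAL_DF3 = 3.182
significant_decrease = slope < 0 and abs(t_stat) > T_CRITICAL_DF3
```

A statistics library would normally supply the p-value directly; the critical-value comparison is used here only to keep the sketch free of dependencies.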

A Study on the Statistical Characteristics for Table of Contents Text of the Books in Social Sciences Field (사회과학 분야 도서의 목차 텍스트에 대한 통계적 특성에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean Society for information Management / v.36 no.2 / pp.255-273 / 2019
  • Recently, the table of contents (TOC) has become increasingly accessible and widely used. This study conducted descriptive statistics and a comparative analysis of tables of contents in terms of parts of speech and subject. For this purpose, the study selected books in the social sciences from the acquisition lists of an academic library, obtained the Dewey class numbers of the target books from the KERIS union catalog, and extracted TOC data from an online bookstore. Morphological analysis was performed on each book title and TOC, and descriptive statistics and frequency analysis were carried out. As a result, nouns made up roughly half of the morphemes in both titles and TOCs. TOCs had about 50 times more nouns than titles. The percentage of unique nouns that appeared only in the table of contents is estimated to be 95.2% of the TOCs' total nouns. The table of contents also showed differences in length depending on the field of social science.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for information Management / v.36 no.1 / pp.191-213 / 2019
  • Scientific knowledge is obtained through research. Researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, uncertainty is an essential stage that must be passed through in order to obtain scientific knowledge. Existing studies were predominantly hedging studies based on linguistic approaches, or, in computational linguistics, manually constructed corpora of uncertainty words. They have only been able to identify characteristics of uncertainty in a particular research field based on simple frequency. Therefore, in this study, we examine the pattern of scientific knowledge over time based on uncertainty words in the biomedical literature, where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS, and DMR topic modeling, which is a useful method for identifying patterns in disciplines, is applied to understand the trend of entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge follows a decreasing pattern.

An Analysis of Domestic Newspaper Articles on 5.18 using the Bigkinds System (빅카인즈를 활용한 5·18 관련 국내 기사 분석 연구)

  • Juhyeon Park;Hyunji Park;Youngbum Gim
    • Journal of the Korean Society for information Management / v.41 no.1 / pp.107-132 / 2024
  • This study analyzed newspaper articles related to May 18 through frequency analysis and network analysis, using about 30 years of news data on May 18 (1990 to 2022) from the Korea Press Foundation's Big Kinds. Specifically, quantitative trends were examined by analyzing the volume of articles by period and region, and the connection structure between major keywords was explored for each regime through network analysis of co-occurring keywords. The analysis found that 2019, a year with many timely social issues, had the largest amount of coverage, and that among regions the Jeolla-do region had the largest amount of coverage. The network analysis showed that the words related to May 18 in the news data differed according to each regime's perception of and policy toward May 18. Synthesizing the analyses of the May 18 news data confirmed that, regardless of region, May 18 was increasingly being established as a democratic movement over time, but that the distortion of May 18 had not been resolved.