• Title/Summary/Keyword: text occurrence frequency (텍스트 출현 빈도)

Search Result 103, Processing Time 0.024 seconds

Research Trend Analysis on Living Lab Using Text Mining (텍스트 마이닝을 이용한 리빙랩 연구동향 분석)

  • Kim, SeongMook;Kim, YoungJun
    • Journal of Digital Convergence / v.18 no.8 / pp.37-48 / 2020
  • This study aimed to understand trends in living lab research and to derive implications for its future direction using text mining. It applied network analysis and topic modelling to keywords and abstracts from a total of 166 papers published between 2011 and November 2019. Centrality analysis showed that living lab studies have focused on keywords such as innovation, society, technology, development, and user. Topic modelling extracted five topics: "regional innovation and user support", "social policy program of government", "smart city platform building", "technology innovation model of company", and "participation in system transformation". Since the foundation of KNoLL in 2017, living lab study subjects have diversified. Quantitative analysis using text mining provides useful results for the development of living lab studies.

Clustering of Web Document Exploiting with the Co-link in Hypertext (동시링크를 이용한 웹 문서 클러스터링 실험)

  • 김영기;이원희;권혁철
    • Journal of Korean Library and Information Science Society / v.34 no.2 / pp.233-253 / 2003
  • Knowledge organization is the way we humans understand the world. Two types of information organization mechanisms are studied in information retrieval: classification and clustering. Classification organizes entities by pigeonholing them into predefined categories, whereas clustering organizes information by grouping similar or related entities together. Internet information retrieval systems extract keywords from the words that appear in a web document and build an inverted file. Term clustering based on grouping related terms, however, did not prove very successful and was mostly abandoned in cases where documents were written in different languages or where doorway pages consisted only of anchor text. This study examines informetric analysis and the possibility of clustering web documents based on the co-link topology of web pages.
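
The co-link idea mentioned in this abstract can be illustrated with a minimal sketch. It assumes that two pages count as co-linked once for every third page that links to both of them; the `outlinks` data and the function name `colink_counts` are hypothetical and are not taken from the paper, whose actual clustering procedure is not described in the abstract.

```python
from collections import defaultdict
from itertools import combinations

def colink_counts(outlinks):
    """Count, for each pair of pages, how many source pages link to both."""
    counts = defaultdict(int)
    for source, targets in outlinks.items():
        # Every unordered pair of distinct targets on one source page is one co-link.
        for a, b in combinations(sorted(set(targets)), 2):
            counts[(a, b)] += 1
    return dict(counts)

# Hypothetical link data: source page -> pages it links to.
outlinks = {
    "hub1": ["A", "B", "C"],
    "hub2": ["A", "B"],
    "hub3": ["B", "C"],
}
colinks = colink_counts(outlinks)
```

Pairs with higher counts could then be treated as more similar by a clustering step, which is left out here.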

Development of big data based Skin Care Information System SCIS for skin condition diagnosis and management

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.137-147 / 2022
  • Diagnosing and managing skin conditions is a basic and important function for workers in the beauty and cosmetics industries. Accurate diagnosis and management require understanding customers' skin conditions and needs. In this paper, we developed SCIS, a big data-based skin care information system that supports skin condition diagnosis and management using social media big data. The system makes it possible to analyze and extract core information for skin condition diagnosis and management from text information. SCIS consists of a big data collection stage, a text preprocessing stage, an image preprocessing stage, and a text word analysis stage. SCIS collected the big data needed for skin diagnosis and management and extracted key words and topics from text through simple frequency analysis, relative frequency analysis, co-occurrence analysis, and correlation analysis of key words. In addition, by analyzing the extracted key words and information and applying visualization methods such as scatter plots, NetworkX, t-SNE, and clustering, the system can be used efficiently in diagnosing and managing skin conditions.
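
As a rough illustration of three of the analyses named in this abstract (simple frequency, relative frequency, and co-occurrence), the sketch below works on already tokenized documents. The sample tokens and the function name `keyword_stats` are hypothetical, and counting co-occurrence once per document is an assumption; the abstract does not say how SCIS defines it.

```python
from collections import Counter
from itertools import combinations

def keyword_stats(docs):
    """Return simple frequency, relative frequency, and document-level co-occurrence."""
    freq = Counter(word for doc in docs for word in doc)
    total = sum(freq.values())
    # Relative frequency: share of all tokens taken by each word.
    rel = {word: count / total for word, count in freq.items()}
    cooc = Counter()
    for doc in docs:
        # Assumption: a pair co-occurs once per document in which both words appear.
        for a, b in combinations(sorted(set(doc)), 2):
            cooc[(a, b)] += 1
    return freq, rel, cooc

# Hypothetical tokenized posts.
docs = [["skin", "dry", "cream"], ["skin", "oily"], ["skin", "dry"]]
freq, rel, cooc = keyword_stats(docs)
```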

Analysis of Social Network According to The Distance of Characters Statements (소설 등장인물의 텍스트 거리를 이용한 사회 구성망 분석)

  • Park, Gyeong-Mi;Kim, Sung-Hwan;Cho, Hwan-Gue
    • The Journal of the Korea Contents Association / v.13 no.4 / pp.427-439 / 2013
  • With the rapid development of complexity science, many social networks have been studied. Social networks are widely applied in analyzing issues in human culture, economics, and web science. Recently, some researchers have begun to compare social networks constructed from works of fiction (literature social networks) with real social networks obtained in practice. However, previous approaches to literature social networks have drawbacks because they depend entirely on a biographical dictionary constructed for a designated work. Because these approaches focus on a few important characters and the people around them, they cannot capture the global structure of all characters who appear in the work at least once. We propose a method to extract all characters appearing in a work and to build a social network from that information. We also propose the K-critical network, which applies the frequency of named characters and the strength of relationships among all textual characters. Our experiment shows that the K-critical measure could be a crucial quantitative measure for computing relationship strength among the characters in the target work.

A study on frame transition of personal information leakage, 1984-2014: social network analysis approach (사회연결망 분석을 활용한 개인정보 유출 프레임 변화에 관한 연구: 1984년-2014년을 중심으로)

  • Jeong, Seo Hwa;Cho, Hyun Suk
    • Journal of Digital Convergence / v.12 no.5 / pp.57-68 / 2014
  • This article analyzes the frame transition of personal information leakage in Korea from 1984 to 2014. To investigate the transition, we collected newspaper article titles. This study adopts classification, text network analysis (using a symmetric co-occurrence matrix), and clustering techniques as part of social network analysis. Moreover, we apply network centrality to reveal the main frame formed in each of four periods. The results show that accessibility of personal information has extended from the public sector to the private sector, and that the boundary of personal information leakage has expanded overseas. It is therefore urgent to institutionalize the protection of personal information from a global perspective.
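
A symmetric co-occurrence matrix and a centrality score of the kind mentioned in this abstract can be sketched as follows. The tokenized titles are hypothetical, and degree centrality is used only as one common choice; the abstract does not state which centrality definition the authors applied.

```python
def cooccurrence_matrix(titles):
    """Build a symmetric word-by-word co-occurrence matrix from tokenized titles."""
    vocab = sorted({word for title in titles for word in title})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for title in titles:
        words = sorted(set(title))
        for i, a in enumerate(words):
            for b in words[i + 1:]:
                # Update both cells so the matrix stays symmetric.
                matrix[index[a]][index[b]] += 1
                matrix[index[b]][index[a]] += 1
    return vocab, matrix

def degree_centrality(vocab, matrix):
    """Fraction of the other words that each word co-occurs with at least once."""
    n = len(vocab)
    if n < 2:
        return {word: 0.0 for word in vocab}
    return {
        word: sum(1 for value in matrix[i] if value > 0) / (n - 1)
        for i, word in enumerate(vocab)
    }

# Hypothetical tokenized newspaper titles.
titles = [["leak", "bank", "customer"], ["leak", "portal"], ["leak", "bank"]]
vocab, matrix = cooccurrence_matrix(titles)
centrality = degree_centrality(vocab, matrix)
```

Computing the same scores separately for each period would give a per-period ranking of words, which is one way to compare frames over time.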

WCTT: Web Crawling System based on HTML Document Formalization (WCTT: HTML 문서 정형화 기반 웹 크롤링 시스템)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.4 / pp.495-502 / 2022
  • Web crawlers, which are mainly used to collect text on the web today, are difficult to maintain and extend because researchers must implement different collection logic for each collection channel after analyzing the tags and styles of HTML documents. To solve this problem, a web crawler should be able to collect text by formalizing HTML documents into the same structure. In this paper, we designed and implemented WCTT (Web Crawling system based on Tag path and Text appearance frequency), a web crawling system that collects text with a single collection logic by formalizing HTML documents based on tag path and text appearance frequency. Because WCTT collects text with the same logic for all collection channels, it is easy to maintain and to extend with new collection channels. In addition, it provides a preprocessing function that removes stopwords and extracts only nouns for uses such as keyword network analysis.

Corpus-Based Construction of Korean Dictionary Headwords (말뭉치에 근거한 한국어 사전 표제어 구성)

  • Park, Yeong-Hwan;Yun, Jun-Tae;Song, Man-Seok
    • Annual Conference on Human and Language Technology / 1991.10a / pp.58-65 / 1991
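  • Dictionaries form a core part of natural language processing. However, existing Korean dictionaries fall far short of being directly usable for machine processing. In particular, research on which headwords to include, the foundation of any dictionary, is especially weak. To construct the headwords of a new Korean dictionary, this study collected a large corpus. Using this corpus, we identified and added unregistered words that are missing from existing dictionaries and examined the occurrence frequency of each word in the corpus. To carry out this work, we developed essential text processing tools such as a morphological analyzer and a usage example analyzer. We also examined and reported the distribution of errors at the eojeol (word-phrase) level in the corpus.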

Design and Implementation for Extraction of Field-Associationed Terms (분야연상어 추출 방법의 설계 및 구현)

  • Lee, Won-Hee;Choi, Hyun;Lee, Samuel Sangkon
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.651-654 / 2004
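  • When reading a document, we can accurately recognize its field, such as politics, economics, or sports, just by looking at a few representative words, without reading the whole document. Building field-associated terms is an important research task for accurately determining a document's field from the small number of words that appear in a partial text rather than from the whole document. Humans define the field hierarchy in advance and collect documents belonging to each field from the Internet or books. The purpose of this paper is to design and implement a system that automatically collects field-associated terms that accurately indicate the field of the collected documents. Taking into account the point at which a document's field is determined, we propose and implement a method for collecting single field-associated terms using their level, stability rank, concentration ratio, and frequency information.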

Analyzing the Study Trends of 'Sense of Place' Using Text Mining Techniques (텍스트마이닝 기법을 활용한 국내외 장소성 관련 연구동향 분석)

  • Lee, Ina;Kim, Hea-Jin
    • Journal of the Korean BIBLIA Society for library and Information Science / v.30 no.2 / pp.189-209 / 2019
  • Main Path Analysis (MPA) is a text mining technique that extracts the core literature contributing to knowledge transfer based on citation information. This study applied various text mining techniques to abstracts of papers on sense of place published in Korea and abroad from 1990 to 2018, in order to discuss the topic from a macro perspective. The main path analysis showed that, since 1990, overseas research on sense of place has proceeded in the order of personal identity, public land management, environmental education, and urban development-related areas. Network analysis showed that sense of place has been discussed at various levels in Korea, including urban development, culture, literature, and history. In international studies, by contrast, there have been few topic changes, and discussions on health, identity, landscape, and urban development have continued steadily since the 1990s. This study is significant in that it presents a new perspective for grasping the overall flow of the relevant research.

Changes in mathematics pedagogical lexicons: Extension research of the International Classroom Lexicon using a text mining approach (수학 교수학적 어휘의 변화: 텍스트 마이닝 기법을 이용한 교실수업 어휘 연구의 확장)

  • Lee, Gima;Kim, Hee-jeong
    • The Mathematical Education / v.61 no.4 / pp.559-579 / 2022
  • Research on lexicon and language provides insights into the interests, values and practices of a community where individuals use the language. The International Classroom Lexicon Project, in which ten countries participated, identified each country's mathematics teaching and learning lexicons by investigating mathematics classroom instruction from teachers' perspectives in a speaking-oriented community. This study, as an extension of the International Classroom Lexicon Project, investigated pedagogical lexicons used in 「Mathematics and Education」, a journal for professional Korean mathematics teachers published by the Korean Society of Teachers of Mathematics. Using a text mining approach, we also traced how these pedagogical lexicons have changed quantitatively over the past 10 years from a diachronic perspective. As a result, several novel terms were found in the writing-oriented community that were not identified in the speaking-oriented community. In addition, we found that some pedagogical lexicons increased to a statistically significant degree and that some appeared or increased rapidly across years. These changes reflect the teacher community's values and zeitgeist in sociocultural, incidental, and socially changing (i.e., periodical change) contexts. This study has value as a first step toward understanding how the zeitgeist of mathematics education in the Korean mathematics teacher community has changed over the past 10 years. It also offers a methodological contribution: the text mining technique can be used to research changes in interests, values and zeitgeist over time.