• Title/Summary/Keyword: co-word analysis (동시단어 분석)


Implementation of an Information Retrieval System with Multiple Indexing (다중색인에 의한 정보검색 시스템 구현)

  • Lee, Jun-Young;Kang, Sang-Bae;Yang, Jang-Mo;Park, Seung;Park, Hyun-Joo;Kim, Min-Jung;Kwon, Hyuk-Chul
    • Annual Conference on Human and Language Technology / 1996.10a / pp.63-67 / 1996
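  • This paper implements an information retrieval system that can efficiently store and retrieve large volumes of newspaper articles and general text documents. The system generates index terms and index-related information for the body text and for attributes such as a document's subject, author, date, publisher, or user-defined properties. Every index term can carry up to 64 kinds of attribute information and a per-document term frequency (tf). Indexing uses morphological analysis and N-grams at the same time, and index terms are weighted. When seven months of newspaper data were indexed with the implemented system, the resulting database was about 22% of the size of the original documents, and this ratio gradually decreases as the number of documents grows.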
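
The N-gram side of such an index can be illustrated with a minimal sketch. It assumes character bigrams over whitespace-separated tokens and a plain in-memory dictionary; the function names are hypothetical, and the paper's morphological analysis, 64 attribute fields, and term weighting are not modeled here.

```python
from collections import Counter, defaultdict

def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of each whitespace-separated token."""
    grams = []
    for token in text.split():
        if len(token) < n:
            grams.append(token)  # keep short tokens whole
        else:
            grams.extend(token[i:i + n] for i in range(len(token) - n + 1))
    return grams

def build_index(docs, n=2):
    """Map each n-gram to {doc_id: term frequency (tf)}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for gram, tf in Counter(char_ngrams(text, n)).items():
            index[gram][doc_id] = tf
    return index

# Toy documents (illustrative only).
docs = {1: "news search", 2: "news news"}
index = build_index(docs)
print(index["ne"])  # postings for the bigram "ne" with per-document tf
```

Storing the tf next to each posting is what lets a retrieval step rank documents without rereading the original text.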


An Exploratory Study on the Korean National R&D Trends Using Co-Word Analysis (단어동시출현분석을 통한 한국의 국가 R&D 연구동향에 관한 탐색적 연구)

  • Seo, Wonchul;Park, Hyunseok;Yoon, Janghyeok
    • Journal of Information Technology Applications and Management / v.19 no.4 / pp.1-18 / 2012
  • This paper identifies technology trends in national research and development (national R&D) using Korean national R&D patents from 2007 to 2010. Co-word analysis (CWA), a method that identifies relationships among technology terms from their co-occurrences, is combined with network analysis to visualize the relationships among technology keywords of national R&D patents and to calculate network indexes for the diversity and strength of keyword inter-relationships. The results show that inter-relationships among technology keywords in national R&D are strengthening overall. In addition, the keyword inter-relationship diversity-strength map proposed in this paper reveals some significant technology keywords of national R&D: core keywords including "sensor", "film", and "fuel", and emerging keywords including "biosensor" and "thermoelectric". Because the proposed approach helps identify interdisciplinary trends of technology keywords from a massive volume of national R&D patents in a visual and quantitative way, we expect that it can serve as a preliminary step in the R&D planning process, helping R&D policy makers understand technology convergence in national R&D and develop relevant policies.
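
The core counting step of co-word analysis can be sketched as follows. The abstract does not define its network indexes, so this sketch assumes two simple stand-ins: diversity as the number of distinct co-occurring partners and strength as the summed co-occurrence count. The keyword lists are made up for illustration.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(documents):
    """Count how often each unordered keyword pair appears in the same document."""
    pairs = Counter()
    for keywords in documents:
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs

def diversity_and_strength(pairs):
    """Per keyword: distinct partners (diversity) and summed pair counts (strength)."""
    diversity, strength = Counter(), Counter()
    for (a, b), count in pairs.items():
        for keyword in (a, b):
            diversity[keyword] += 1
            strength[keyword] += count
    return diversity, strength

# Hypothetical keyword sets, one per patent.
patents = [["sensor", "film"], ["sensor", "fuel"], ["sensor", "film", "biosensor"]]
pairs = cooccurrence(patents)
diversity, strength = diversity_and_strength(pairs)
print(diversity["sensor"], strength["sensor"])
```

Plotting each keyword at (diversity, strength) gives a rough analogue of a diversity-strength map.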

Design of a Real-Time Visual Loop Closure Detector using Key Frame Images (키 프레임 영상을 이용한 실시간 시각 루프 결합 탐지기의 설계)

  • Kim, Hye-Suk;Kim, Joo-Hee;Kim, Dong-Ha;Kim, In-Cheol
    • Proceedings of the Korea Information Processing Society Conference / 2014.11a / pp.809-812 / 2014
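  • This paper proposes an effective real-time visual loop closure detector that uses key frame images. To determine whether a previously visited location has been revisited, a visual loop closure detector must compare each new input image against all past images collected at the locations visited so far. Because the number of images to compare keeps growing as new locations are visited, loop closure detection generally struggles to achieve high accuracy and real-time performance at the same time. To overcome this problem, the system selects only key frames from the input images for comparison, which effectively reduces the number of comparisons needed for loop closure detection. In addition, to improve the accuracy and efficiency of loop closure detection, the system represents key frame images as bags of visual words (BoW) and builds an index over them using the DBoW database system. Performance experiments on benchmark data from TUM confirmed the high performance of the proposed visual loop closure detector.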
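
The idea of matching a new image against stored key frames in BoW form can be sketched as below. This is an assumption-laden illustration: it uses cosine similarity and a linear scan over key frames, whereas DBoW uses its own vocabulary structure and scoring. The visual word ids and the threshold are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-visual-words histograms."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def detect_loop(query_words, keyframes, threshold=0.8):
    """Return the index of the best-matching key frame at or above threshold, else None."""
    query = Counter(query_words)
    best_idx, best_score = None, threshold
    for idx, keyframe in enumerate(keyframes):
        score = cosine(query, keyframe)
        if score >= best_score:
            best_idx, best_score = idx, score
    return best_idx

# Each key frame is stored as a histogram of visual word ids (toy values).
keyframes = [Counter([1, 1, 2]), Counter([7, 8, 9])]
print(detect_loop([1, 2, 1], keyframes))  # compares only against key frames
```

Comparing against key frames only keeps the scan proportional to the number of key frames rather than to every image seen so far.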

Korean-Hanja Translation System based on Semantic Processing (의미처리 기반의 한글-한자 변환 시스템)

  • Kim, Hong-Soon;Sin, Joon-Choul;Ok, Cheol-Young
    • Proceedings of the Korea Information Processing Society Conference / 2011.04a / pp.398-401 / 2011
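  • In word processors, converting Hangul words that have Hanja forms into Hanja is inefficient because the user must convert them syllable by syllable or word by word, which takes a long time. This paper proposes a system that automatically converts Hangul into context-appropriate Hanja through semantic processing of Korean sentences. Context-appropriate Hangul-Hanja conversion first requires accurate morphological analysis and homograph disambiguation. To this end, we implemented a hidden Markov model-based system that tags morphemes and homographs simultaneously. The proposed system extracts unigrams and bigrams from about 11 million eojeols of the morpho-semantically tagged Sejong corpus, using the unigrams to build a dictionary of eojeol generation probabilities and the bigrams to build a learned dictionary of transition probabilities. After part-of-speech and homograph tagging, the system converts nouns into the Hanja listed in the Standard Korean Language Dictionary. To evaluate the implemented system, a non-training corpus was constructed at the sentence level from the entire Sejong corpus; in the experiments, the system achieved 90.35% accuracy in Hanja conversion for homographs that have Hanja forms.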
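
How generation (emission) and transition probabilities combine to pick a homograph sense can be sketched with a small Viterbi decoder. The states, romanized tokens, and probabilities below are invented for illustration and do not come from the Sejong corpus or the paper's dictionaries.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for a first-order HMM."""
    def lg(p):
        # Floor zero probabilities so that log() stays defined.
        return math.log(p if p > 0 else 1e-12)

    # scores[t][s]: best log-probability of any path ending in state s at step t.
    scores = [{s: lg(start_p.get(s, 0)) + lg(emit_p[s].get(observations[0], 0))
               for s in states}]
    back = []
    for obs in observations[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in states:
            best = max(states, key=lambda p: prev[p] + lg(trans_p[p].get(s, 0)))
            cur[s] = prev[best] + lg(trans_p[best].get(s, 0)) + lg(emit_p[s].get(obs, 0))
            ptr[s] = best
        scores.append(cur)
        back.append(ptr)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Toy model: "bae" is a homograph (pear / ship); the following verb disambiguates it.
states = ["pear", "ship", "eat", "ride"]
start_p = {"pear": 0.5, "ship": 0.5}
trans_p = {"pear": {"eat": 0.9, "ride": 0.1}, "ship": {"eat": 0.1, "ride": 0.9},
           "eat": {}, "ride": {}}
emit_p = {"pear": {"bae": 1.0}, "ship": {"bae": 1.0},
          "eat": {"meokda": 1.0}, "ride": {"tada": 1.0}}
print(viterbi(["bae", "tada"], states, start_p, trans_p, emit_p))
```

Once the sense tag is fixed, the Hanja lookup reduces to a dictionary access keyed by the tagged noun.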

Analysis of Word Based Classification of U.S. Public Libraries and its Implications (주제어 기반 분류에 관한 연구 - 미국 공공도서관의 사례를 중심으로 -)

  • Baek, Ji-Won
    • Journal of the Korean Society for Library and Information Science / v.44 no.4 / pp.179-201 / 2010
  • This study analyzes the word-based classification used in U.S. public libraries and its implications for Korean libraries. For this purpose, eleven U.S. public libraries using a word-based classification system were selected, and their specific classification types, motivations, collection sizes, methods of conversion from DDC, and pros and cons were examined. The analysis shows that word-based classification systems fall into two types, Dewey-free and Dewey-lite, and that application methods differ from case to case. The positive impacts and potential problems of word-based classification for library use and library operation were then examined, and the new system's implications for Korean libraries were discussed.

Combinatory Categorial Grammar for the Syntactic, Semantic, and Discourse Analyses of Coordinate Constructions in Korean (한국어 병렬문의 통사, 의미, 문맥 분석을 위한 결합범주문법)

  • Cho, Hyung-Joon;Park, Jong-Cheol
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.448-462 / 2000
  • Coordinate constructions in natural language pose a number of difficulties to natural language processing units, due to the increased complexity of syntactic analysis, the syntactic ambiguity of the involved lexical items, and the apparent deletion of predicates in various places. In this paper, we address the syntactic characteristics of the coordinate constructions in Korean from the viewpoint of constructing a competence grammar, and present a version of combinatory categorial grammar for the analysis of coordinate constructions in Korean. We also show how to utilize a unified lexicon in the proposed grammar formalism in deriving the sentential semantics and associated information structures as well, in order to capture the discourse functions of coordinate constructions in Korean. The presented analysis conforms to the common wisdom that coordinate constructions are utilized in language not simply to reduce multiple sentences to a single sentence, but also to convey the information of contrast. Finally, we provide an analysis of sample corpora for the frequency of coordinate constructions in Korean and discuss some problematic cases.


An Investigation on Digital Humanities Research Trend by Analyzing the Papers of Digital Humanities Conferences (디지털 인문학 연구 동향 분석 - Digital Humanities 학술대회 논문을 중심으로 -)

  • Chung, EunKyung
    • Journal of the Korean Society for Library and Information Science / v.55 no.1 / pp.393-413 / 2021
  • Digital humanities, which creates new and innovative knowledge by combining digital information technology with humanities research problems, can be seen as a representative multidisciplinary field of study. To investigate the intellectual structure of the field, a network analysis of authors and a co-word analysis of keywords were performed on a total of 441 papers from the last two years (2019, 2020) of the Digital Humanities Conference. The author and keyword analysis shows that authors from Europe and North America, and from Japan and China within East Asia, are particularly active. In the co-author network, 11 disconnected sub-networks are identified, which can be seen as a result of closed co-authoring activities. Keyword analysis identifies 16 sub-subject areas: machine learning, pedagogy, metadata, topic modeling, stylometry, cultural heritage, network, digital archive, natural language processing, digital library, Twitter, drama, big data, neural network, virtual reality, and ethics. These results imply that a diverse variety of digital information technologies play a major role in the digital humanities. In addition, high-frequency keywords can be classified into humanities-based keywords, digital information technology-based keywords, and convergence keywords. The dynamics of the growth and development of digital humanities can be represented by these combinations of keywords.
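
Finding disconnected sub-networks in a co-author network amounts to computing connected components. The sketch below assumes each paper is given as a list of author names (the sample data is made up) and uses a plain depth-first traversal; it is not the tooling used in the study.

```python
from collections import defaultdict

def coauthor_components(papers):
    """Group authors into connected components of the co-author graph."""
    graph = defaultdict(set)
    for authors in papers:
        for a in authors:
            # Touching graph[a] also registers single-author papers as nodes.
            graph[a].update(x for x in authors if x != a)

    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(graph[node] - seen)
        components.append(component)
    return components

# Hypothetical author lists, one per paper.
papers = [["A", "B"], ["B", "C"], ["D", "E"], ["F"]]
components = coauthor_components(papers)
print(len(components))  # number of disconnected sub-networks
```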

Exploring the Research Trend Changes on Convergence Education of Before and After 2011 in Science Education (2011년 전후의 과학교육분야에서의 융합교육 연구동향의 변화 탐색)

  • Song, Youngwook;Paik, Seoung-Hey
    • Journal of The Korean Association For Science Education / v.40 no.5 / pp.531-542 / 2020
  • The purpose of this study is to explore how research trends in convergence education have changed since 2011, compared with the convergence education research that had been steadily continuing in science education. Trends in convergence education were investigated by comparing the number of publications, research subjects, research content, and topic linkages with previous studies, and by using network analysis to check recent research trends. In the field of science education, papers on convergence education have steadily made up more than 8.0% of publications; the number increased from 2012, decreased from 2015, and has gradually increased again since 2017. Elementary school students were the most frequent study subjects, while middle school, high school, and university students were studied less often. Studies of in-service teachers increased, studies of pre-service teachers decreased, and studies of literature and the general public increased somewhat. In terms of research content, effectiveness studies decreased while development studies increased, and theoretical and perception studies remained at similar levels. In thematic linkage, intra-science linkage accounted for 23.9% and extra-science linkage for 76.1%, with engineering/technology and art being the most common extra-science links. In the network analysis, the words "elementary", "science", "STEAM", and "program" appear frequently and co-occur with other words, leading the network. These trends suggest that convergence education will be further emphasized in science education in the future and that, for it to take root in schools, more research on secondary students is needed. In addition, research needs to move away from STEAM-centered program development and effectiveness studies and toward establishing the philosophical and theoretical basis of convergence education.

Mood and Color Distribution Characteristics of Music Genres (음악 장르에 따른 분위기와 색상 분포의 특성)

  • Moon, Chang-Bae;Kim, Hyun-Soo;Song, Min-Kyun;Kim, Byeong-Man
    • Science of Emotion and Sensibility / v.14 no.1 / pp.59-72 / 2011
  • Since stress can cause a variety of diseases, relieving stress is an important factor in preventing disease. One way to relieve stress is to use auditory or visual materials, and using both together should maximize the effect. In this context, we analyze the mood distribution of music genres and the color distribution of moods, using mood data for music and color data for mood words collected directly from volunteers. Based on these two distributions, we also perform a $\chi^2$-test with Minitab to check whether color distributions differ from genre to genre. The results show that different genres have different color distributions and that the distributions of color, brightness, and saturation depend on mood (P<0.0001). The results will be used to develop an emotional lighting system that controls lighting according to the mood of the music. Such a system could be applied to psychotherapy, but more data and analysis are needed for clinical trials.
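
The statistic behind a chi-square test of whether color distributions differ by genre can be computed as in the following sketch. The study used Minitab; this is only a hand-rolled illustration of the Pearson statistic on a made-up genre-by-color contingency table, and it leaves out the p-value lookup.

```python
def chi_square(table):
    """Return the Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows are genres, columns are colors chosen by volunteers.
table = [[30, 10],
         [10, 30]]
stat, dof = chi_square(table)
print(stat, dof)
```

A statistic that is large relative to the chi-square distribution with `dof` degrees of freedom indicates that the color distribution depends on genre.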


A Design of Similar Video Recommendation System using Extracted Words in Big Data Cluster (빅데이터 클러스터에서의 추출된 형태소를 이용한 유사 동영상 추천 시스템 설계)

  • Lee, Hyun-Sup;Kim, Jindeog
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.2 / pp.172-178 / 2020
  • To recommend content, companies generally use collaborative filtering, which takes into account both user preferences and video (item) similarities. Such services are primarily intended to improve user convenience by leveraging personal preferences such as search keywords and viewing time. Results are also ranked by the keywords assigned to each video. However, analyzing video similarity with a limited set of keywords has its limits, and the problem becomes serious when the assigned keywords do not properly reflect the item. In this paper, we propose a system that identifies the characteristics of a video as it is, without human intervention, and that analyzes and recommends videos based on the similarities between them. The proposed system analyzes similarity by taking into account all distinct words (keywords) extracted from training videos; because of the large scale of data and operations involved, the processing is handled by big data clusters.
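
Ranking videos by the overlap of their extracted words can be sketched as below. The abstract does not name a similarity measure, so Jaccard similarity is assumed here; the catalog contents and function names are hypothetical, and the cluster-side processing is not modeled.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend(target, catalog, top_k=2):
    """Return the ids of the top_k videos most similar to the target video."""
    scored = [(jaccard(catalog[target], words), video_id)
              for video_id, words in catalog.items() if video_id != target]
    # Highest similarity first; ties are broken by video id for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [video_id for _, video_id in scored[:top_k]]

# Words extracted from each video (toy data).
catalog = {
    "v1": ["cat", "piano", "music"],
    "v2": ["cat", "piano", "dance"],
    "v3": ["car", "race"],
    "v4": ["music", "piano"],
}
print(recommend("v1", catalog))
```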