• Title/Abstract/Keywords: term similarity

212 search results (processing time: 0.025 seconds)

An Optimal Weighting Method in Supervised Learning of Linguistic Model for Text Classification

  • Mikawa, Kenta;Ishida, Takashi;Goto, Masayuki
    • Industrial Engineering and Management Systems
    • /
    • Vol. 11, No. 1
    • /
    • pp.87-93
    • /
    • 2012
  • This paper discusses a new term-weighting method for text analysis from the viewpoint of supervised learning. The term frequency-inverse document frequency (tf-idf) measure is a well-known weighting method for information retrieval, and it can also be used for text analysis. However, it is an empirical weighting method for information retrieval whose effectiveness has not been established theoretically. A more effective weighting measure may therefore exist for document classification problems. In this study, we propose an optimal weighting method for document classification from the viewpoint of supervised learning. Because it uses training data, the proposed measure is better suited to text classification than the tf-idf measure. The effectiveness of the proposal is demonstrated by simulation experiments on text classification of newspaper articles and of customer reviews posted on a website.
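
For orientation, the tf-idf baseline mentioned in this abstract can be sketched as follows. This is a minimal illustration of one common tf-idf variant (relative term frequency times the natural log of N/df), not the weighting method the paper proposes; the function name `tf_idf` and the toy documents are assumptions made for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = ln(N / df),
    which is only one of several common tf-idf variants.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights

# Hypothetical toy corpus: two customer reviews and one news article.
docs = [
    ["price", "good", "service"],
    ["price", "bad", "service"],
    ["election", "price", "vote"],
]
weights = tf_idf(docs)
```

In this variant a term that occurs in every document (here "price") receives weight zero, while a term confined to one document receives the largest weight. That behavior is fixed in advance and does not use class labels, which is the gap a supervised weighting aims to close.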

An Ontology-based Knowledge Management System - Integrated System of Web Information Extraction and Structuring Knowledge -

  • Mima, Hideki;Matsushima, Katsumori
    • 한국전자거래학회:학술대회논문집
    • /
    • 한국전자거래학회, e-Biz World Conference 2005
    • /
    • pp.55-61
    • /
    • 2005
  • We introduce a web-based knowledge management system, currently in progress, that combines XML-based web information extraction with our knowledge structuring technologies using ontology-based natural language processing. Our aim is to provide efficient access to heterogeneous information on the web, enabling users to draw effortlessly on a wide range of textual and non-textual resources, such as newspapers and databases, and thereby accelerate knowledge acquisition from those sources. To achieve efficient knowledge management, we first propose an XML-based web information extraction method that includes a sophisticated control language for extracting data from web pages. By using standard XML technologies, our approach makes information extraction easy through a) separating rules from processing, b) restricting the targets of processing, and c) interactive operations for developing extraction rules. We then propose a knowledge structuring system that includes 1) automatic term recognition, 2) domain-oriented automatic term clustering, 3) similarity-based document retrieval, 4) real-time document clustering, and 5) visualization. The system supports integrating different types of databases (textual and non-textual) and retrieving different types of information simultaneously. By further explaining the specification and implementation of the system, we demonstrate how it can accelerate knowledge acquisition on the web even for novice users of the field.

부상기술 예측을 위한 특허키워드정보분석에 관한 연구 - GHG 기술 중심으로 (Patent Keyword Analysis for Forecasting Emerging Technology : GHG Technology)

  • 최도한;김갑조;박상성;장동식
    • 디지털산업정보학회논문지
    • /
    • Vol. 9, No. 2
    • /
    • pp.139-149
    • /
    • 2013
  • As technology forecasting becomes more important to countries and companies managing R&D projects, forecasting methodologies have diversified. One such method is patent analysis. This research proposes a quick process for forecasting emerging technology based on a keyword approach using text mining. The process is as follows. First, a term-document matrix is extracted from patent documents by text mining. Second, emerging technology keywords are extracted by analyzing the importance of each word, using the mean and standard deviation of the term, together with the emerging trend of the word found in the term's time series information. Next, the association between terms is measured using cosine similarity. Finally, the emerging technology keywords are selected from the synthesized results, and the emerging technology is forecast accordingly. The forecasting process described in this paper can be applied to developing a computerized technology forecasting system, integrated with the results of other patent analyses, for decision makers in companies and governments.
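
The cosine-similarity step of this process can be sketched as below. The term-document matrix, its terms, and its counts are hypothetical values invented for illustration; the paper's actual data and preprocessing are not given in the abstract.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a term that never occurs is treated as unrelated
    return dot / (norm_u * norm_v)

# Hypothetical term-document matrix: rows are terms, columns are patents.
term_doc = {
    "carbon":  [2, 0, 1, 3],
    "capture": [1, 0, 1, 2],
    "turbine": [0, 4, 0, 0],
}

sim_carbon_capture = cosine_similarity(term_doc["carbon"], term_doc["capture"])
sim_carbon_turbine = cosine_similarity(term_doc["carbon"], term_doc["turbine"])
```

Terms that appear in the same patents (here "carbon" and "capture") score close to 1, while terms that never co-occur score 0, so the measure can be used to group candidate keywords by association.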

유사 어절 트리와 비 색인어 기반의 문서 표절 유사도 분류 방법 (The Classification Method of the Document Plagiarism Similarity based on Similar Syntagma Tree and Non-Index Term)

  • 천승환;김미영;이귀상
    • 한국컴퓨터산업학회논문지
    • /
    • Vol. 3, No. 8
    • /
    • pp.1039-1048
    • /
    • 2002
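  • Determining whether electronic documents and documents received online have been plagiarized is difficult and cumbersome. In particular, assignments given to students are often written on the same topic, so it is not easy to distinguish independently written documents from plagiarized ones. This is an entirely different problem from that addressed by conventional methods, which extract the occurrence frequencies of the main words (index terms) in the documents to be classified and use them to find the most suitable category. This paper proposes a method, based on a similar syntagma tree structure and a vector of syntagmas excluding index terms, that can be applied to detecting plagiarism among similarly written documents.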

Prediction of Long-term Solar Activity based on Fractal Dimension Method

  • Kim, Rok-Soon
    • 천문학회보
    • /
    • Vol. 41, No. 1
    • /
    • pp.45.3-46
    • /
    • 2016
  • Solar activity shows self-similarity, as long-term observational time series contain many activity cycle periods, such as 13.5, 51, 150, and 300 days and 11 and 88 years. Since the fractal dimension is a quantitative parameter for this kind of irregular time series, we applied it to long-term observations, including sunspot number, total solar irradiance, and 3.75 GHz solar radio flux, to predict the start and maximum times as well as the expected maximum sunspot number of the next solar cycle. We found that the radio flux data tend to have lower fractal dimensions than the sunspot number data, which means that the radio emission from the Sun is more regular than the solar activity expressed by sunspot number. Based on the relation between the 3.75 GHz radio flux and sunspot number, we calculated the expected maximum sunspot number of solar cycle 24 as 156, while the observed value is 146. For the time of maximum, the mean of the estimates from 7 different observations is January 2013, which differs considerably from the observed value of February 2014. We speculate that this results from the unusually extended properties of solar cycle 24. The cycle length of solar cycle 24 is expected to be 10.1 to 12.8 years, with a mean of 11.0 years. This implies that the next solar cycle will start in December 2019.
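
The abstract does not say which fractal dimension estimator was used. As one possible illustration, the sketch below assumes Higuchi's method, a common estimator for time series; the function name `higuchi_fd`, the `k_max` default, and the synthetic test series are all assumptions rather than details from the paper.

```python
import math
import random

def higuchi_fd(series, k_max=8):
    """Estimate the Higuchi fractal dimension of a 1-D time series.

    Assumes len(series) is much larger than k_max and that the
    series is not constant (a constant series has zero curve length).
    """
    n = len(series)
    xs, ys = [], []
    for k in range(1, k_max + 1):
        lengths = []
        for m in range(k):  # starting offset of the subsampled curve
            n_steps = (n - 1 - m) // k
            if n_steps < 1:
                continue
            dist = sum(abs(series[m + i * k] - series[m + (i - 1) * k])
                       for i in range(1, n_steps + 1))
            # Normalize so curves with different offsets are comparable.
            lengths.append(dist * (n - 1) / (n_steps * k) / k)
        xs.append(math.log(1.0 / k))
        ys.append(math.log(sum(lengths) / len(lengths)))
    # The dimension is the least-squares slope of log L(k) against log(1/k).
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Synthetic series for illustration only (not solar data).
line = [float(i) for i in range(200)]          # perfectly regular
rng = random.Random(0)
noise = [rng.random() for _ in range(200)]     # highly irregular

fd_line = higuchi_fd(line)
fd_noise = higuchi_fd(noise)
```

Under this estimator a smooth curve gives a dimension near 1 and an irregular one gives a larger value, which matches the abstract's reading that a lower fractal dimension indicates a more regular signal.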

용어 자동분류를 사용한 검색어 범주화의 분석적 고찰 (An Analytic Study on the Categorization of Query through Automatic Term Classification)

  • 이태석;정도헌;문영수;박민수;현미환
    • 정보처리학회논문지D
    • /
    • Vol. 19D, No. 2
    • /
    • pp.133-138
    • /
    • 2012
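  • A query entered through a search box is the product of an information user's active effort to find meaningful material. Search logs are therefore important analytical data for learning what users are interested in. The purpose of this study is to examine analytically how closely the categories assigned to entered queries correlate with the categories of the documents accessed. We identified the search sessions in the 2009 search logs of the NDSL (science and technology information center) site of KISTI (Korea Institute of Science and Technology Information), extracted the queries and the materials used in each session, and compared the subject category assigned to each query by an automatic classifier with the subject field of the materials actually used. The average similarity for the classification of the top 100 queries was 58.8%. The overall similarity is therefore at most 58.8%, which is lower than the 76.8% obtained in a related study's expert evaluation of search performance for the automatic classification of materials. We found that this is because terms used as queries are newly attracting attention as terms of interest in other research fields.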

텍스트마이닝(Text mining)을 활용한 한의학 원전 연구의 가능성 모색 -『황제내경(黃帝內經)』에 대한 적용례를 중심으로 - (Investigation of the Possibility of Research on Medical Classics Applying Text Mining - Focusing on the Huangdi's Internal Classic -)

  • 배효진;김창업;이충열;신상원;김종현
    • 대한한의학원전학회지
    • /
    • Vol. 31, No. 4
    • /
    • pp.27-46
    • /
    • 2018
  • Objectives : In this paper, we investigate the applicability of text mining to Korean Medical Classics and suggest that researchers of Medical Classics adopt this methodology. Methods : We applied text mining to the Huangdi's Internal Classic, a seminal text of Korean Medicine, and visualized networks representing the connectivity of terms and documents based on vector similarity. We then compared the outcome with prior knowledge generated through conventional qualitative analysis and examined whether our methodology could accurately reflect the keywords of documents, clusters of terms, and relationships between documents. Results : In the term network, we confirmed that Qi played a key role and that theory development based on the relativity between Yin and Yang was reflected. In the document network, Suwen and Lingshu were quite distinct from each other due to differences in descriptive form and topic. Suwen also showed high similarity between adjacent chapters. Conclusions : This study showed that text mining can yield significant findings that correspond to prior knowledge about the Huangdi's Internal Classic. Text mining can be used in a variety of research fields covering medical classics, literature, and medical records. In addition, visualization tools can be used for educational purposes.

셰익스피어의 史劇作品에 나타난 服飾役割의 分析 (The Analysis of Costume Role in Shakespeare's History Plays)

  • 정현숙;김진구
    • 복식문화연구
    • /
    • Vol. 7, No. 5
    • /
    • pp.1-18
    • /
    • 1999
  • This study concerns the role of costume in Shakespeare's history plays from the viewpoint of role theory. The term “role” has been used to represent the behavior expected of the occupant of a given position or status. A specific role cannot be successfully performed without the aid of costumes, and costumes are adopted in relation to a specific role. The term “role” was borrowed from the drama, and the similarity between a role on the stage and the role of a person in society has long been recognized. Typical examples in which costume helps give access to a specific role and can be effectively exploited in performing that role appear in Shakespeare's history plays. Our goal in this study is therefore to analyze the role of costume in Shakespeare's history plays from the viewpoint of role theory. The role of social status and position reflects the sex, age, occupation, class, and economic position of the characters. In his works, the crown and the mace represented not only the throne but also privilege and supreme position. The situational role of costume was widely used to visualize the psychological situation and external environment of the characters on stage. The disguise role hid one's status, thereby making it possible to act in another's position. Costume could also symbolize social status, position, rank, occupation, and situation, and functioned as a medium for delivering messages to others. Costume performed the role of physical and psychic protection and gave its wearer consolation and peace of mind. Costume reflected the customs of a society through the way it was worn. A costume (or uniform) adopted by a group signaled the group's characteristics and expected actions to others.
The results obtained from this study can provide useful cues for understanding role action in the social structure. This kind of understanding reveals costume phenomena in real life, allows one to perform roles properly and efficiently, and deepens our insight into the overall aspects of costume culture.

관계형 데이터베이스에서의 시맨틱 기반 키워드 탐색 시스템 (Semantic-based Keyword Search System over Relational Database)

  • 양영휴
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 18, No. 12
    • /
    • pp.91-101
    • /
    • 2013
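  • Keyword ambiguity has been a common issue in efficient keyword search. It can strongly affect the reliability of search results, and it stems fundamentally from the contextual ambiguity of the terms used in a query. Beyond the ambiguity of the query itself, the relationships among the keywords appearing in the results also matter for users to interpret the results properly, so they must be stated explicitly. This paper resolves the ambiguity of keyword search by applying a keyword mapping technique between query terms and schema terms/instances. Because the mapping considers not only the syntactic similarity but also the semantic similarity between query keywords and schema terms, mapping and precision improve by more than 50% compared with existing systems. To express the unclear relationships among the terms in the search results more clearly, Semantic Web technology is applied so that more meaningful relationships between keywords can be found in the knowledge base.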
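
The idea of mapping a query keyword to a schema term by combining syntactic and semantic similarity can be sketched as follows. This is an illustrative assumption, not the system described in the abstract: string similarity comes from Python's standard `difflib.SequenceMatcher`, the semantic side is a tiny hand-written synonym table (`SYNONYMS`), and the two scores are combined by taking the maximum.

```python
from difflib import SequenceMatcher

# Hypothetical synonym table standing in for a real ontology or thesaurus.
SYNONYMS = {
    "client": {"customer"},
    "buyer": {"customer"},
}

def syntactic_similarity(a, b):
    """Character-level similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def semantic_similarity(query_term, schema_term):
    """1.0 if the synonym table links the two terms, else 0.0."""
    linked = schema_term.lower() in SYNONYMS.get(query_term.lower(), set())
    return 1.0 if linked else 0.0

def best_schema_match(query_term, schema_terms):
    """Return the schema term with the highest combined score."""
    def score(schema_term):
        return max(syntactic_similarity(query_term, schema_term),
                   semantic_similarity(query_term, schema_term))
    return max(schema_terms, key=score)

schema_terms = ["customer", "order", "product"]
match_for_client = best_schema_match("client", schema_terms)
match_for_orders = best_schema_match("orders", schema_terms)
```

Here "orders" reaches the schema term "order" through spelling alone, while "client" reaches "customer" only because the synonym table supplies the semantic link. A production system would replace the table with a knowledge base and would likely weight the two scores rather than take the maximum.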

교통카드 Tag 제약을 반영한 통행자 경로선택에 대한 합리성 평가 연구 : 수도권 지하철 네트워크를 중심으로 (Rationality of Passengers' Route Choice Considering Smart Card Tag Constraints : Focused on Seoul Metropolitan Subway Network)

  • 이미영;남두희;심대영
    • 한국ITS학회 논문지
    • /
    • Vol. 19, No. 6
    • /
    • pp.14-25
    • /
    • 2020
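  • This study proposes a methodology that uses smart card data to evaluate the rationality of route choice by passengers traveling on the Seoul metropolitan subway. The rationality of a user's route choice rests on the basic principle that users choose the optimal route, and it is divided into determinacy and similarity. Determinacy is the degree to which the route chosen by a passenger matches the system-optimal route. Similarity is the degree to which the chosen route is identified as similar to the system-optimal route. To judge rationality, we built a method that enumerates routes using a K-path search technique. To identify determinacy within similarity, we used Tag information from the transfer terminals of privately operated agencies. Accordingly, we applied the concept that, under similarity, the optimal route chosen by a passenger is the same as the route passing through the Tag. The results show that determinacy, represented by the optimal route (K=1), is 90.4%, and similarity, represented by K=2 to 10, is 7.9%, so that a total of 98.3% of Seoul metropolitan subway trips are evaluated as rationally explained. The remaining 1.7% of irrational trips are interpreted as an unexplained error term arising from user diversity.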
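
The classification by route rank described in this abstract (K=1 as determinacy, 확정성, and K=2 to 10 as similarity, 유사성) can be sketched as follows. The abstract does not specify the K-path search algorithm, so this sketch substitutes brute-force enumeration of loop-free paths, which is only practical on small graphs; the network, its travel costs, and all function names are hypothetical.

```python
def simple_paths(graph, src, dst, path=None, cost=0.0):
    """Yield (cost, path) for every loop-free path from src to dst."""
    path = (path or []) + [src]
    if src == dst:
        yield cost, path
        return
    for nxt, weight in graph.get(src, {}).items():
        if nxt not in path:
            yield from simple_paths(graph, nxt, dst, path, cost + weight)

def route_rank(graph, src, dst, chosen):
    """Rank K (1 = cheapest) of the chosen path, or None if it is not a path."""
    ranked = sorted(simple_paths(graph, src, dst))
    for k, (_, path) in enumerate(ranked, start=1):
        if path == chosen:
            return k
    return None

def classify(k, k_max=10):
    """Map a route rank to the categories used in the abstract."""
    if k == 1:
        return "determinacy"
    if k is not None and 2 <= k <= k_max:
        return "similarity"
    return "unexplained"

# Hypothetical directed network with travel costs in minutes.
graph = {
    "A": {"B": 2, "C": 5},
    "B": {"C": 2, "D": 6},
    "C": {"D": 2},
    "D": {},
}

rank_best = route_rank(graph, "A", "D", ["A", "B", "C", "D"])
rank_second = route_rank(graph, "A", "D", ["A", "C", "D"])
label_best = classify(rank_best)
label_second = classify(rank_second)
```

Applied to every observed trip, the shares of the three labels would correspond to the determinacy, similarity, and unexplained percentages reported in the abstract; a real network of this size would need a dedicated K-shortest-path algorithm instead of full enumeration.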