• Title/Summary/Keyword: term weighting method

Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지 추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.12a / pp.117-120 / 2001
  • In this work, we propose a new method for extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then use fuzzy inference to choose a number of them, called initial representative keywords (IRKs). The final RKs are then obtained by expanding and reweighting the IRKs using term co-occurrence similarity. Because the performance of our approach depends heavily on how effectively the IRKs are selected, we use fuzzy inference, which is more effective at handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as one of calculating the center of document vectors. To show the usefulness of our approach, we therefore compare it with two well-known methods, Rocchio and Widrow-Hoff, on a number of document collections. The results show that our approach outperforms the other approaches.
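
The "center of document vectors" view mentioned in the abstract above can be illustrated with a plain centroid over sparse term-weight vectors. This is only a minimal sketch of that baseline idea, not the paper's fuzzy-inference method; the function names, the dict-based vector representation, and the sample weights are all assumptions made for illustration.

```python
from collections import defaultdict


def centroid(doc_vectors):
    """Average a list of sparse term-weight vectors (dicts) into one vector."""
    if not doc_vectors:
        return {}
    totals = defaultdict(float)
    for vec in doc_vectors:
        for term, weight in vec.items():
            totals[term] += weight
    n = len(doc_vectors)
    # A term missing from a document contributes 0 to the average.
    return {term: total / n for term, total in totals.items()}


def top_terms(vector, k):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:k]


# Hypothetical term weights for two documents a user liked.
docs = [
    {"fuzzy": 2.0, "term": 1.0},
    {"fuzzy": 1.0, "keyword": 3.0},
]
center = centroid(docs)
keywords = top_terms(center, 2)
```

With only a handful of documents, such an average is sensitive to each document's noise terms, which is the kind of uncertainty the abstract says motivates a more careful selection step.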

Enhanced Spatial Covariance Matrix Estimation for Asynchronous Inter-Cell Interference Mitigation in MIMO-OFDMA System (3GPP LTE MIMO-OFDMA 시스템의 인접 셀 간섭 완화를 위한 개선된 Spatial Covariance Matrix 추정 기법)

  • Moon, Jong-Gun;Jang, Jun-Hee;Han, Jung-Su;Kim, Sung-Soo;Kim, Yong-Serk;Choi, Hyung-Jin
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.5C / pp.527-539 / 2009
  • In this paper, we propose an asynchronous ICI (Inter-Cell Interference) mitigation technique for the 3GPP LTE MIMO-OFDMA downlink receiver. Compared with a synchronous network, symbol timing misalignments may increase as a result of BS (Base Station) timing differences. Symbol synchronization errors that exceed the guard interval or the cyclic prefix duration may result in MAI (Multiple Access Interference) for other carriers. At the cell boundary in particular, this MAI becomes a critical factor, leading to degraded channel throughput and severe asynchronous ICI. Hence, many researchers have investigated interference mitigation methods in the presence of asynchronous ICI, and knowledge of the SCM (Spatial Covariance Matrix) of the asynchronous ICI plus background noise appears to be an important issue. Generally, it is assumed that the SCM is estimated using training symbols. However, it is difficult to measure the interference statistics over a long period, and training symbols are not appropriate for a MIMO-OFDMA system such as LTE. Therefore, a noise reduction method is required to improve the estimation accuracy. Although the conventional time-domain low-pass type weighting method can be effective for noise reduction, it causes significant estimation error due to spectral leakage in a practical OFDM system. We therefore propose a time-domain sinc type weighting method that not only reduces the noise effectively while minimizing the estimation error caused by spectral leakage, but also makes a frequency-domain moving average filter easy to implement. Computer simulations show that the proposed method can provide up to 3 dB SIR gain compared with the conventional method.

Estimation of Representative Runoff Ratio from Paddy Field for the Application of EMC Method (EMC 방법적용을 위한 논 대표 유출률 산정)

  • Choi, Dongho;Jung, Jaewoon;Yoon, Kwangsik;Jin, Sohyun;Choi, Wooyoung;Choi, Woojung;Kim, Sangdon;Yim, Byungjin;Choi, Yujin
    • Journal of Korean Society on Water Environment / v.26 no.6 / pp.943-947 / 2010
  • The runoff ratio of paddy fields was studied for application of the Event Mean Concentration (EMC) method. To measure the actual runoff ratio of paddy fields, field monitoring was conducted during 2008-2009. Long-term rainfall data from four cities in major river basins were analyzed, and weighting factors were developed to account for the temporal and spatial variation of rainfall distribution over the Korean peninsula. The observed runoff ratio ranged from 0.00 to 1.20, with an arithmetic mean of 0.25. However, the representative runoff ratio for paddy fields was determined to be 0.41 according to the method suggested by the National Institute of Environmental Research (NIER).

Knowledge-poor Term Translation using Common Base Axis with application to Korean-English Cross-Language Information Retrieval (과도한 지식을 요구하지 않는 공통기반축에 의한 용어 번역과 한영 교차정보검색에의 응용)

  • 최용석;최기선
    • Korean Journal of Cognitive Science / v.14 no.1 / pp.29-40 / 2003
  • Cross-Language Information Retrieval (CLIR) deals with documents in various languages using a query in one language. A user of one language can retrieve documents in another language through a CLIR system. In CLIR, the query translation method is known to be more efficient. Better query translation performance needs more resources, such as dictionaries, ontologies, and parallel or comparable corpora, but these are usually not available. This paper proposes a new concept called the Common Base Axis, which is applied to Korean-English query translation, and a new weighting method for dictionary-based query translation. The essential idea is that Korean and English words can be expressed in one vector space by the Common Base Axis, which is then used to calculate sense distance for query weighting. The experiments show that the Common Base Axis gives good performance without an ontology and is especially good for one-word query translation.

Development of Ride Comfort Measuring System for Railway with Multi-function (다기능성을 갖는 철도 차량용 승차감 측정시스템 개발)

  • Kim, Young-Guk;Kim, Seog-Won;Park, Chan-Kyeong;Kim, Ki-Hwan;Park, Tae-Won
    • Journal of Sensor Science and Technology / v.13 no.5 / pp.369-377 / 2004
  • Recently, the "ride comfort" problem has become increasingly important because of today's demand for higher train speeds. The concept of "ride comfort" is equivocal; it is generally defined in terms of vehicle vibration. There are many studies on methods for evaluating ride comfort for railways, but each recommends a different assessment method and different guidance. In general, the evaluation methods defined in standards such as ISO 2631 and UIC 513R, and the Ride Index suggested by Sperling, have been used for railways. However, commercial ride comfort measuring systems can evaluate only one or two of these methods. It is therefore necessary to develop a new multi-function ride comfort measuring system for railways. This paper describes the generalization of "ride comfort" and the design and verification of a new multi-function ride comfort measuring system for railways, and introduces application examples.

Term Weighting Method by Postposition and Compound Noun Recognition (조사 유형 및 복합명사 인식에 의한 용어 가중치 부여 기법)

  • 강승식;이하규;손소현;홍기채;문병주
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.196-198 / 2001
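  • To extract terms that represent the content of a document, noun-phrase indexing is generally used for English; from the viewpoint of keyword extraction, however, English noun phrases correspond to Korean compound nouns, so compound-noun indexing is considered important for Korean. In this paper, we propose a heuristic method for calculating the weights of terms extracted from Korean documents. Specifically, the method first assigns a weight based on the characteristics of the term itself, then recognizes compound-noun boundaries and adjusts the weights of compound nouns written with spaces, and finally recalculates the weights according to the type of postposition attached to each term. Experiments on newspaper articles show that the proposed method is more accurate than keyword extraction based on simple occurrence frequency.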

Research of Term-Weighting Method in an Usenet Information Retrieval System (유즈넷 정보검색시스템에서 단어 가중치 적용방법에 관한연구)

  • 최재덕;최진석;박민식
    • Proceedings of the Korean Information Science Society Conference / 1998.10b / pp.339-341 / 1998
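  • Usenet, one of many means of information exchange, holds a vast amount of information. Because users cannot easily find the information they need on Usenet, they recognize the need for information retrieval across all newsgroups and article bodies. This paper aims to improve retrieval effectiveness by improving the term weighting method when an information retrieval system is extended to Usenet. Using category frequency (cf), a factor other than tf and idf that affects term importance in information retrieval, we present and verify a similarity calculation method that adds inverted category frequency (icf) to the tf*idf method. The experimental results show that the average number of relevant documents among the top 30 results is 4.6% higher with the tf*$\sqrt{idf^2+icf^2}$ method than with the tf*idf method.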
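
The weighting named in the abstract above, tf multiplied by the square root of idf squared plus icf squared, can be sketched as follows. The abstract does not define idf or icf precisely, so the logarithmic forms below (log of total count over frequency) are assumptions borrowed from common practice, and all function names and sample numbers are hypothetical.

```python
import math


def idf(num_docs, doc_freq):
    """Inverse document frequency as log(N / df); an assumed, common variant."""
    return math.log(num_docs / doc_freq)


def icf(num_categories, category_freq):
    """Inverse category frequency as log(C / cf); assumed by analogy with idf."""
    return math.log(num_categories / category_freq)


def tfidf_weight(tf, idf_value):
    """Baseline weight: tf * idf."""
    return tf * idf_value


def combined_weight(tf, idf_value, icf_value):
    """Weight from the abstract: tf * sqrt(idf^2 + icf^2)."""
    return tf * math.sqrt(idf_value ** 2 + icf_value ** 2)


# Hypothetical term: appears 3 times in an article, in 10 of 1000 articles,
# and in 2 of 50 newsgroups (categories).
term_idf = idf(1000, 10)
term_icf = icf(50, 2)
baseline = tfidf_weight(3, term_idf)
combined = combined_weight(3, term_idf, term_icf)
```

Under these definitions, a term that occurs in every category has icf equal to 0 and the combined weight reduces to tf*idf, while a term concentrated in few categories is boosted.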

Term Weighting Method for Natural Language Query Sentence (자연언어 질의 문장의 용어 가중치 부여 기법)

  • Kang, Seung-Shik;Lee, Ha-Gyu;Son, So-Hyun;Moon, Byung-Joo;Hong, Gi-Choi
    • Annual Conference on Human and Language Technology / 2002.10e / pp.223-227 / 2002
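  • To extract query terms to be used as search terms from natural language query sentences and to calculate their weights, we analyzed the types of query sentences and propose a method for calculating term weights according to the characteristics of the query syntax. When assigning term weights, compound nouns written with spaces and noun phrases linked by conjunctive relations and the like need to be given equal query-term weights. Query-term weights are calculated for each term after performing noun-phrase chunking, which was implemented to recognize the noun phrases in a query sentence whose terms receive equal weights. To calculate query-term weights, the method adjusts weights according to the term type, the characteristics of the query syntax, terms that denote a document type, the postposition type, the term length, and so on. Weight calculation by term type uses the part-of-speech information of the extracted terms, a technical term dictionary, and a dictionary of adverbial nouns.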

Hypertext Retrieval System Using XLinks (XLinks를 이용한 하이퍼텍스트 검색 시스템)

  • Kim, Eun-Jeong;Bae, Jong-Min
    • The KIPS Transactions:PartD / v.8D no.5 / pp.483-494 / 2001
  • Most hypertext retrieval models consider documents as independent entities and ignore the link semantics of relationships between documents. In an information retrieval system for hypertext documents, retrieval effectiveness can be improved when link information is used. Previous link-based hypertext retrieval models ignore link information while indexing and use it only to re-rank the retrieval results. They are therefore limited in that only the documents in the result set make use of link information. This paper utilizes link information when indexing. We present how to use term weighting and inLinks weighting to rank the relevant documents. Experimental results present a recall and precision evaluation according to link semantics and a comparison with a previous link-based hypertext retrieval model.

Adaptive User Profile for Information Retrieval from the Web

  • Srinil, Phaitoon;Pinngern, Ouen
    • 제어로봇시스템학회:학술대회논문집 / 2003.10a / pp.1986-1989 / 2003
  • This paper proposes a way to improve information retrieval for the Web using the structure and hyperlinks of HTML documents along with a user profile. The method is based on the rationale that terms appearing in different structural parts of a document may differ in their significance for identifying the document. The method partitions the occurrences of terms in a document collection into six classes according to the tags in which the terms occur (such as Title, H1-H6 and Anchor). We use a genetic algorithm to determine class importance values and to expand the user query. These values are also used in the similarity computation and in updating the user profile. A genetic algorithm is then used again to select terms from the user profile to expand the original query. Finally, the search engine searches with the expanded query, and its results are scored by the similarity between each result and the user profile. The vector space model is used, and the weighting schemes of traditional information retrieval are extended to include class importance values. Test results show a precision of up to 81.5%.
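
One way to read "weighting schemes extended to include class importance values" in the abstract above is as a term frequency in which each occurrence is scaled by the importance of the tag class it appears in. The sketch below follows that reading only; the class names, the importance values (which the paper obtains with a genetic algorithm), and the default of 1.0 for unknown classes are all hypothetical.

```python
def weighted_tf(class_counts, class_importance):
    """Sum per-class term counts, each scaled by its class importance value.

    class_counts maps a tag class to how often the term occurs in it;
    classes without a listed importance fall back to 1.0 (an assumption).
    """
    return sum(
        class_importance.get(cls, 1.0) * count
        for cls, count in class_counts.items()
    )


# Hypothetical importance values; in the paper these are learned, not fixed.
importance = {"title": 3.0, "heading": 2.0, "anchor": 1.5, "body": 1.0}

# A term occurring once in the title and four times in the body text.
counts = {"title": 1, "body": 4}
score = weighted_tf(counts, importance)
```

The resulting value would take the place of the raw term frequency in an otherwise standard vector space weighting such as tf*idf.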
