• Title/Summary/Keyword: Term weighting


Automatic Classification of Blog Posts Considering Category-specific Information (범주별 고유 정보를 고려한 블로그 포스트의 자동 분류)

  • Kim, Suah;Oh, Sungtak;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2015.01a
    • /
    • pp.11-14
    • /
    • 2015
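  • Many blog hosting sites let authors choose a category for each post from a predefined set according to the post's topic. However, bloggers face the inconvenience of selecting the category manually for every post. A function that classifies blog posts automatically would relieve this inconvenience and increase the usefulness of blogs. Previous studies on blog document classification were limited in how well they reflected the information unique to each category. To address this problem, this paper proposes a term weighting that reflects category-specific information. To analyze term weights, we collected blog documents by category and developed new measures, CTF, CDF, and IECDF, that take into account term frequency, document frequency, and per-category term frequency in the collected documents. Based on these measures, we trained the existing Naive Bayes algorithm and classified blog posts automatically. In the experiments, classification using the proposed weighting method, TF-CTF-CDF-IECDF, achieved the highest performance.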


Design of Big Data Preference Analysis System (빅데이터 선호도 분석 시스템 설계)

  • Son, Sung Il;Park, Chan Khon
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.11
    • /
    • pp.1286-1295
    • /
    • 2014
  • This paper proposes a way to improve the reliability of preference estimates derived from user feedback by adding weighting factors to sentiment analysis, and to efficiently analyze users' sentiment in the big data generated in large volumes on Twitter. To correct errors in earlier studies, it improves the recall and precision of sentiment determination by using a sentiment dictionary in which polarity is subdivided by sentiment level, and it gives weight to sentiment determination by adding slang, new words, emoticons, and idiomatic expressions that are not in the system dictionary. It takes context into account through conjunctive adverbs, reflecting the relatively free word order of Korean. It also recognizes sentiment words and uses TF (Term Frequency), RT (Retweet), and Follower as weighting factors of preference, increasing the reliability of preference analysis by weighting 'a very emotional tweet', 'a tweet recognised by users', and 'a Twitter influencer'.

INDEFINITE STOCHASTIC LQ CONTROL WITH CROSS TERM VIA SEMIDEFINITE PROGRAMMING

  • Luo, Chengxin;Feng, Enmin
    • Journal of applied mathematics & informatics
    • /
    • v.13 no.1_2
    • /
    • pp.85-97
    • /
    • 2003
  • An indefinite stochastic linear-quadratic (LQ) optimal control problem with cross term over an infinite time horizon is studied, allowing the weighting matrices to be indefinite. A systematic approach to the problem based on semidefinite programming (SDP) and related duality analysis is developed. Several implication relations among the SDP complementary duality, the existence of the solution to the generalized Riccati equation, and the optimality of the LQ problem are discussed. Based on these relations, a numerical procedure that provides a thorough treatment of the LQ problem via primal-dual SDP is given: it either identifies a stabilizing optimal feedback control or determines that the problem has no optimal solution. An example is provided to illustrate the results obtained.

Traffic Offloading Algorithm Using Social Context in MEC Environment (MEC 환경에서의 Social Context를 이용한 트래픽 오프로딩 알고리즘)

  • Cheon, Hye-Rim;Lee, Seung-Que;Kim, Jae-Hyun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.514-522
    • /
    • 2017
  • Traffic offloading is a promising solution to the explosive growth of mobile traffic. Among offloading schemes, LIPA/SIPTO (Local IP Access and Selected IP Traffic Offload) offloading can offload mobile traffic while satisfying the QoS requirements of applications. In addition, traffic offloading needs to use social context because of the large volume of traffic from SNS. Thus, we propose a LIPA/SIPTO offloading algorithm using social context. We define the application selection probability using social context, namely the application popularity. Then, we find the optimal offloading weighting factor that maximizes the QoS (Quality of Service) of small cell users in terms of effective data rate. Finally, we determine the offloading ratio from this application selection probability and the optimal offloading weighting factor. Performance analysis shows that the effective data rate achievement ratio of the proposed algorithm is similar to that of the conventional one, although its total offloading ratio is about 46 percent of the conventional one.

Composite estimation type weighting adjustment for bias reduction of non-continuous response group in panel survey (패널조사에서 비연속 응답 그룹 편향 보정을 위한 복합가중값)

  • Choi, Hyunga;Kim, Youngwon
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.3
    • /
    • pp.375-389
    • /
    • 2019
  • Sample attrition over long-term tracking reduces the representativeness of the sample data in a panel study. Most panel surveys in South Korea and other countries prepare response adjustment weights to address the loss of representativeness caused by sample attrition. In this paper, we divided the panel data into a continuous response group and a non-continuous response group according to response patterns and considered a weighting adjustment method to reduce the bias of the non-continuous response group. A simulation indicated that the proposed composite estimation type weighting method, which reflects the characteristics of non-continuous response groups, could be more efficient than other weighting methods in terms of reducing non-response bias. As a case study, the proposed methods are applied to the Korean Longitudinal Study of Ageing (KLoSA) data of the Korea Employment Information Service.

Query Expansion and Term Weighting Method for Document Filtering (문서필터링을 위한 질의어 확장과 가중치 부여 기법)

  • Shin, Seung-Eun;Kang, Yu-Hwan;Oh, Hyo-Jung;Jang, Myung-Gil;Park, Sang-Kyu;Lee, Jae-Sung;Seo, Young-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.10B no.7
    • /
    • pp.743-750
    • /
    • 2003
  • In this paper, we propose a query expansion and term weighting method for document filtering to increase the precision of Web search engine results. Query expansion for document filtering uses ConceptNet, an encyclopedia, and the top 10% most similar documents. The term weighting method is used to calculate query-document similarity. In the first step, we expand an initial query into the first expanded query using ConceptNet and the encyclopedia. We then weight the first expanded query and calculate the similarity between it and the documents. Next, we create the second expanded query from the top 10% most similar documents and calculate the similarity between it and the documents. We combine the two similarities from the first and second steps, re-rank the documents according to the combined similarities, and filter out non-relevant documents whose similarity is below the threshold. Our experiments showed that this document filtering method yields a notable improvement in retrieval effectiveness as measured by both precision-recall and F-Measure.
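
The two-step flow described in this abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the abstract does not give the weighting formula, the similarity measure, or the rule for combining the two similarities, so the raw term counts, cosine similarity, linear mix `alpha`, and the names `filter_documents`, `top_fraction`, and `threshold` are all assumptions. The first expanded query is passed in already weighted, since the ConceptNet and encyclopedia expansion is outside the scope of the sketch.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def filter_documents(docs, first_query, top_fraction=0.1, alpha=0.5, threshold=0.2):
    """Re-rank and filter documents using two expanded queries.

    docs: {doc_id: text}; first_query: {term: weight}, assumed already expanded.
    alpha, threshold, and the raw-count document vectors are illustrative choices.
    """
    doc_vecs = {d: Counter(text.lower().split()) for d, text in docs.items()}

    # Step 1: similarity between the first expanded query and every document.
    sim1 = {d: cosine(first_query, v) for d, v in doc_vecs.items()}

    # Step 2: build the second expanded query from the most similar documents.
    k = max(1, math.ceil(top_fraction * len(doc_vecs)))
    top_docs = sorted(sim1, key=sim1.get, reverse=True)[:k]
    second_query = Counter()
    for d in top_docs:
        second_query.update(doc_vecs[d])
    sim2 = {d: cosine(second_query, v) for d, v in doc_vecs.items()}

    # Combine both similarities, re-rank, and drop documents below the threshold.
    combined = {d: alpha * sim1[d] + (1 - alpha) * sim2[d] for d in doc_vecs}
    ranked = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
    return [(d, s) for d, s in ranked if s >= threshold]
```

Calling `filter_documents` with a small dictionary of texts and a weighted query returns the surviving document ids with their combined scores, highest first. A linear mix is only one plausible way to combine the two similarities.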

Study of Annoyance in Relation to Exposure Time to Demonstration Noise (집회소음 노출시간에 따른 성가심도 연구)

  • Park, Hyung-Woo;Bae, Myung-Jin
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.16 no.6
    • /
    • pp.103-108
    • /
    • 2016
  • Urban areas are growing, the functions of cities are becoming increasingly complicated, and more people are living in cities. Urban life brings people into closer contact with their neighbors in many respects. In particular, people generate artificial noise in their daily lives, even though they may not consciously notice it. Seoul is the most crowded place in Korea, and its noise levels are 73dB or higher. People living in cities are exposed to noise pollution. In particular, loudspeakers used during demonstrations or for publicity cause considerable noise, which in turn can be related to stress. Moreover, the noise restrictions defined by law are not adhered to. Even if noise regulations are tightened, residents close to the noise remain under great stress, and a 5dB reduction in loudness makes little perceptible difference. Limiting the duration of noise rather than reducing the volume is thus a much more plausible way of reducing the damage caused by noise pollution. Because noise-induced stress is not good for health, it may be wise to leave the area quickly on seeing people or vehicles with a megaphone at the roadside.

Development of a Simplified Source Term Estimation Model for a Spent Fuel from Westinghouse-type Reactors (웨스팅하우스형 원전 사용후핵연료에 대한 방사선원항 예측 모델 개발)

  • Cho, Dong-Keun;Kook, Dong-Hak;Choi, Heui-Joo;Choi, Jong-Won
    • Journal of Nuclear Fuel Cycle and Waste Technology(JNFCWT)
    • /
    • v.8 no.3
    • /
    • pp.239-245
    • /
    • 2010
  • As of 2009, 11,811 LWR spent fuels were stored at reactor sites. In the design of nuclear installations, source terms based on a reference spent fuel, which represents all spent fuels with bounding values, have been applied instead of source terms generated by weighting the respective source term of each spent fuel. In this study, simplified regression models to estimate total decay heat, radioactivity, and ingestion hazard index for spent fuel from Westinghouse-type reactors were developed, because they can serve as fundamental models for weighting the source term of each spent fuel and thereby remove conservatism from the source terms. The estimated source terms agreed with values calculated by ORIGEN-ARP within 5%. It was also found that the conservatism could be removed if the weighted source terms were used as the reference source term in the design. Therefore, the developed regression models are expected to be widely used in the conceptual design of nuclear facilities related to the storage and disposal of spent nuclear fuel.

Comparative Evaluation of Term Weighting Methods in Automatic Document Classification (문헌 자동분류에서 용어가중치 기법에 대한 연구)

  • 이재윤;최보영;정영미
    • Proceedings of the Korean Society for Information Management Conference
    • /
    • 2000.08a
    • /
    • pp.41-44
    • /
    • 2000
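  • Various term weighting formulas have been proposed to improve the performance of information retrieval systems. Term weighting can affect performance not only in retrieval, which compares queries with documents, but also in automatic classification, which compares documents with documents. This paper reviews various term weighting formulas and examines their effect on automatic document classification performance through document clustering and categorization experiments.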
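
As a concrete point of reference for what a term weighting formula looks like, the sketch below computes the classic TF-IDF weight, raw term frequency multiplied by log(N / document frequency). The abstract does not say which formulas the paper compares, so TF-IDF is used here only as a well-known example, and the function name `tf_idf` and the whitespace tokenization are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights
```

With this variant, a term that occurs in every document gets weight zero, while a term concentrated in few documents is weighted up, which is why the choice of formula can change document-to-document comparisons in clustering and categorization.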


INDEFINITE STOCHASTIC OPTIMAL LQR CONTROL WITH CROSS TERM UNDER IQ CONSTRAINTS

  • Luo, Cheng-Xin;Feng, En-Min
    • Journal of applied mathematics & informatics
    • /
    • v.15 no.1_2
    • /
    • pp.185-200
    • /
    • 2004
  • A stochastic optimal LQR control problem under integral quadratic (IQ) constraints is studied, with cross terms in both the cost and the constraint functionals, allowing all the control weighting matrices to be indefinite. Sufficient conditions for the well-posedness of this problem are given. When these conditions are satisfied, the optimal control is explicitly derived via dual theory.