• Title/Summary/Keyword: Data Weighting Scheme


Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

  • Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.265-276 / 2019
  • The document-term frequency matrix is a common data structure in text mining. In this study, we introduce the traditional term weighting scheme TF-IDF (term frequency-inverse document frequency), which is applied to the document-term frequency matrix and used for text classification. In addition, we introduce and compare the TF-IDF-ICSDF and TF-IGM schemes, which have recently become well known. This study also provides a method for extracting keywords that enhances the quality of text classification. Based on the extracted keywords, we applied a support vector machine for text classification. To compare the performance of the term weighting schemes, we used performance metrics such as precision, recall, and F1-score. The results show that the TF-IGM scheme provided high performance metrics and was optimal for text classification.
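
The abstract above names TF-IDF without stating a formula. As a minimal sketch, assuming the common variant "raw term count times log(N / df)" (the paper may use a different normalization), the weighting of a small tokenized corpus could look like this. The function name `tf_idf` and the sample documents are illustrative only.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: raw term count multiplied by log(N / df), where N is
    the number of documents and df is the number of documents that
    contain the term.
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Illustrative corpus (hypothetical data, not from the paper).
docs = [
    ["weighting", "scheme", "text"],
    ["text", "classification", "text"],
    ["weighting", "classification"],
]
weights = tf_idf(docs)
```

Under this variant a term that occurs in every document receives weight zero, which is why TF-IDF suppresses words that do not discriminate between documents.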

Style-Specific Language Model Adaptation using TF*IDF Similarity for Korean Conversational Speech Recognition

  • Park, Young-Hee;Chung, Min-Hwa
    • The Journal of the Acoustical Society of Korea / v.23 no.2E / pp.51-55 / 2004
  • In this paper, we propose a style-specific language model adaptation scheme using n-gram based tf*idf similarity for Korean spontaneous speech recognition. Korean spontaneous speech shows distinctive style-specific characteristics such as filled pauses, word omission, and contraction, which are related to function words and depend on the preceding or following words. To reflect these style-specific characteristics and overcome the lack of data for training the language model, we estimate an in-domain dependent n-gram model by relevance weighting of out-of-domain text data according to their n-gram based tf*idf similarity, where the in-domain language model includes a disfluency model. Recognition results show that n-gram based tf*idf similarity weighting effectively reflects style differences.

Analysis and Localization of freeWAIS-sf (FreeWAIS-sf의 분석 및 한글화)

  • O, Jeong-Seok;Kim, Ji-Seung;Lee, Jun-Ho;Lee, Sang-Ho
    • Journal of KIISE:Computing Practices and Letters / v.5 no.5 / pp.611-618 / 1999
  • Efficient and effective access to needed information has become an important factor in the modern information society. Many people have developed information retrieval (IR) systems that retrieve needed information from a large amount of data at a given time. However, most freely available IR systems have been developed for English text rather than for Korean text. In this research, we analyzed the IR system freeWAIS-sf and localized it with the Korean morphological analyzer HAM. The localized freeWAIS-sf can handle both English and Korean text simultaneously. We also modified the weighting scheme of freeWAIS-sf. The experimental results show that the modified weighting scheme outperforms the original one in terms of retrieval effectiveness.

Impact of Drag-Related Weighting Coefficients in Vegetated Open-Channel Flows (식생된 개수로에서 항력가중계수가 흐름에 미치는 영향 분석)

  • Kang, Hyeongsik;Choi, Sung-Uk
    • KSCE Journal of Civil and Environmental Engineering Research / v.26 no.5B / pp.529-537 / 2006
  • This paper investigates the impact of the drag-related weighting coefficients on mean velocity and turbulence structures. The transport equations for the Reynolds stress of vegetated open-channel flows are derived using a temporal- and horizontal-averaging scheme. It is found that the total Reynolds stress of vegetated open-channel flows consists of the Reynolds stress due to temporally fluctuating velocities and the Reynolds stress due to spatially fluctuating velocities. The drag-related weighting coefficient $C_{fk}$ for the total Reynolds stress component is found to be unity, while the coefficient for the Reynolds stress due to temporally fluctuating velocities is negligible. This is why very small weighting coefficients in previous studies yield very good agreement with measured data. In other words, the Reynolds stress due to spatially fluctuating velocities still remains unknown, especially because of the large number of measuring locations. Using the developed Reynolds stress model, vegetated open-channel flows are simulated and compared with measured data from the literature. The comparisons reveal that the computed mean flow and Reynolds stress structures are hardly affected by the drag-related weighting coefficients. However, the computed turbulence intensity profiles differ significantly with the drag-related weighting coefficients. A budget analysis of the transport equations for the Reynolds stress component is carried out to investigate why turbulence intensity is affected by the drag-related weighting coefficients.

Weight Adjustment Scheme Based on Hop Count in Q-routing for Software Defined Networks-enabled Wireless Sensor Networks

  • Godfrey, Daniel;Jang, Jinsoo;Kim, Ki-Il
    • Journal of information and communication convergence engineering / v.20 no.1 / pp.22-30 / 2022
  • Reinforcement learning has proven its potential in solving sequential decision-making problems under uncertainty, such as finding paths to route data packets in wireless sensor networks. With reinforcement learning, computing the optimum path requires careful definition of the so-called reward function, which is defined as a linear function that aggregates multiple objective functions into a single objective to compute a numerical value (reward) to be maximized. In a typically defined linear reward function, the multiple objectives to be optimized are integrated in the form of a weighted sum with fixed weighting factors for all learning agents. This study proposes a reinforcement learning-based routing protocol for wireless sensor networks, in which different learning agents prioritize different objectives by assigning weighting factors to the aggregated objectives of the reward function. We assign appropriate weighting factors to the objectives in the reward function of a sensor node according to its hop-count distance to the sink node. We expect this approach to enhance the effectiveness of multi-objective reinforcement learning for wireless sensor networks with a balanced trade-off among competing parameters. Furthermore, we propose an SDN (Software Defined Networks) architecture with multiple controllers for constant network monitoring, allowing learning agents to adapt to the dynamics of the network conditions. Simulation results show that the proposed scheme enhances the performance of wireless sensor networks under varied conditions, such as node density and traffic intensity, with a good trade-off among competing performance metrics.
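
The abstract above describes a weighted-sum reward whose weighting factors depend on a node's hop-count distance to the sink, but it does not give the mapping. The sketch below is one hypothetical mapping, not the authors' scheme: the two objectives (`energy`, `delay`), the linear rule in `hop_weights`, and the parameter `max_hops` are all assumptions made for illustration.

```python
def hop_weights(hop_count, max_hops):
    """Hypothetical rule: nodes near the sink weight energy more heavily,
    nodes far from the sink weight delay more heavily."""
    if max_hops <= 0 or not 0 <= hop_count <= max_hops:
        raise ValueError("hop_count must lie in [0, max_hops], max_hops > 0")
    w_delay = hop_count / max_hops
    return {"energy": 1.0 - w_delay, "delay": w_delay}


def reward(objectives, weights):
    """Weighted sum of objective scores (assumed normalized, higher is better)."""
    return sum(weights[name] * objectives[name] for name in weights)


# The same objective scores yield different rewards at different distances.
scores = {"energy": 0.8, "delay": 0.4}
near_sink = reward(scores, hop_weights(1, max_hops=4))
far_from_sink = reward(scores, hop_weights(3, max_hops=4))
```

With fixed weighting factors every agent would compute the same reward for the same scores; making the weights a function of hop count is what lets agents at different distances prioritize different objectives.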

Performance Analysis of Buffer Aware Scheduling for Video Services in LTE Network

  • Lin, Meng-Hsien;Chen, Yen-Wen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.9 no.9 / pp.3594-3610 / 2015
  • Recent advancements in broadband wireless communication technologies enable mobile users to receive video streaming services on various smart devices. The long term evolution (LTE) network provides high bandwidth and low latency for several emerging mobile applications. This paper proposes the buffer aware scheduling (BAS) approach to schedule downlink video traffic in LTE networks. The proposed BAS scheme applies a weighting function to heuristically adjust the scheduling priority by considering the buffer status and channel condition of the UE, so as to reduce the time that the UE stays in the connected state without receiving data. Both 1080P and 2160P resolution video streaming sources were used in exhaustive simulations to examine the performance of the proposed scheme by comparing it with that of the fair bandwidth (FB) and the best channel quality indicator (CQI) schemes. The simulation results indicate that the proposed BAS scheme not only achieves better performance in power saving, streaming delivery time, and throughput than the FB scheme but also maintains performance similar to that of the best CQI scheme under light traffic load. Specifically, the proposed scheme reduces streaming delivery time and generates less signaling overhead than the best CQI scheme when the traffic load is heavy.

Design of Adaptive Observer Applied to M.R.A.C. by Selection of State Variable Filter (상태변수 필터 선정에 의한 적응 관측기의 설계 및 기준모델 적응제어)

  • 홍연찬;김종환;최계근
    • Journal of the Korean Institute of Telematics and Electronics / v.24 no.4 / pp.597-602 / 1987
  • In this paper, an adaptive observer based upon the exponentially weighted least-squares method is implemented in the design of a model reference adaptive controller for an unknown time-invariant discrete single-input single-output linear plant. A method of selecting the state variable filter is proposed. In this scheme, all the past data are weighted exponentially by the weighting coefficient.
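
Exponential weighting of past data, as mentioned in the abstract above, can be illustrated with a scalar least-squares fit. This is only a sketch of the weighting idea, not the paper's adaptive observer: the model y ≈ theta * x, the function name `ewls_scalar`, and the weighting coefficient symbol `lam` are assumptions.

```python
def ewls_scalar(xs, ys, lam=0.9):
    """Estimate theta in y ~ theta * x by exponentially weighted least squares.

    Sample k of n receives weight lam ** (n - 1 - k), so the newest sample
    has weight 1 and older samples are discounted geometrically.
    """
    n = len(xs)
    num = 0.0
    den = 0.0
    for k, (x, y) in enumerate(zip(xs, ys)):
        w = lam ** (n - 1 - k)
        num += w * x * y  # weighted cross term
        den += w * x * x  # weighted regressor energy
    return num / den


# With lam < 1 the estimate leans toward the most recent data.
recent_biased = ewls_scalar([1.0, 1.0], [0.0, 10.0], lam=0.5)
unweighted = ewls_scalar([1.0, 1.0], [0.0, 10.0], lam=1.0)
```

Setting `lam` to 1 recovers ordinary least squares, while smaller values shorten the effective memory of the estimator.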


PAPR Reduction of an OFDM Signal by use of PTS scheme with MG-PSO Algorithm (MG-PSO 알고리즘을 적용한 PTS 기법에 의한 OFDM 신호의 PAPR 감소)

  • Kim, Wan-Tae;Yoo, Sun-Yong;Cho, Sung-Joon
    • Journal of the Institute of Electronics Engineers of Korea TC / v.46 no.1 / pp.1-9 / 2009
  • An OFDM (Orthogonal Frequency Division Multiplexing) system is robust to frequency-selective fading and narrowband interference in high-speed data communications. However, an OFDM signal consists of a number of independently modulated subcarriers, and the superposition of these subcarriers can produce a large PAPR (Peak-to-Average Power Ratio). The PTS (Partial Transmit Sequence) scheme can reduce the PAPR by dividing the OFDM signal into subblocks and then multiplying each subblock by a phase weighting factor, but the computational complexity of selecting the phase weighting factors increases exponentially with the number of subblocks. Therefore, in this paper, an MG-PSO (Modified Greedy algorithm-Particle Swarm Optimization) algorithm, which combines a modified greedy algorithm with the PSO (Particle Swarm Optimization) algorithm, is proposed as the phase control method in the PTS scheme. This method can resolve the computational complexity problem and guarantee PAPR reduction. We analyzed the PAPR reduction performance when the proposed method was applied to telecommunication systems.
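
The exponential cost mentioned in the abstract above can be seen in a sketch of conventional PTS with exhaustive search over the phase weighting factors. This is not the MG-PSO method proposed in the paper. The adjacent subblock partition, the phase set {+1, -1}, the fixed first factor, and all function names are assumptions chosen for illustration.

```python
import cmath
import itertools


def idft(spectrum):
    """Naive inverse DFT (O(n^2)); adequate for a small illustration."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr(signal):
    """Peak-to-average power ratio of a time-domain signal (linear scale)."""
    powers = [abs(v) ** 2 for v in signal]
    return max(powers) / (sum(powers) / len(powers))


def pts_exhaustive(symbols, n_blocks, phases=(1, -1)):
    """Return (best PAPR, phase factors) over all phase combinations.

    The first factor is fixed to 1, so len(phases) ** (n_blocks - 1)
    candidates are tested: exponential in the number of subblocks.
    """
    n = len(symbols)
    if n_blocks <= 0 or n % n_blocks != 0:
        raise ValueError("len(symbols) must be divisible by n_blocks")
    size = n // n_blocks
    # Adjacent partition: subblock v keeps its own subcarriers, zeros elsewhere.
    parts = []
    for v in range(n_blocks):
        block = [symbols[k] if v * size <= k < (v + 1) * size else 0 for k in range(n)]
        parts.append(idft(block))
    best = None
    for combo in itertools.product(phases, repeat=n_blocks - 1):
        factors = (1,) + combo
        # By linearity of the IDFT, the candidate is a weighted sum of the parts.
        candidate = [sum(b * p[t] for b, p in zip(factors, parts)) for t in range(n)]
        value = papr(candidate)
        if best is None or value < best[0]:
            best = (value, factors)
    return best


# Hypothetical input: eight identical symbols, a worst case for PAPR.
symbols = [1, 1, 1, 1, 1, 1, 1, 1]
original_papr = papr(idft(symbols))
best_papr, best_factors = pts_exhaustive(symbols, n_blocks=4)
```

Because the all-ones combination is among the candidates, the selected PAPR can never exceed that of the original signal; search heuristics such as the one in the paper aim to find good factors without testing every combination.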

Hybrid Learning Architectures for Advanced Data Mining: An Application to Binary Classification for Fraud Management (개선된 데이터마이닝을 위한 혼합 학습구조의 제시)

  • Kim, Steven H.;Shin, Sung-Woo
    • Journal of Information Technology Application / v.1 / pp.173-211 / 1999
  • The task of classification permeates all walks of life, from business and economics to science and public policy. In this context, nonlinear techniques from artificial intelligence have often proven to be more effective than the methods of classical statistics. The objective of knowledge discovery and data mining is to support decision making through the effective use of information. The automated approach to knowledge discovery is especially useful when dealing with large data sets or complex relationships. For many applications, automated software may find subtle patterns which escape the notice of manual analysis, or whose complexity exceeds the cognitive capabilities of humans. This paper explores the utility of a collaborative learning approach involving integrated models in the preprocessing and postprocessing stages. For instance, a genetic algorithm effects feature-weight optimization in a preprocessing module. Moreover, an inductive tree, artificial neural network (ANN), and k-nearest neighbor (kNN) techniques serve as postprocessing modules. More specifically, the postprocessors act as second-order classifiers which determine the best first-order classifier on a case-by-case basis. In addition to the second-order models, a voting scheme is investigated as a simple, but efficient, postprocessing model. The first-order models consist of statistical and machine learning models such as logistic regression (logit), multivariate discriminant analysis (MDA), ANN, and kNN. The genetic algorithm, inductive decision tree, and voting scheme act as kernel modules for collaborative learning. These ideas are explored against the background of a practical application relating to financial fraud management which exemplifies a binary classification problem.


Surveillance Video Retrieval based on Object Motion Trajectory (물체의 움직임 궤적에 기반한 감시 비디오의 검색)

  • 정영기;이규원;호요성
    • Journal of Broadcast Engineering / v.5 no.1 / pp.41-49 / 2000
  • In this paper, we propose a new method of indexing and searching based on object-specific features at different semantic levels for video retrieval. A moving trajectory model is used as an indexing key for accessing individual objects at the semantic level. By tracking individual objects with segmented data, we can generate motion trajectories and set model parameters using polynomial curve fitting. The proposed searching scheme supports various types of queries, including query by example, query by sketch, and query on weighting parameters, for event-based video retrieval. When retrieving a video clip of interest, the system returns the best-matching events in order of similarity.
