• Title/Summary/Keyword: Q-algorithm


Traffic Offloading in Two-Tier Multi-Mode Small Cell Networks over Unlicensed Bands: A Hierarchical Learning Framework

  • Sun, Youming;Shao, Hongxiang;Liu, Xin;Zhang, Jian;Qiu, Junfei;Xu, Yuhua
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.11
    • /
    • pp.4291-4310
    • /
    • 2015
  • This paper investigates traffic offloading over unlicensed bands for two-tier multi-mode small cell networks. We formulate the problem as a Stackelberg game and apply a hierarchical learning framework to jointly maximize the utilities of both the macro base station (MBS) and the small base stations (SBSs). During the learning process, the MBS acts as the leader and the SBSs as followers. The MBS adopts a pricing mechanism and first broadcasts the price information to all SBSs; each SBS then competes with the other SBSs and takes its best-response strategy to allocate its traffic load between the licensed and unlicensed bands, taking into account the traffic flow payment charged by the MBS. We then present a hierarchical Q-learning algorithm (HQL) to discover the Stackelberg equilibrium. Additionally, if some extra information can be obtained via feedback, we propose an improved hierarchical Q-learning algorithm (IHQL) to speed up the SBSs' learning process. Finally, the convergence of the two proposed algorithms is analyzed. Numerical experiments validate the proposed schemes and show their effectiveness.
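
The leader-follower structure described above can be sketched as stateless Q-learning for the MBS over a discrete price set, with the SBSs iterating best responses to the broadcast price. The price set, utility functions, and congestion model below are illustrative assumptions, not the paper's exact formulation.

```python
import random

PRICES = [0.2, 0.5, 0.8]                  # leader's discrete action set (assumed)
OFFLOADS = [i / 10 for i in range(11)]    # follower's unlicensed-band share

def sbs_utility(share, price, total_offload, n_sbs):
    # Licensed traffic pays the broadcast price; unlicensed traffic suffers
    # congestion that grows with everyone's total offload (crude proxy).
    congestion = total_offload / n_sbs
    return (1 - share) * (1 - price) + share * (1 - congestion)

def best_response(price, others, n_sbs):
    return max(OFFLOADS,
               key=lambda a: sbs_utility(a, price, others + a, n_sbs))

def hql(n_sbs=4, episodes=2000, alpha=0.1, eps=0.1):
    q = {p: 0.0 for p in PRICES}          # MBS Q-table (single-state)
    shares = [0.5] * n_sbs
    for _ in range(episodes):
        price = (random.choice(PRICES) if random.random() < eps
                 else max(PRICES, key=q.get))
        for _ in range(5):                # followers iterate best responses
            for i in range(n_sbs):
                others = sum(shares) - shares[i]
                shares[i] = best_response(price, others, n_sbs)
        reward = price * sum(1 - a for a in shares)   # MBS pricing revenue
        q[price] += alpha * (reward - q[price])       # Q-learning update
    return max(PRICES, key=q.get), shares
```

The follower stage is solved by repeated best responses here; the paper instead lets the SBSs learn their strategies, which is what the hierarchical part of HQL addresses.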

Improving Covariance Based Adaptive Estimation for GPS/INS Integration

  • Ding, Weidong;Wang, Jinling;Rizos, Chris
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • v.1
    • /
    • pp.259-264
    • /
    • 2006
  • It is well known that the uncertainty of the covariance parameters of the process noise (Q) and the observation errors (R) has a significant impact on Kalman filtering performance. Q and R influence the weight that the filter applies between the existing process information and the latest measurements. Errors in any of them may result in the filter being suboptimal or even cause it to diverge. The conventional way of determining Q and R requires good a priori knowledge of the process noises and measurement errors, which normally comes from intensive empirical analysis. Many adaptive methods have been developed to overcome the conventional Kalman filter's limitations. Starting from covariance matching principles, an innovative adaptive process noise scaling algorithm has been proposed in this paper. Without artificial or empirical parameters to be set, the proposed adaptive mechanism drives the filter autonomously to the optimal mode. The proposed algorithm has been tested using road test data, showing significant improvements to filtering performance.

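
The covariance-matching principle the abstract invokes can be sketched in one dimension: compare the sample covariance of recent innovations with the value the filter predicts for it, and re-derive Q from the mismatch. The paper's GPS/INS formulation is multivariate; this sketch only illustrates the principle.

```python
def adaptive_kalman(zs, r=1.0, q0=0.01, window=10):
    """1-D random-walk Kalman filter that rescales the process noise Q by
    covariance matching: the predicted innovation covariance is
    p_prior + q + r, so matching it to the sample covariance c_hat of
    recent innovations gives q = c_hat - p_prior - r (floored)."""
    x, p, q = 0.0, 1.0, q0
    innovations, estimates = [], []
    for z in zs:
        p_prior = p
        p_pred = p_prior + q                  # predict
        nu = z - x                            # innovation
        k = p_pred / (p_pred + r)             # Kalman gain
        x += k * nu                           # update state
        p = (1 - k) * p_pred                  # update covariance
        innovations.append(nu)
        if len(innovations) >= window:
            c_hat = sum(v * v for v in innovations[-window:]) / window
            q = max(c_hat - p_prior - r, 1e-6)   # covariance matching on Q
        estimates.append(x)
    return estimates
```

When the innovations are consistent with the assumed noise levels, q collapses to its floor; when the measurements disagree with the model, q inflates and the filter weights new data more heavily.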

The n+1 Integer Factorization Algorithm (n+1 소인수분해 알고리즘)

  • Choi, Myeong-Bok;Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.11 no.2
    • /
    • pp.107-112
    • /
    • 2011
  • It is very difficult to factorize a composite number $n=pq$ into primes p and q of nearly equal digit length. Integer factorization algorithms, for the most part, find a pair ($a,b$) satisfying the congruence of squares $a^2{\equiv}b^2$ (mod $n$) using a factor base B, and then obtain $p=GCD(a-b,n)$, $q=GCD(a+b,n)$ with Euclid's greatest common divisor, based on the identity $a^2-b^2=(a-b)(a+b)$. The efficiency of these algorithms hinges on finding ($a,b$) and choosing the factor base B. This paper proposes an efficient algorithm. The proposed algorithm extracts B from the integer factorization of $n+1$ using prime numbers of up to 3 digits and determines f, a combination of elements of B. It then obtains $x$ (that is, $a=fxy$, $\sqrt{n}$ < $a$ < $\sqrt{2n}$) from the integer factorization of $n-2$ and gets $y=\frac{a}{fx}$, $y_1$={1,3,7,9}. Our algorithm is much more effective than the conventional Fermat algorithm, which sequentially searches for $a$ with $\sqrt{n}$ < $a$.
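
For context, the conventional Fermat method that the abstract compares against can be sketched as follows; the paper's factor-base acceleration is not reproduced here.

```python
from math import isqrt

def fermat_factor(n):
    """Classic Fermat factorization for odd composite n: search upward from
    a = ceil(sqrt(n)) until a^2 - n is a perfect square b^2, so that
    n = a^2 - b^2 = (a - b)(a + b). General congruence-of-squares methods
    instead take gcd(a - b, n) and gcd(a + b, n)."""
    a = isqrt(n)
    if a * a < n:
        a += 1
    while True:
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:
            return a - b, a + b
        a += 1
```

The sequential search over $a$ is exactly what makes plain Fermat slow when the two prime factors are not close together, which is the inefficiency the proposed algorithm targets.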

AN ALGORITHM FOR CIRCLE FITTING IN ℝ³

  • Kim, Ik Sung
    • Communications of the Korean Mathematical Society
    • /
    • v.34 no.3
    • /
    • pp.1029-1047
    • /
    • 2019
  • We are interested in the problem of determining the best-fit circle to a set of data points in space. This is usually obtained by minimizing the geometric distances, or various approximate algebraic distances, from the fitted circle to the given data points. In this paper, we propose an algorithm that minimizes the sum of the squares of the geometric distances in ${\mathbb{R}}^3$. Our algorithm is mainly based on the steepest descent method, with a view to ensuring the convergence of the corresponding objective function Q(u) to a local minimum. Numerical examples are given.
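
A minimal sketch of the objective and a steepest-descent loop: a space circle is parameterized (as an assumption) by center, unit normal, and radius, and the gradient is taken numerically rather than in whatever analytic form the paper derives.

```python
import math

def circle_sse(params, pts):
    """Sum of squared geometric distances from pts to the circle
    (cx,cy,cz, nx,ny,nz, r); the normal is renormalized internally."""
    cx, cy, cz, nx, ny, nz, r = params
    nn = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / nn, ny / nn, nz / nn
    total = 0.0
    for px, py, pz in pts:
        vx, vy, vz = px - cx, py - cy, pz - cz
        h = vx * nx + vy * ny + vz * nz          # out-of-plane offset
        ix, iy, iz = vx - h * nx, vy - h * ny, vz - h * nz
        rho = math.sqrt(ix * ix + iy * iy + iz * iz)
        total += h * h + (rho - r) ** 2          # squared geometric distance
    return total

def fit_circle(pts, params, steps=2000, lr=0.01, eps=1e-6):
    """Steepest descent on circle_sse with a central-difference gradient."""
    params = list(params)
    for _ in range(steps):
        grad = []
        for i in range(7):
            hi, lo = params[:], params[:]
            hi[i] += eps
            lo[i] -= eps
            grad.append((circle_sse(hi, pts) - circle_sse(lo, pts)) / (2 * eps))
        params = [p - lr * g for p, g in zip(params, grad)]
    return params
```

With a fixed step size this only illustrates the descent direction; a line search or step-size rule of the kind the paper analyzes would be needed for a convergence guarantee.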

A Strategy for improving Performance of Q-learning with Prediction Information (예측 정보를 이용한 Q-학습의 성능 개선 기법)

  • Lee, Choong-Hyeon;Um, Ky-Hyun;Cho, Kyung-Eun
    • Journal of Korea Game Society
    • /
    • v.7 no.4
    • /
    • pp.105-116
    • /
    • 2007
  • Learning agents are increasingly useful in game environments, but producing satisfactory results in a game takes a long learning time, so a good method for shortening it is needed. In this paper, we present a strategy for improving the learning performance of Q-learning with prediction information. The strategy records the action chosen at each state during the Q-learning algorithm in the P-table of a prediction module, and then searches the table for entries with high frequency; these values are used to renew a secondary compensation value from the Q-table. Our experiments show that our approach improves efficiency by 9% on average after the midpoint of the learning experiments, and that the more actions a state space has, the higher the performance gain.

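
One plausible reading of the P-table idea, sketched as tabular Q-learning in which frequently chosen (state, action) pairs receive a small secondary bonus at update time. The environment interface, bonus rule, and parameters are assumptions, not the paper's exact scheme.

```python
import random
from collections import defaultdict

def q_learn_with_prediction(env_step, actions, episodes=500,
                            alpha=0.1, gamma=0.9, eps=0.2, bonus=0.05):
    """env_step(s, a) -> (next_state or None, reward); None ends an episode."""
    q = defaultdict(float)
    p = defaultdict(int)                  # P-table: (state, action) frequencies
    for _ in range(episodes):
        s = 0
        for _ in range(100):              # step cap keeps episodes finite
            if random.random() < eps:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda b: q[(s, b)])
            p[(s, a)] += 1
            s2, r = env_step(s, a)
            # secondary compensation for pairs chosen more often than average
            mean_freq = sum(p[(s, b)] for b in actions) / len(actions)
            extra = bonus if p[(s, a)] > mean_freq else 0.0
            future = 0.0 if s2 is None else gamma * max(q[(s2, b)]
                                                        for b in actions)
            q[(s, a)] += alpha * (r + extra + future - q[(s, a)])
            if s2 is None:
                break
            s = s2
    return q
```

On a toy chain task (move right to reach a goal), the frequency bonus reinforces the actions the agent already favors, which is the kind of speed-up the abstract reports.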

Reinforcement Learning with Clustering for Function Approximation and Rule Extraction (함수근사와 규칙추출을 위한 클러스터링을 이용한 강화학습)

  • 이영아;홍석미;정태충
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.11
    • /
    • pp.1054-1061
    • /
    • 2003
  • Q-Learning, a representative reinforcement learning algorithm, gathers experience repeatedly until the estimated values of all state-action pairs in the state space converge, yielding optimal policies. When the state space is high-dimensional or continuous, complex reinforcement learning tasks involve a very large state space and suffer from having to store every individual state value in a single table. We introduce Q-Map, a new function approximation method for obtaining classified policies. As the agent learns online, Q-Map groups states of similar situations and adapts to new experiences repeatedly. State-action pairs necessary for fine control are treated in the form of rules. Experiments in a maze environment and on the mountain car problem show that classified knowledge can be achieved and rules extracted easily from the Q-Map.
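
A simplified sketch of the state-grouping idea on a one-dimensional continuous task: observations within a distance threshold of an existing prototype share its Q-values, and otherwise a new cluster is created online. The task, threshold, and update rule are assumptions, not the paper's Q-Map procedure.

```python
import random

def nearest(centers, x, theta):
    """Index of the prototype within distance theta of x, or None."""
    best, best_d = None, theta
    for i, c in enumerate(centers):
        if abs(c - x) < best_d:
            best, best_d = i, abs(c - x)
    return best

def qmap_learn(episodes=300, alpha=0.2, gamma=0.9, eps=0.2, theta=0.15):
    """Q-learning over online-clustered states on [0, 1]; the goal is to
    reach x >= 1 by steps of +/-0.1 (action 0 = right, action 1 = left)."""
    centers, q = [], []                   # q[i] = [Q(right), Q(left)]
    for _ in range(episodes):
        x = 0.0
        for _ in range(50):
            i = nearest(centers, x, theta)
            if i is None:                 # grow a new cluster around x
                centers.append(x)
                q.append([0.0, 0.0])
                i = len(centers) - 1
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = 0 if q[i][0] >= q[i][1] else 1
            x2 = min(max(x + (0.1 if a == 0 else -0.1), 0.0), 1.0)
            done = x2 >= 1.0
            r = 1.0 if done else 0.0
            j = nearest(centers, x2, theta)
            future = 0.0 if done or j is None else max(q[j])
            q[i][a] += alpha * (r + gamma * future - q[i][a])
            if done:
                break
            x = x2
    return centers, q
```

Because nearby observations reuse one table row, the learned table stays far smaller than one entry per distinct observation, which is the storage problem the abstract describes.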

A learning based algorithm for Traveling Salesman Problem (강화학습기법을 이용한 TSP의 해법)

  • 임준묵;길본일수;임재국;강진규
    • Proceedings of the Korean Operations and Management Science Society Conference
    • /
    • 2002.05a
    • /
    • pp.652-656
    • /
    • 2002
  • This study addresses the TSP (Traveling Salesman Problem) in which the travel time between demand points is given stochastically. In practice, the travel time between points varies considerably by time of day and day of week owing to urban traffic congestion and similar factors. Most results to date, however, assume that the travel time between demand points is deterministic, which severely limits their applicability to practical problems such as urban logistics. As a solution, this study presents an efficient algorithm that combines Q-learning, a reinforcement learning technique, with a neural network.


A Fast Least-Squares Algorithm for Multiple-Row Downdatings (Multiple-Row Downdating을 수행하는 고속 최소자승 알고리즘)

  • Lee, Chung-Han;Kim, Seok-Il
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.1
    • /
    • pp.55-65
    • /
    • 1995
  • Existing multiple-row downdating algorithms have adopted a CFD (Cholesky Factor Downdating) that recursively downdates one row at a time. The CFD-based algorithm requires $\frac{5}{2}pn^2$ flops (floating point operations) to downdate a $p{\times}n$ observation matrix $Z^T$. On the other hand, the HCFD (Hybrid CFD) based algorithm we propose in this paper requires $pn^2+\frac{6}{5}n^3$ flops when $p{\geq}n$. The HCFD-based algorithm first factorizes $Z^T$ such that $Z^T=Q_ZR_Z$, and then applies the CFD to the upper triangular matrix $R_Z$, so that the total number of floating point operations for downdating $Z^T$ is significantly reduced compared with the CFD-based algorithm. Benchmark tests on the Sun SPARC/2 and the Tolerant System also show that the performance of the HCFD-based algorithm is superior to that of the CFD-based algorithm, especially when the number of rows of the observation matrix is large.

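
What downdating computes can be sketched naively by forming the Gram matrix explicitly and re-factorizing: find R' with R'^T R' = R^T R - Z^T Z. This reference version is exactly what CFD/HCFD avoid, both for cost and for numerical stability; it only pins down the result they produce.

```python
import math

def cholesky(a):
    """Upper-triangular factor R with A = R^T R (A symmetric positive definite)."""
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = a[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r

def downdate(r, z):
    """Naive multiple-row downdating: form R^T R - Z^T Z explicitly and
    re-factorize. z is the list of observation rows being removed."""
    n = len(r)
    a = [[sum(r[k][i] * r[k][j] for k in range(n)) -
          sum(row[i] * row[j] for row in z)
          for j in range(n)] for i in range(n)]
    return cholesky(a)
```

Forming the Gram matrix squares the condition number, which is why dedicated downdating routines work on the factor directly.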

UWB Link-Adaptive Relay Transmission Protocol for WiMedia Distributed MAC Systems (WiMedia Distributed MAC 통신 시스템을 위한 UWB 링크에 적응적인 릴레이 통신 프로토콜)

  • Hur, Kyeong
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.37 no.3A
    • /
    • pp.141-150
    • /
    • 2012
  • The WiMedia Alliance has specified a Distributed Medium Access Control (D-MAC) protocol based on UWB for high-speed wireless home networks and WPANs. In this paper, we first propose a time-slot reservation protocol for relay transmission. We further propose a novel relay node selection algorithm that adapts to the current UWB link transmission rate. The proposed relay node selection algorithm is compatible with the current WiMedia D-MAC standard and is executed at each device according to the SoQ as a QoS criterion.

A Long-term Replenishment Contract for the ARIMA Demand Process (ARIMA 수요과정을 고려한 장기보충계약)

  • Kim Jong Soo;Jung Bong Ryong
    • Proceedings of the Society of Korea Industrial and System Engineering Conference
    • /
    • 2002.05a
    • /
    • pp.343-348
    • /
    • 2002
  • We are concerned with a long-term replenishment contract for an ARIMA demand process in a supply chain. The chain is composed of one supplier, one buyer and consumers of a product. The replenishment contract is based upon the well-known (s, Q) policy but allows the buyer to contract for multiple future replenishments at once, at a price discount. Because of the larger forecast error for future demand, the buyer must keep a higher level of safety stock to provide the same level of service as the usual (s, Q) policy. However, the buyer can reduce his purchase cost by ordering a larger quantity at a discounted price. Hence, there exists a trade-off between the price discount and the inventory holding cost. For the ARIMA demand process, we present a model for the contract and an algorithm to find the number of future replenishments. Numerical experiments show that the proposed algorithm is efficient and accurate.

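
A minimal sketch of the underlying (s, Q) policy the contract builds on: order a fixed quantity Q whenever the inventory position falls to the reorder point s. The lead time, lost-sales assumption, and parameters are illustrative, not taken from the paper.

```python
def simulate_sQ(s, Q, demands, lead_time=2, init=50):
    """Simulate an (s, Q) replenishment policy: whenever the inventory
    position (on hand + on order) drops to s or below, an order of fixed
    size Q is placed and arrives after `lead_time` periods. Unmet demand
    is lost. Returns the on-hand inventory history."""
    on_hand, pipeline, history = init, [], []   # pipeline: (arrival, qty)
    for t, d in enumerate(demands):
        arrived = sum(q for (a, q) in pipeline if a == t)
        pipeline = [(a, q) for (a, q) in pipeline if a > t]
        on_hand = max(on_hand + arrived - d, 0)
        position = on_hand + sum(q for (_, q) in pipeline)
        if position <= s:
            pipeline.append((t + lead_time, Q))
        history.append(on_hand)
    return history
```

The contract in the paper trades a discounted per-unit price on batched future orders against the extra safety stock that longer forecast horizons require; the base dynamics above are what that trade-off is evaluated on.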