Kang and Joo: Index-based Update Policy for Minimizing Information Mismatch with Markovian Sources ## Sunjung Kang and Changhee Joo # ** Index-based Update Policy for Minimizing Information Mismatch with Markovian Sources ** **Abstract:** We consider a scenario where a base station collects time-varying state information from multiple sources, and makes system decisions based on the collected information. When the information update is constrained to one source at a time, the state information at the base station can be stale and different from actual state of the sources, in which case the base station can make a false decision due to the information mismatch (or error). In this paper, we assume that the update decisions are made at the base station without current state information, and consider the problem of minimizing the information mismatch under limited communication capability. For two-state Markovian source, we consider two different types of estimators at the base station, and characterize the optimal update policy. For the symmetric case, we can obtain the closed-form average cost. Further, with multiple symmetric sources, we show that the problem is indexable and obtain the close-form Whittle’s index for the two different types of estimator. **Keywords:** remote estimation , restless multi-armed bandit , wireless networks , Whittle’s index ## I. INTRODUCTION IN recent years, the applications of real-time monitoring systems and cyberphysical systems, such as drone monitoring systems and medical monitoring systems, have been increasing. In these applications, sensors transmit update packets with time-varying sensing information to a remote monitor (or a base station/receiver) [1]. The remote monitor estimates the sources’ states from the transmitted information and makes system decisions as needed. To this end, keeping the information fresh through timely updates is critical in these applications. Although it is ideal to update fresh information continuously, this is often impractical due to the limitation of the network resources, which causes the information at the monitor to be stale. In a commercial IoT systems, a base station can take the role of the remote monitor. We use the terms of base station and remote monitor interchangeably, similarly for the terms of source and sensor. Regarding the performance metric for information update, recently the age of information (AoI) has attracted much attention [2], [3]. AoI is a performance metric measuring the freshness of information, which is the time elapsed since the most recently generated packet received by the receiver was sampled at the source. In [4], [5], the authors investigate the optimal update rate at a source to minimize the AoI and show that the zero-waiting update policy is not optimal in some scenarios (e.g., random transmission time). The state-space collapse of the AoI under heavy traffic has been observed in [6]. In [7], the authors develop optimal update policies for minimizing the re-defined AoI to address the normal and alarm states of a stochastic process in an energy harvesting status update system. A scenario where a base station transmits timely information to multiple nodes is studied in [8], where the AoI minimization problem with multiple nodes is formulated as a restless multi-armed bandit (RMAB) problem, and a low-complexity algorithm using Whittle’s index theory has been developed. In [9], the authors develop the centralized scheduling policies including Whittle’s index policy to minimize the average AoI under several system constraints. In [10], the authors analytically show the optimality of Whittle’s index policy for minimizing the AoI in many-source networks. Remote estimation has been studied [11]–[18], where the freshness of information is measured as how close the information state between a source and the remote monitor. In [11], the authors claim that the sampling strategy that minimizes the AoI does not always minimize the estimation error by taking an example of a source where information state follows a twostate Markov chain. In [12], the authors investigate sampling and scheduling policies to minimize the average AoI in networks with multiple sources updating a common receiver. In particular, for two-state Markovian sources, the authors employ the sample-at-change sampling policy and Whittle’s index policy for minimizing the AoI as a scheduling policy. Through simulations, the authors compare the proposed update policy with the optimal policy for minimizing the estimation error and show that the proposed update policy is near optimal, especially with a large number of sources. In [13], the authors develop Whittle’s index policy for minimizing the uncertainty of information measured by Shannon’s entropy in networks where a common receiver is updated by distributed sources of which states evolve according to a two-state Markov chain. The authors in [14], [15] study the optimal sampling rate to minimize the mean square error of information state of the Wiener-process and Ornstein-Uhlenbeck, respectively. In [16], [17], the authors consider a scenario where information state is an independent and identically distributed (i.i.d.) process and there is a communication cost to transmit information to a remote estimator. They showed that an optimal policy for minimizing the estimation error is of threshold type. In this paper, we consider a scenario where a base station collects state information from multiple two-state Markovian sources, and the information update is constrained to one source at a time due to the limitation of networks resources. This model is somewhat simple but captures the essential aspects of how the base station has to sequentially select a source to minimize the information mismatch (or estimation error) accounting for the Markovian behavior and the communication cost. Our problem is different from [9], [10], [13] in that this paper considers the estimation error as a metric to measure the freshness of information instead of the AoI [9], [10] or the uncertainty of information [13], and also different from [11], [14]–[17] in that the base station has uncertainty about not only future evolution of the information state, but also its history. Further, unlike [12], where the proposed scheduling policy minimizes the AoI with the rationale that the estimation error can be minimized accordingly under the assumption that the estimation error increases as the AoI increases, this paper does not use the assumption between the AoI and the estimation error and aims to design a scheduling policy that minimizes the estimation error. We formulate our problem as a partially observable Markov decision process (POMDP), where the belief state calculated by the Bayes rule is sufficient statistics. Some POMDP problem can be solved using multi-armed bandit (MAB) techniques, in which a scheduler selects one arm (or bandit), and receives a reward from the chosen arm. The state of arms can change in a Markovian manner. There are two types of the MAB process: rested and restless. For the former, only the chosen arm changes its state and the states of the other do not, in which case, it is shown that the Gitten’s index policy is optimal [19]. For the latter, the states of all arms change, and the transition law of a state can be different between the chosen arm and the others. It is shown that finding an optimal solution to this RMAB problem is PSPACE-hard [20]. In [21], Whittle relaxes the RMAB problem, and shows that if the problem satisfies some condition (or the problem is indexable), then an optimal solution is an index-type policy and the index is called Whittle’s index. In [22], the authors study the sufficient conditions for the indexability for general source models and develop an algorithm to compute Whittle’s indices for indexable restless bandits. It has been shown that Whittle’s index policy that activates an arm with the highest Whittle’s index (or K highest when multi-arm activation is allowed), is asymptotically optimal when the arms are stochastically identical [23], [24]. Further, it has been shown that Whittle’s index achieves heuristically near-optimal performance in many applications, including AoI [8]–[10], dynamic spectrum allocation [24], queueing system [25], and resource allocation problem [26]. In [27], the authors study a scenario where each bandit is modeled as a two-state Markov chain and when an arm is activated, the player receives a signal with noise, i.e., there is a probability that the actual state is 0 even when the received signal is 1 In this paper, we address a wireless scheduling problem to minimize the information mismatch between the base station and distributed sources by exploiting Whittle’s index. The most related studies are [28], [29] that apply Whittle’s index to an estimation problem with two-state Markov chain model. Our work is different from [28] in that we aim to minimize the penalty incurred when the estimated states are different from the actual states of sources and thus the penalty depends on both the source state and the estimation, while [28] aims to maximize the rewards that are incurred when an activated arm is in a specific state (i.e., rewards depend only on the source state). Also, our work is different from [29] in that we investigate the average cost criterion over infinite horizon while [29] investigates the discount reward criterion over infinite horizon. It is known that the infinite sum of discount cost/reward always converges if the one-step cost/reward is bounded, while the average cost/reward over infinite horizon can diverge even when the one-step cost/reward is bounded [30]. Further, the results of [29] are limited to the case of discount factor [TeX:] $$\beta \leq 1 / 2,$$ which implies that the results are applicable only to severe discounting scenarios, and cannot be extended (e.g., by taking [TeX:] $$\beta \rightarrow 1)$$ to our problem with the average cost criterion. Our contributions can be summarized as follows. We analytically show that an optimal solution to the problem of single source with communication cost is of threshold type, for two different types of estimators. Further, for the special case of symmetric source, we obtain the minimum expected average penalty in the closed form. For multi-source scenarios, we model our problem as an RMAB, and show that our problem of minimizing the average cost is indexable for symmetric sources, and obtain the closed-form Whittle index. As such, we develop a low-complexity scheduling algorithm using the Whittle’s index theory for the information error minimization. We show through simulations that our policy achieves near-optimal performance. The rest of paper is organized as follows. In Section II, we describe the system model and formulate our problem. In Section III, we characterize the optimal policy in a single source network with communication cost. In Section IV, we show that our problem is indexable, and obtain a closedform Whittle index. We verify through simulations that our algorithm performs well in Section V, and conclude our work in Section VI. ## II. SYSTEM MODEL We consider the scenario where a base station collects timevarying state information from N sources, and makes system decisions based on the collected information. We consider a time-slotted system, and at each time slot, the base station can request state information from at most one source. We assume that sources are numbered from 1 to N, and each source can have two states 0 and 1. The state of source i Fig. 1. Structure of time slot. varies following a Markov process with transition probability [TeX:] $$P_{i}=\left[p_{00}^{(i)}, p_{01}^{(i)} ; p_{10}^{(i)}, p_{11}^{(i)}\right]$$ and it is assumed that the transition probability is fixed and known to the base station. Let [TeX:] $$S_{i}(t) \in\{0,1\}$$ denote the state of source i at the beginning of time slot t, and let [TeX:] $$D_{i}(t) \in\{0,1\}$$ denote the latest updated state of source i at the base station at time slot t. [TeX:] $$\mathbf{S}(t)=\left[S_{1}(t), \cdots, S_{N}(t)\right] \text { and } \mathbf{D}(t)=\left[D_{1}(t), \cdots, D_{N}(t)\right]$$ denote their vector, respectively. Let [TeX:] $$\mathbf{u}(t)=\left[u_{1}(t), \cdots, u_{N}(t)\right]$$ denote the action vector in time slot t with [TeX:] $$u_{i}(t) \in\{0,1\},$$ where [TeX:] $$u_{i}(t)=1$$ implies that the base station requests state information to source i. The base station can request the information to at most one source^{1} in a time slot, i.e.,[TeX:] $$\sum_{i=1}^{N} u_{i}(t) \leq 1$$ for all t. We assume that the communication between the base station and the sources is reliable, and thus there is neither loss nor error in the information delivery. If the base station obtains the information from source i at time slot t, we have [TeX:] $$D_{i}(t)=S_{i}(t),$$ and otherwise, [TeX:] $$D_{i}(t)=D_{i}(t-1) . \text { Let } E_{i}(t)$$ denote the difference between actual state and latest updated state of source i at time slot t, i.e., [TeX:] $$E_{i}(t)=\left|S_{i}(t)-D_{i}(t)\right|.$$ For source i with [TeX:] $$u_{i}(t)=1,$$ we have [TeX:] $$E_{i}(t)=0 \text { since } D_{i}(t)=S_{i}(t).$$ We define the total error E(t) over N sources at time slot t as [TeX:] $$E(t)=\sum_{i=1}^{N} E_{i}(t).$$ We note that the state transition occurs on the boundary of time slot, and thus [TeX:] $$S_{i}(t)$$ is determined at the beginning of time slot. Then the base station makes decision u(t), which determines D(t) and thus E(t), as shown in Fig. 1. ^{1} 1Since we are investigating strongly decomposable index policies (e.g., myopic index and Whittle’s index), this constraint can be relaxed and our results can be applied to more general scenarios where [TeX:] $$K \geq 1$$ out of N sources can update the receiver simultaneously [23]. Let [TeX:] $$\mathcal{H}(t)$$ denote the history of the base station up to time slot t, i.e., [TeX:] $$\mathcal{H}(t)=\left\{\mathbf{u}(1), S_{a(1)}(1), \mathbf{D}(1), \cdots, \mathbf{u}(t), S_{a(t)}(t)\right. / , \mathbf{D}(t)\} \text { with } \mathcal{H}(0)=\emptyset,$$ where a(t) = i such that [TeX:] $$u_{i}(t)=1.$$ A policy [TeX:] $$\pi=(\pi(t))_{t=1}^{\infty}$$ is a sequence of mappings [TeX:] $$\pi(t): \ \mathcal{H}(t-1) \rightarrow \mathbf{u}(t).$$ Our objective is to find policy [TeX:] $$\pi$$ that minimizes the expected average error (or mismatch) over infinite time horizon: where D(0) is an initial value at the beginning of the first time slot. We consider a belief error vector [TeX:] $$\mathrm{e}(t)=\left[e_{1}(t), \cdots, e_{N}(t)\right],$$ where [TeX:] $$e_{i}(t)$$ denotes the belief error probability for source i at time slot t if its state is not updated in time slot t. Specifically, the belief error evolves according to the Bayes rule as where [TeX:] $$D=D_{i}(t-1), \bar{D}=1-D, \text { and } \mathcal{T}_{D}^{(i)}(e)$$ denotes the one-step belief error Note that the base station can compute [TeX:] $$e_{i}(t) \text { from } u_{i}(t-1), D_{i}(t-1), \text { and } e_{i}(t-1).$$ Also since it makes an update decision without full knowledge of system states, it can be modeled as a POMDP. It is known that sufficient statistics for making optimal decision under POMDP is the belief states, which is the probability distribution over states and can be updated using the Bayes rule [31] as in (3). We formulate the error minimization problem as a RMAB problem with tuple [TeX:] $$\left(D_{i}(t-1), e_{i}(t)\right)$$ as a state of arm i at tth round. If active action [TeX:] $$u_{i}(t)=1$$ is applied to arm i, state [TeX:] $$\left(D_{i}(t), e_{i}(t+1)\right)$$ is set as [TeX:] $$D_{i}(t)=S_{i}(t),$$ and [TeX:] $$e_{i}(t+1)=p_{D_{i} D_{i}}^{(i)}.$$ Otherwise, [TeX:] $$e_{i}(t+1)$$ evolves according to (3). At each time slot, consider a policy that assigns an index to each arm, which is a real-valued function of a state of an arm, and plays (or activates) the arm with the highest index. Such a policy is called as an index policy. Further, the index policy is called strongly decomposable if the index of each arm depends only on the characteristics of the arm. A simple example of strongly decomposable index policy is the myopic policy that directly uses [TeX:] $$e_{i}(t)$$ as an index of arm i at the time slot t to decide [TeX:] $$u_{i}(t).$$ This myopic policy generally does not perform well since it ignores the impact of current action on future errors. Another strongly decomposable index policy can be obtained using Whittle’s index [21], which achieves nearoptimal performance in a wide range of applications [8], [12], [24]–[26], [32]. The main challenge to adopt the Whittle’s index is the indexability of the problem. In this work, we show that our problem is indexable when the source dynamics can be represented by a symmetric Markov chain, i.e., [TeX:] $$p_{01}=p_{10}.$$ ## III. THRESHOLD POLICY FOR SINGLE SOURCE As a first step, we obtain an optimal solution to single source problem with communication (or transmission) cost. Note that, in the original problem (1), multiple sources are coupled through the constraint. By considering a dual problem, we can relax the constraint and decompose the problem to N per-source problems with communication cost, which can be interpreted as the Lagrangian multiplier associated with the time portion that packets are transmitted. We will use the cost as the index in our decision. At each time slot, the base station has an option of either collecting the state information at communication cost (active action) or skipping to the next time at the expense of information inaccuracy (passive action). For notational convenience, we omit subscript (or superscript) i for the single source case. Let [TeX:] $$c \geq 0$$ denote the communication cost that occurs when the base station collects the state information from the source. Then, the penalty at time slot t is [TeX:] $$P(t)=u(t) c+(1-u(t)) E(t),$$ and we can define the problem of minimizing the expected average penalty over infinite time horizon as It is known that a stationary optimal solution to (4) does not necessarily exist under average cost criterion over infinite time horizon [33]. In this work, we show the existence of a stationary optimal policy that solves (4) by considering the problem with vanishing discounts [28], [33], [34]. To this end, we consider the problem with discounted total penalty: with discount factor [TeX:] $$\beta \in(0,1),$$ and show that it always admits a stationary optimal solution of threshold type. Then, we show that, for symmetric source, there exists a stationary optimal policy that solves the original long-term average penalty problem. ##### A. Threshold Policy We first describe a discrete-time Markov Decision Process (MDP), whose state at time slot t is represented by (D; e). For each state, the base station has two possible actions, u = 0 (passive action) and u = 1 (active action). If the passive action is taken, then the state evolves to [TeX:] $$\left(D, \mathcal{T}_{D}(e)\right)$$ with probability 1. If the active action is taken with communication cost c, the state evolves to [TeX:] $$\left(D, \mathcal{T}_{D}(0)\right)$$ with probability 1-e, and to [TeX:] $$\left(\bar{D}, \mathcal{T}_{D}(0)\right)$$ with probability e, where [TeX:] $$\bar{D}=1-D$$ (i.e., the flipped state of D). The penalty under state (D; e) and action u is given as [TeX:] $$u c+(1-u) e.$$ Let [TeX:] $$V_{\beta, c}(D, e)$$ denote the minimum expected total penalty given initial state (D, e) for the problem with discount factor [TeX:] $$\beta$$ and communication cost c. Let [TeX:] $$V_{\beta, c}(D, e ; u)$$ denote the expected total penalty obtained by one-time deviation, i.e., by taking action u at this state and then following the optimal policy. Then, we can write the Bellman optimality equations as It is known that under the discounted total penalty criterion over infinite time horizon, there always exists a stationary optimal policy that satisfies the above Bellman equations (6) [30]. Remark 3.1: We implicitly assume that once the based station is updated with a new status information D, it holds and uses D as an estimate of the source’s status until the next update occurs. Besides this update-and-hold (UAH) estimator, another promising estimation of the source’s status is maximum a posteriori (MAP) [29]. Given (D, e), if [TeX:] $$e>0.5 \text {, }$$ the MAP estimator uses [TeX:] $$\bar{D}$$ as an estimate of the source’s status. Thus, the Bellman optimality equations for MAP estimator can be written as (6) using [TeX:] $$\tilde{V}_{\beta, c}(D, e)=\mathbb{I}\{e \leq 0.5\} V_{\beta, c}(D, e)+ \mathbb{I}\{e>0.5\} V_{\beta, c}(\bar{D}, 1-e)$$ instead of [TeX:] $$V_{\beta, c}(D, e)$$ in 2–4 lines. Throughout the paper, we assume the UAH estimator if otherwise stated, and remark on how the techniques can be also applied to the MAP estimator. Let [TeX:] $$u_{c}^{*}(D, e)$$ denote the optimal action given state (D, e) with a fixed cost c. It can be written as In order to prove that an optimal policy is of threshold type, we first show that the value function [TeX:] $$V_{\beta, c}(D, e)$$ is concave in e (Lemma 3.1), and that there exists a cost range such that a deterministic decision is an optimal action outside of the range (Lemma 3.2). Lemma 3.1: For fixed c, the value function [TeX:] $$V_{\beta, c}(D, e)$$ is concave in e. Proof: For notational convenience, we drop [TeX:] $$\beta, c \text { in } V_{\beta, c},$$ and let [TeX:] $$V_{n}(D, e)$$ denote the value function at nth iteration under value iterations starting at (D, e). We have Clearly, [TeX:] $$V_{1}(0, e) \text { and } V_{1}(1, e)$$ are concave in e. We now show the concavity of [TeX:] $$V_{n}(0, e) \text { and } V_{n}(1, e)$$ by induction. Suppose that [TeX:] $$V_{n}(0, e) \text { and } V_{n}(1, e)$$ are concave in e. Then, the first term in (8) is linear in e, and the second term is concave^{2} in e since [TeX:] $$V_{n}\left(0, \mathcal{T}_{0}(e)\right)=V_{n}\left(0, e p_{11}+(1-e) p_{01}\right).$$ Thus, [TeX:] $$V_{n+1}(0, e),$$ the minimum of two concave functions, is also concave in e. Similarly, we can obtain the same result for [TeX:] $$V_{n+1}(1, e).$$ From the convergence theorem of value iterations [30], we have [TeX:] $$V_{n}(D, e) \rightarrow V(D, e) \text { as } n \rightarrow \infty,$$ and thus V (D, e) is concave in e for fixed c. This completes the proof. ^{2} 2If f(x) is concave in x, then f(ax + b) is concave in x for [TeX:] $$a \neq 0.$$ Remark 3.2 (MAP): Under the MAP estimator, the second term of (8) will be replaced with [TeX:] $$e+\beta \tilde{V}_{n}\left(0, \mathcal{T}_{0}(e)\right),$$ where [TeX:] $$\tilde{V}_{n}\left(0, \mathcal{T}_{0}(e)\right)=\mathbb{I}\left\{\mathcal{T}_{0}(e) \leq 0.5\right\} V_{n}\left(0, \mathcal{T}_{0}(e)\right)+\mathbb{I}\left\{\mathcal{T}_{0}(e)>\right. \ 0.5\} V_{n}\left(1,1-\mathcal{T}_{0}(e)\right).$$ Since both [TeX:] $$V_{n}\left(0, \mathcal{T}_{0}(e)\right) \text { and } V_{n}(1,1- \left.\mathcal{T}_{0}(e)\right)$$ are concave, their non-negative weighted sum of concave functions is also concave. Thus, Lemma 3.1 holds for the MAP estimator. We now consider a range [TeX:] $$\left[c_{L}, c_{H}\right],$ beyond which an optimal policy is deterministic. Specifically, when [TeX:] $$C<C_{L},$$ the optimal policy has active action for all (D, e), and when [TeX:] $$c \geq c_{H},$$ it has passive action for all (D, e). In the following lemma, we show such [TeX:] $$c_{L} \text { and } c_{H}$$ exist. Lemma 3.2: For the single source network problem with communication cost c, there exists an optimal policy that makes the deterministic decision beyond range [TeX:] $$\left[c_{L}, c_{H}\right]$$ with Proof: Note that when c is very small, an optimal policy has [TeX:] $$u=1 \text { for all }(D, e) \in\{0,1\} \times[0,1],$$ and when c is very large, an optimal policy has u = 0 for all (D, e). Under this assumption, we solve Bellman equations (6) to obtain [TeX:] $$c_{L}$$ and [TeX:] $$c_{H},$$ and then show that the result satisfies the Bellman optimality equations. Case 1) When c is very small, updating is an optimal decision, and we will have This and (6) lead to Combining (11) and (12), it can be easily shown that [TeX:] $$V_{\beta, c}\left(0, p_{01}\right)=V_{\beta, c}\left(1, p_{10}\right)=c /(1-\beta),$$ with which and (6), we obtain that, for any [TeX:] $$(D, e) \in\{0,1\} \times[0,1],$$ To satisfy (10), it is required that [TeX:] $$c \leq e$$ for all [TeX:] $$e \in[0,1].$$ Thus, we can conclude that for all [TeX:] $$c<c_{L}=0,$$ (13) satisfies the Bellman optimality equations. In addition, in this case, the optimal action is u = 1. Case 2) Let [TeX:] $$\mathcal{T}_{D}^{k}(e)$$ denote the k-step belief error update of e given [TeX:] $$D \in\{0,1\},,$$ i.e., the belief of the base station when it does not collect the state information from the source for k consecutive slots. It is known in [35] that the belief error evolves as When c is very large, from (6) and (14), we can obtain Using (6), (16) and (17), we have that, for any [TeX:] $$e \in[0,1],$$ Combining the above equations and (15), we have [TeX:] $$c \geq \ \frac{\left(1-\beta+\beta p_{10}-\beta p_{01}\right) e}{(1-\beta)\left(1-\beta p_{11}+\beta p_{01}\right)}, \text { and } c \geq \frac{\left(1-\beta+\beta p_{01}-\beta p_{10}\right) e}{(1-\beta)\left(1-\beta p_{00}+\beta p_{10}\right)}.$$ Hence, for all [TeX:] $$c \geq c_{H},$$ where [TeX:] $$c_{H}:=\max \left\{\frac{1-\beta+\beta p_{10}-\beta p_{01}}{(1-\beta)\left(1-\beta p_{11}+\beta p_{01}\right)}\right.,\left.\frac{1-\beta+\beta p_{01}-\beta p_{10}}{(1-\beta)\left(1-\beta_{p 00}+\beta p_{10}\right)}\right\},$$ (18) satisfies the Bellman optimality equations, and in this case, the optimal action is u = 0. Remark 3.3: Cost [TeX:] $$C_{L}$$ makes the two actions indifferent at e = 0, and similarly, cost [TeX:] $$c_{H}$$ provides an indifferent decision at e = 1. Remark 3.4: If [TeX:] $$p_{11} \geq p_{01},$$ the k-step belief error [TeX:] $$\mathcal{T}_{0}^{k}(e)$$ monotonically increases to [TeX:] $$\pi_{1} \text { for } e<\pi_{1}$$ with respect to k, and monotonically decreases to [TeX:] $$\pi_{1} \text { for } e>\pi_{1}.$$ Hence, it converges to [TeX:] $$\pi_{1}$$ for all e. [TeX:] $$p_{11}<p_{01},$$ then [TeX:] $$\mathcal{T}_{0}^{2 k}(e)$$ monotonically increases to [TeX:] $$\pi_{1} \text { and } \mathcal{T}_{0}^{2 k+1}(e)$$ monotonically decreases to [TeX:] $$\pi_{1}$$ for [TeX:] $$e<\pi_{1}$$ with respect to k, and vice versa for [TeX:] $$e>\pi_{1}.$$ Thus, it also converges to [TeX:] $$\pi_{1}$$ for all e while it alternates around [TeX:] $$\pi_{1}.$$ The similar results hold for [TeX:] $$\mathcal{T}_{1}^{k}(e),$$ which converges to [TeX:] $$\pi_{0}$$ for all e. Remark 3.5 (MAP): Note that, from (3), it can be easily shown that [TeX:] $$\mathcal{T}_{\bar{D}}(1-e)=1-\mathcal{T}_{D}(e).$$ Hence, under the MAP estimator, the one-step evolution of state (D, e) is min [TeX:] $$\left\{\left(D, \mathcal{T}_{D}(e)\right),\left(\bar{D}, 1-\mathcal{T}_{D}(e)\right)\right\},$$ where the minimum compares the second value e of the pair (D, e). Given [TeX:] $$(\bar{D}, 1-e),$$ the state evolves to min [TeX:] $$\left\{\left(D, \mathcal{T}_{D}(1-e)\right),\left(D, 1-\mathcal{T}_{D}(1-\right.\right. e))\}=\min \left\{\left(\bar{D}, 1-\mathcal{T}_{D \bar{D}}(e)\right),\left(D, \mathcal{T}_{D}(e)\right)\right\}.$$ Thus, k-step evolution of min [TeX:] $$\{(D, e),(\bar{D}, 1-e)\}$$ for the MAP estimator is min [TeX:] $$\left\{\left(D, \mathcal{T}_{D}^{k}(e)\right),\left(\bar{D}, 1-\mathcal{T}_{D}^{k}(e)\right)\right\}.$$ Due to the different k- step evolution, the upper bound (9) no longer holds for the MAP estimator. However, we can still show that the upper bound is finite, [TeX:] $$c_{H}<\infty,$$ using its k-step error evolution of min [TeX:] $$\left\{\mathcal{T}_{D}^{k}(e), 1-\mathcal{T}_{D}^{k}(e)\right\},$$ as well as the same lower bound [TeX:] $$c_{L}=0.$$ Using Lemmas 3.1 and 3.2, we show that an optimal policy is of threshold type. Theorem 3.1: There is a threshold policy that solves the Bellman optimal equations (6). Specifically, given cost c, the policy has the decision with a real-valued threshold [TeX:] $$e_{D}^{*}(\beta, c).$$ Proof: From Lemma 3.2, [TeX:] $$u_{c}^{*}(D, e)=1$$ for all (D, e) when [TeX:] $$c \leq c_{L}, \text { and } u_{c}^{*}(D, e)=0$$ for all (D, e) when [TeX:] $$c \geq c_{H}.$$ Consider the case when [TeX:] $$c_{L}<c<c_{H}.$$ From Remark 3.3, we have [TeX:] $$V_{\beta, c}(D, 0 ; u=0)<V_{\beta, c}(D, 0 ; u=1)$$ and [TeX:] $$V_{\beta, c}(D, 1 ; u=0)>V_{\beta, c}(D, 1 ; u=1).$$ For fixed cost c, [TeX:] $$V_{\beta, c}(D, e ; u=1)$$ is linear in e from (6) and is linear in e from (6) and[TeX:] $$V_{\beta, c}(D, e ; u=0)$$ is concave in e from Lemma 3.1, and both of them are continuous in e from (18). Thus, there exists exactly one point [TeX:] $$e \in[0,1)$$ such that [TeX:] $$V_{\beta, c}(D, e ; u=0)=V_{\beta, c}(D, e ; u=1).$$ Further, for [TeX:] $$e \in\left[0, e^{*}\right],$$ we have [TeX:] $$V_{\beta, c}(D, e ; u=0) \leq V_{\beta, c}(D, e ; u=1),$$ and for [TeX:] $$e \in\left(e^{*}, 1\right],$$ we have [TeX:] $$V_{\beta, c}(D, e ; u=0)>V_{\beta, c}(D, e ; u=1).$$ Remark 3.6 (MAP): Theorem 3.1 also holds for the MAP estimator from Lemma 3.1 and the existence of range [TeX:] $$\left[c_{L}, c_{H}\right]$$ with [TeX:] $$c_{L}=0, c_{H}<\infty.$$ ##### B. Special Case of Symmetric Source In this subsection, we focus on symmetric scenarios with [TeX:] $$p_{01}=p_{10}=p,$$ under which we have [TeX:] $$\mathcal{T}_{0}^{k}(e)=\mathcal{T}_{1}^{k}(e)$$ and [TeX:] $$V_{\beta, c}(0, e)=V_{\beta, c}(1, e)$$ for all e, k, and thus we can simplify the state representation of the Markov chain with only e. We rewrite (6) by dropping D as In this case, we can show the existence of stationary optimal policy for the (undiscounted) long-run average penalty minimization problem. We use the following Dutta’s theorem. Theorem 3.2 (Dutta [36]): Let S be the state space and suppose that the dynamic programming problem is value bounded, i.e., there exist a state [TeX:] $$z \in S,$$ a function [TeX:] $$M(\cdot): S \rightarrow \mathbb{R},$$ and a constant [TeX:] $$M \in \mathbb{R}$$ such that Then we have the following results. 1) There exists [TeX:] $$\lambda \in \mathbb{R}$$ such that [TeX:] $$\lambda=\lim _{\beta \rightarrow 1}(1-\beta) V_{\beta}(s)$$ for all [TeX:] $$s \in S.$$ 2) There exists a stationary optimal policy for the long-run expected average problem. 3) Let [TeX:] $$\pi_{\beta}(\cdot)$$ be a stationary optimal policy for the discounted penalty minimization problem (5). If [TeX:] $$\pi_{\beta}(\cdot) \rightarrow \pi$$ pointwise as [TeX:] $$\beta \rightarrow 1,$$ then [TeX:] $$\pi$$ is a stationary optimal policy for the long-run average problem (4). The following lemma shows that Theorem 3.2 is applicable to our case. Lemma 3.3: The single symmetric source network problem with communication cost under the discounted total penalty criterion is value bounded. Specifically, we have Proof: From Lemma 3.1 and the proof of Lemma 3.2, we have three cases for [TeX:] $$V_{\beta, c}(e ; u=0) \text { and } V_{\beta, c}(e ; u=1)$$ as shown in Fig. 2. We mark [TeX:] $$V_{\beta, c}(e ; u=0) \text { and } V_{\beta, c}(e ; u=1) \ \left.0), V_{\beta, c}(e ; u=1)\right\}$$ as a solid line. When [TeX:] $$c<c_{L},$$ we have [TeX:] $$\left|V_{\beta, c}(e)-V_{\beta, c}\left(e^{\prime}\right)\right|=0$$ for all [TeX:] $$e, e^{\prime} \in[0,1].$$ When [TeX:] $$c_{L} \leq c<c_{H},$$ from (6) and (9), we have [TeX:] $$\left|V_{\beta, c}(e)-V_{\beta, c}\left(e^{\prime}\right)\right| \leq \left|V_{\beta, c}(1 ; u=1)-V_{\beta, c}(0 ; u=0)\right|=c<c_{H}=\frac{1}{1-\beta+2 \beta p}$$ for all [TeX:] $$e, e^{\prime} \in[0,1] .$$ If [TeX:] $$c \geq c_{H}, \text { then }\left|V_{\beta, c}(e)-V_{\beta, c}\left(e^{\prime}\right)\right|^{\mid} \leq \left|V_{\beta, c}(1 ; u=0)-\dot{V}_{\beta, c}(0 ; u=0)\right|=\frac{1}{1-\beta+2 \beta p}$$ for all [TeX:] $$e, e^{\prime} \in [0,1]$$ from (18). Remark 3.7 (MAP): Under the MAP estimator, the value boundedness can be also shown similarly, since [TeX:] $$c_{L}=0$$ and [TeX:] $$c_{H}<\infty.$$ From Theorem 3.2 and Lemma 3.3, there exists a stationary optimal policy that minimizes the long-term expected average penalty for a symmetric Markovian source. Further, if we can obtain the closed-form expressions of value functions [TeX:] $$V_{\beta, c}(p)$$ and threshold [TeX:] $$e^{*}(\beta, c),$$ we may directly obtain the solution to the expected average penalty problem. Since it is difficulty to obtain the closed-form [TeX:] $$e^{*}(\beta, c),$$ we show that an optimal policy is of threshold type by using the existence of an optimal policy of Theorem 3.2, and then obtain the closedform expression of the minimum expected average penalty [TeX:] $$\lambda$$ over infinite horizon.In the following, we study the problem (4) of the long-term expected average penalty and omit the discount factor [TeX:] $$\beta .$$ Given the existence of a stationary optimal policy, we have the following Bellman equations as where [TeX:] $$\phi_{c}(\cdot)$$ denotes the cost-to-go function that is the differential penalty occurred by the transient change [30]. It is not difficult to show that [TeX:] $$\phi_{c}(\cdot)$$ is concave in e for fixed c, and [TeX:] $$\phi_{c}(e)$$ is nondecreasing and concave in c for fixed e: we construct value iteration for the nth iteration costto- go function [TeX:] $$\phi_{n, c}(e),$$ and use the techniques in the proof of Lemma 3.1 along with convergence [TeX:] $$\phi_{n, c}(e) \rightarrow \phi_{c}(e)$$ as [TeX:] $$n \rightarrow \infty.$$ Also, we can obtain which can be obtained by either taking the limit [TeX:] $$\beta \rightarrow 1$$ in Lemma 3.2, or directly solving (20) as in Lemma 3.2. From the concavity of [TeX:] $$\phi_{c}(\cdot)$$ and the existence of boundary costs [TeX:] $$c_{L}$$ and [TeX:] $$c_{H},$$ it can be shown that an optimal policy that solves (20) is of threshold type, following the same line of analysis of Theorem 3.1. We omit the proof, and using the fact that an optimal policy is of threshold type. We now obtain the minimum expected average penalty [TeX:] $$\lambda.$$ Lemma 3.4: When a source follows a symmetric Markov process, we have the minimum expected average penalty [TeX:] $$\lambda$$ as When [TeX:] $$p \leq 1 / 2,$$ we have where [TeX:] $$\tilde{K}=\left|\log _{1-2 p} \frac{1-2 e^{*}(c)}{1-2 p}\right|+1, \text { and } h_{p}(x, y)=\frac{y}{2}- \frac{(1-2 x)\left(1-(1-2 p)^{y}\right)}{4 p}$$ When [TeX:] $$p>1 / 2,$$ we have Proof: Since [TeX:] $$\lambda$$ does not depend on the initial state [30], we solve (20) with initial belief state p under the stationary optimal policy of threshold type with threshold [TeX:] $$e^{*}(c).$$ Fig. 2. Value function [TeX:] $$V_{\beta, c}(e ; u=0) \text { and } V_{\beta, c}(e ; u=1).$$ Solid lines denote their minimum [TeX:] $$V_{\beta, c}(e):\left(\text { a) When } c \leq c_{L}, \text { (b) when } c_{L}<c<c_{H}\right.,$$ and (c) when [TeX:] $$c \geq c_{H}.$$ Suppose that [TeX:] $$p \leq 1 / 2.$$ Then [TeX:] $$\mathcal{T}^{K}(p)=(1-p) \mathcal{T}^{K-1}(p)+ \left(1-\mathcal{T}^{K-1}(p)\right) p$$ is a monotonically increasing function of K converging to [TeX:] $$\frac{1}{2}$$ from Remark 3.4. If [TeX:] $$e^{*}(c)<p,$$ the policy will take action u = 1 at e = p. From (20), we have [TeX:] $$\lambda+\phi_{c}(p)=c+\phi_{c}(p), \text { i.e., } \lambda=c.$$ If [TeX:] $$e^{*}(c) \geq 1 / 2,$$ the policy will take action u = 0 at [TeX:] $$e=\mathcal{T}^{k}(p)$$ for all k, since [TeX:] $$\mathcal{T}^{k}(p)<\frac{1}{2}.$$ Then we have [TeX:] $$\lambda+\phi_{c}(p)=p+\phi_{c}(\mathcal{T}(p)) \text { and thus, } \phi_{c}(\mathcal{T}(p))-\phi_{c}(p)= \lambda-p.$$ Similarly, we have [TeX:] $$\phi_{c}\left(\mathcal{T}^{k}(p)\right)-\phi_{c}\left(\mathcal{T}^{k-1}(p)\right)= \lambda-\mathcal{T}^{k-1}(p)$$ for all k. Letting [TeX:] $$\phi_{c}(p)=0,$$ we can obtain where the last equation is from (14). By dividing both sides by K and letting [TeX:] $$K \rightarrow \infty,$$ we can obtain [TeX:] $$\lambda=1 / 2$$ since [TeX:] $$\phi_{c}\left(\mathcal{T}^{K}(p)\right)$$ converges to finite [TeX:] $$\phi_{c}\left(\frac{1}{2}\right).$$ If [TeX:] $$p \leq e^{*}(c)<1 / 2,$$ then from the monotonicity of [TeX:] $$\mathcal{T}^{k}(p),$$ there exists an integer [TeX:] $$\tilde{K} \geq 0$$ such that [TeX:] $$\mathcal{T}^{\tilde{K}}(p) \leq e^{*}(c) \text { and } \mathcal{T}^{\tilde{K}+1}(p)>e^{*}(c).$$ From [TeX:] $$\mathcal{T}^{k}(p)= \frac{1-(1-2 p)^{k}+1}{2}$$ and the integer constraint, we have [TeX:] $$\tilde{K}= \left.\mid \log _{1-2 p}^{2} \frac{1-2 e^{*}(c)}{1-2 p}\right]+1.$$ In this case, we can obtain Since [TeX:] $$\mathcal{T}^{\tilde{K}+1}(p)>e^{*}(c)$$ implies [TeX:] $$\lambda+\phi_{c}\left(\mathcal{T}^{\tilde{K}+1}(p)\right)= c+\phi_{c}(p),$$ we combine it with (23). Letting [TeX:] $$\phi_{c}(p)=0,$$ we obtain (21). Now suppose that [TeX:] $$p>1 / 2.$$ Then, [TeX:] $$\mathcal{T}^{k}(p)$$ converges to [TeX:] $$\frac{1}{2}$$ as [TeX:] $$k \rightarrow \infty$$ while it alternates around 1=2 from Remark 3.4. We note that [TeX:] $$\mathcal{T}^{k}(p)<p$$ for all k. Thus, if [TeX:] $$e^{*}(c)<p,$$ the optimal policy will take action u = 1 at e = p and we have [TeX:] $$\lambda=c.$$ If [TeX:] $$e^{*}(c) \geq p,$$ the policy will take action [TeX:] $$u=0 \text { at } e=\mathcal{T}^{k}(p)$$ for all k, and we have [TeX:] $$\lambda=1 / 2$$ as in the case that [TeX:] $$p \leq 1 / 2$$ and [TeX:] $$e^{*}(c) \geq 1 / 2.$$ Remark 3.8 (MAP): Suppose that [TeX:] $$p_{01}=p_{10}=1-p.$$ Then, for the MAP estimator, the one-step evolution of the belief error e is min [TeX:] $$\{e p+(1-e)(1-p), 1-e p-(1-e)(1-p)\}= \min \{1-e(1-p)-(1-e) p, e(1-p)+(1-e) p\},$$ which is the same as the one-step evolution of e for the Markov chain with [TeX:] $$p_{01}=p_{10}=p.$$ By induction, it can be shown that k-step evolution of the belief error e for the transition probability p is the same as k-step evolution of e for 1 - p. Remark 3.9 (MAP): Note that, for the MAP estimator, the error range is [0, 0:5] since if [TeX:] $$(D, e), e>0.5$$ then the MAP estimator uses [TeX:] $$(\bar{D}, 1-e).$$ Thus, when [TeX:] $$p \leq \frac{1}{2},$$ the minimum expected average penalty [TeX:] $$\lambda_{\mathrm{MAP}}$$ for the MAP estimator is the same as the first and second cases in (21) since [TeX:] $$\mathcal{T}^{k}(e)= \min \left\{\mathcal{T}^{k}(e), 1-\mathcal{T}^{k}(e)\right\}$$ for all [TeX:] $$k \geq 0 \text { and } e \in[0,0.5].$$ When [TeX:] $$p>1 / 2,$$ from Remark 3.8, we have [TeX:] $$\lambda_{\mathrm{MAP}}$$ as in (21) using q = 1 - p instead of p. ## IV. INDEX POLICY FOR MULTIPLE SYMMETRIC SOURCES In this section, we show that the problem of single symmetric source with communication cost is indexable, and obtain the closed-form index, which can be considered as the amount of communication cost that the base station is willing to pay for the information update. For multiple sources, we can minimize the information mismatch by updating the source with the highest index first. We start from the definition of indexability. Let [TeX:] $$\mathcal{P}(c)$$ be the set of belief state e for which it is optimal to not update the state information, i.e., [TeX:] $$\mathcal{P}(c)=\left\{e \in[0,1]: u_{c}^{*}(e)=0\right\}.$$ Definition 4.1 (Indexibility [21]): The problem of single source with communication cost is indexable if [TeX:] $$\mathcal{P}(c)$$ monotonically increases from [TeX:] $$\emptyset$$ to the entire space [0, 1] as the communication cost c increases from [TeX:] $$-\infty \text { to } \infty.$$ The extended problem to multiple sources is indexable if each problem of source i is indexable for all sources i. In our problem of single symmetric source, the indexability is equivalent to the monotonic increase of threshold e*(c) for [TeX:] $$c \in\left(c_{L}, c_{H}\right].$$ The following lemma shows a sufficient condition for the indexability of our single-source problem. Lemma 4.1: Threshold e*(c) monotonically increases with respect to [TeX:] $$c \in\left(c_{L}, c_{H}\right]$$ if where [TeX:] $$\phi_{c}(e ; u=0)=e+\phi_{c}(\mathcal{T}(e)) \text { and } \phi_{c}(e ; u=1)= \ c+\phi_{c}(p).$$ Proof: We prove by contraposition. Suppose that there exists a [TeX:] $$\tilde{c} \in\left(c_{L}, c_{H}\right]$$ such that e*(c) is decreasing at [TeX:] $$c=\tilde{c} \text {. }$$ Then, there exists a [TeX:] $$\delta>0$$ such that for all [TeX:] $$\epsilon \in[0, \delta], e^{*}(\tilde{c}+\epsilon) \leq e^{*}(\tilde{c}).$$ Thus, given cost [TeX:] $$\tilde{c}+\epsilon,$$ the optimal decision at [TeX:] $$e^{*}(\tilde{c}) \text { is } u=1 \text {, i.e., }$$ Similarly, we have Combining (25) and (26), and dividing by [TeX:] $$E,$$ we have [TeX:] $$\frac{\phi_{\bar{c}+\epsilon}\left(e^{*}(\tilde{c}) ; u=1\right)-\phi_{\bar{c}}\left(e^{*}(\tilde{c}) ; u=1\right)}{\epsilon} \leq \frac{\phi_{\tilde{c}+\epsilon}\left(e^{*}(\tilde{c}) ; u=0\right)-\phi_{\bar{c}}\left(e^{*}(\tilde{c}) ; u=0\right)}{\epsilon},$$ which implies [TeX:] $$\left.\frac{\partial \phi_{c}(e ; u=1)}{\partial c}\right|_{e=e^{*}(\tilde{c})} \leq\left.\frac{\partial \phi_{c}(e ; u=0)}{\partial c}\right|_{e=e^{*}(\tilde{c})}.$$ By contraposition, the lemma holds. We now show the indexability of the single source network with communication cost using Lemma 4.1. Theorem 4.1: For a source that follows a two-state symmetric Markov process, the single source problem with communication cost is indexable. Proof: To show that [TeX:] $$\mathcal{P}(c)$$ monotonically increases in c, it suffices to show (24) for all [TeX:] $$c \in\left(c_{L}, c_{H}\right].$$ From (20), we have We are going to use a couple of intermediate results: (i) If [TeX:] $$e>e^{*}(c),$$ then the optimal policy has action u = 1, and we have [TeX:] $$\lambda+\phi_{c}(e)=c+\phi_{c}(p)$$ from (20). By taking the partial derivative with respect to c, we have (ii) If [TeX:] $$\mathcal{T}^{k}(e) \leq e^{*}(c)$$ for all [TeX:] $$k \geq 0,$$ the optimal action is u = 0 at [TeX:] $$\mathcal{T}^{k}(e).$$ Then, again from (20), we have [TeX:] $$\lambda+\phi_{c}\left(\mathcal{T}^{k-1}(e)\right)= \mathcal{T}^{k-1}(e)+\phi_{c}\left(\mathcal{T}^{k}(e)\right)$$ for all k. Letting [TeX:] $$$$ we can obtain Note that [TeX:] $$\phi_{c}\left(\mathcal{T}^{K}(e)\right) \rightarrow \phi_{c}(1 / 2)$$ as [TeX:] $$K \rightarrow \infty.$$ Dividing the equation by K and letting [TeX:] $$K \rightarrow \infty,$$ we obtain [TeX:] $$\lambda=1 / 2$$ (i.e., [TeX:] $$\lambda$$ does not depend on c), and using this, we obtain [TeX:] $$\phi_{c}(e)= \phi_{c}(1 / 2)-(1-2 e) / 4 p$$ when [TeX:] $$K \rightarrow \infty.$$ This results in We now show (24) using (27) – (30). Suppose that [TeX:] $$p \leq 1 / 2.$$ If [TeX:] $$e^{*}(c)<p,$$ then we have [TeX:] $$\mathcal{T}\left(e^{*}(c)\right)>e^{*}(c)$$ from Remark 3.4, and [TeX:] $$\lambda=c$$ from Lemma 3.4. Hence, from (29) with [TeX:] $$e=\mathcal{T}\left(e^{*}(c)\right)>e^{*}(c),$$ we have [TeX:] $$\frac{\partial \phi_{c}\left(\mathcal{T}\left(e^{*}(c)\right)\right)}{\partial c}= \frac{\partial \phi_{c}(p)}{\partial c}.$$ From this and (27), (28), we have If [TeX:] $$p \leq e^{*}(c)<1 / 2,$$ then we have [TeX:] $$\mathcal{T}\left(e^{*}(c)\right)>e^{*}(c)$$ and [TeX:] $$\lambda=\frac{1}{\bar{K}+1}\left[c+h_{p}(p, \tilde{K})\right].$$ Thus, from (29), we have where [TeX:] $$\tilde{K}=\left\lfloor\log _{1-2 p} \frac{1-2 e^{*}(c)}{1-2 p}\right]+1>0.$$ If [TeX:] $$e^{*}(c) \geq 1^{1} / 2,$$ then we have [TeX:] $$\mathcal{T}^{k}\left(e^{*}(c)\right) \leq e^{*}(c)$$ and [TeX:] $$\mathcal{T}^{k}(p) \leq e^{*}(c)$$ for all k from Remark 3.4. Thus, we have [TeX:] $$\frac{\partial \phi_{c}\left(\mathcal{T}\left(e^{*}(c)\right)\right)}{\partial c}=\frac{\partial \phi_{c}\left(\frac{1}{2}\right)}{\partial c} \text { and } \frac{\partial \phi_{c}(p)}{\partial c}=\frac{\partial \phi_{c}\left(\frac{1}{2}\right)}{\partial c}$$ from (30), and Now suppose that [TeX:] $$p>1 / 2.$$ If [TeX:] $$e^{*}(c)<1 / 2,$$ then we have [TeX:] $$\mathcal{T}\left(e^{*}(c)\right)>1 / 2$$ since [TeX:] $$\mathcal{T}(\cdot)$$ alternates around 1=2, and [TeX:] $$\lambda=c$$ from Lemma 3.4. Hence, we have [TeX:] $$\frac{\partial \phi_{c}\left(\mathcal{T}\left(e^{*}(c)\right)\right)}{\partial c}=\frac{\partial \phi_{c}(p)}{\partial c}$$ from (29), and the sufficient condition holds as in (31). If [TeX:] $$e^{*}(c) \geq p,$$ then we have [TeX:] $$e^{*}(c) \geq \mathcal{T}^{k}\left(e^{*}(c)\right)$$ and [TeX:] $$e^{*}(c) \geq \mathcal{T}^{k}(p)$$ for all k from Remark 3.4. Thus, (30) holds for all e, we have [TeX:] $$\frac{\partial \phi_{c}\left(\mathcal{T}\left(e^{*}(c)\right)\right)}{\partial c}=\frac{\partial \phi_{c}(p)}{\partial c}=\frac{\partial \phi_{c}\left(\frac{1}{2}\right)}{\partial c}$$ which satisfies the sufficient condition as in (32). If [TeX:] $$1 / 2 \leq e^{*}(c)<p,$$ then the optimal action is u = 1 at e = p, and u = 0 at e = 1=2. Thus, we have Solving the above equations, we have [TeX:] $$\lambda=c=1 / 2,$$ which implies that if c = [TeX:] $$1 / 2 .$$, then u = 1 for every time slot and u = 0 for every time slot are optimal solutions and give the time-average penalty of [TeX:] $$1 / 2 .$$ Further, it implies that if [TeX:] $$c \neq 1 / 2,$$ then the case of [TeX:] $$\left(1 / 2 \leq e^{*}(c)<p\right)$$ does not occur. Hence, we have [TeX:] $$\lambda+\phi_{c}(1 / 2)=c+\phi_{c}(p).$$ By taking partial derivative with respect to c, we obtain [TeX:] $$\frac{\partial \phi_{c}(p)}{\partial c}=\frac{\partial \phi_{c}\left(\frac{1}{2}\right)}{\partial c}.$$ Further, since [TeX:] $$\mathcal{T}^{k}\left(e^{*}(c)\right) \leq e^{*}(c)$$ for all k, we have [TeX:] $$\frac{\partial \phi_{c}\left(\mathcal{T}\left(e^{*}(c)\right)\right)}{\partial c}=\frac{\partial \phi_{c}\left(\frac{1}{2}\right)}{\partial c}$$ from (30). Thus, the sufficient condition is satisfied as in (32). Given the indexability of the problem, we compute the index as follows. Definition 4.2 (Whittle’s index [21]): Given indexability, Whittle’s index denoted by W(e) is the infimum transmission cost c that makes both actions (i.e., u = 0, 1) equally desirable at e. Theorem 4.2: Consider the error probability minimization problem with multiple symmetric Markovian sources. The closed-form expression of Whittle’s index [TeX:] $$W_{i}\left(e_{i}\right)$$ for source i is given as follows. When [TeX:] $$p^{(i)} \leq 1 / 2:$$ where [TeX:] $$\tilde{K}=\left\lfloor\log _{1-2 p^{(i)}} \frac{1-2 e_{i}}{1-2 p^{(i)}}\right\rfloor+1.$$ When [TeX:] $$p^{(i)}>1 / 2:$$ Proof: We drop i for notational convenience. From the definition of Whittle index, for given e, we find cost W(e) such that the two actions (u = 0 and 1) are indifferent under the threshold policy [TeX:] $$\pi . \text { Let } \pi(e, c)$$ denote the decision of policy [TeX:] $$\pi$$ with cost c: [TeX:] $$\pi\left(e^{\prime}, W(e)\right)=0$$ for all [TeX:] $$e^{\prime} \leq e,$$ and [TeX:] $$\pi\left(e^{\prime}, W(e)\right)=1 \text { for all } e^{\prime}>e$$ by definition. First, suppose that [TeX:] $$p \leq \frac{1}{2}.$$ For [TeX:] $$e \in[0, p),$$ we have [TeX:] $$e<\mathcal{T}(e)$$ since [TeX:] $$\mathcal{T}(\cdot)$$ monotonically increases toward [TeX:] $$1 / 2.$$ Thus, we have [TeX:] $$\pi(\mathcal{T}(e), W(e))=1,$$ which leads to [TeX:] $$\lambda+\phi_{c}(\mathcal{T}(e))= c+\phi_{c}(p)$$ from (20) with c = W(e). Similarly, we have [TeX:] $$\pi(p, W(e))=1 \text { and } \lambda+\phi_{c}(p)=c+\phi_{c}(p).$$ Also, since the policy is indifferent at e, we have [TeX:] $$e+\phi_{c}(\mathcal{T}(e))= c+\phi_{c}(p).$$ By solving, the three equations, we can obtain that [TeX:] $$c=W(e)=e.$$ For [TeX:] $$e \in[p, 1 / 2),$$ we have [TeX:] $$e<\mathcal{T}(e)<1 / 2,$$ and thus[TeX:] $$\pi(\mathcal{T}(e), W(e))=1,$$ which leads to [TeX:] $$\lambda+\phi_{c}(\mathcal{T}(e))= c+\phi_{c}(p)$$ from (20) with [TeX:] $$c=W(e) .$$ By letting [TeX:] $$\phi_{c}(p)=0$$ with [TeX:] $$c=W(e),$$ we have [TeX:] $$\lambda+\phi_{c}(\mathcal{T}(e))=c.$$ Also from (21), we have [TeX:] $$\lambda=\frac{1}{K+1}\left[c+h_{p}(p, \tilde{K})\right],$$ where [TeX:] $$\tilde{K}=\left\lfloor\log _{1-2 p} \frac{1-2 e}{1-2 p}\right\rfloor+1.$$ The indifference of the policy at e results in [TeX:] $$e+\phi_{c}(\mathcal{T}(e))=c.$$ Combining with [TeX:] $$\lambda+\phi_{c}(\mathcal{T}(e))=c,$$ we have [TeX:] $$e=\lambda$$ and thus (33). For [TeX:] $$e \geq 1 / 2 \text {, }$$ we have [TeX:] $$1 / 2 \leq \mathcal{T}(e) \leq e$$ since [TeX:] $$\mathcal{T}(\cdot)$$ monotonically decreases towards [TeX:] $$1 / 2$$ from Remark 3.4, and thus [TeX:] $$\pi(\mathcal{T}(e), W(e))=0,$$ which leads to [TeX:] $$\lambda+\phi_{c}(\mathcal{T}(e))= \mathcal{T}(e)+\phi_{c}\left(\mathcal{T}^{2}(e)\right) \text { with } c=W(e).$$ Also, from [TeX:] $$p \leq 1 / 2 \text { and } \mathcal{T}(p) \leq 1 / 2<W(e) \text {, }$$ we have [TeX:] $$\pi(\mathcal{T}(p), W(e))=0,$$ implying [TeX:] $$\lambda+\phi_{c}(p)=p+\phi_{c}(\mathcal{T}(p)) .$$ Combining these with the indifference equation [TeX:] $$e+\phi_{c}(\mathcal{T}(e))=c+\phi_{c}(p),$$ we obtain We note that [TeX:] $$\mathcal{T}^{k}(e) \leq e \text { and } \mathcal{T}^{k}(p) \leq e$$ for all [TeX:] $$k \geq 0, p \leq 1 / 2, e \geq 1 / 2.$$ Then, from [TeX:] $$\pi\left(\mathcal{T}^{k}(e), W(e)\right)=0$$ and [TeX:] $$\pi\left(\mathcal{T}^{k}(p), W(e)\right)=0,$$ we can obtain respectively. Further, from (36) and (37), we can obtain respectively. By adding [TeX:] $$e+\mathcal{T}(e)$$ to the both sides of the first equation and c + p to the both sides of the second equation, and by combining them with (35), we obtain where By letting [TeX:] $$K \rightarrow \infty,$$ we have [TeX:] $$\phi_{c}\left(\mathcal{T}^{K+1}\left(e^{\prime}\right)\right) \rightarrow \phi_{c}\left(\frac{1}{2}\right)$$ for any [TeX:] $$e^{\prime}.$$ From [TeX:] $$p \in[0,1] \text {, }$$ we can obtain [TeX:] $$c=W(e)=e / 2 p.$$ Now we consider the other case of [TeX:] $$p>1 / 2.$$ For [TeX:] $$e \in[0,1 / 2) \text { and } e \in[p, 1] \text {, }$$ we can follow the same line of the analysis as the case [TeX:] $$p \leq 1 / 2$$ to obtain (34). For [TeX:] $$e \in[1 / 2, p),$$ we have [TeX:] $$\pi(p, W(e))=1,$$ which leads to [TeX:] $$\lambda+\phi_{c}(p)=c+\phi_{c}(p) \text {, i.e., } \lambda=c,$$ from (20). Also since [TeX:] $$\mathcal{T}(e)<e,$$ we have [TeX:] $$\pi(\mathcal{T}(e), W(e))=0,$$ and thus [TeX:] $$\lambda+\phi_{c}(e)=e+\phi_{c}(\mathcal{T}(e)).$$ Letting [TeX:] $$\phi_{c}(e)=0,$$ and combining two equations, we have [TeX:] $$\phi_{c}(\mathcal{T}(e))=c-e.$$ We generalize this by using [TeX:] $$\mathcal{T}^{k}(e)<e$$ for all k. From [TeX:] $$\pi\left(\mathcal{T}^{k}(e), W(e)\right)=0,$$ we have [TeX:] $$\lambda+\phi_{c}\left(\mathcal{T}^{k}(e)\right)= \mathcal{T}^{k}(e)+\phi_{c}\left(\mathcal{T}^{k+1}(e)\right)$$ for all k with c = W(e). Then using [TeX:] $$\lambda=c$$ and from (38), From [TeX:] $$\phi_{c}\left(\mathcal{T}^{K}(e)\right) \rightarrow \phi_{c}(1 / 2) \text { as } K \rightarrow \infty,$$ we can obtain [TeX:] $$c=1 / 2$$ by dividing both sides by K and letting [TeX:] $$K \rightarrow \infty.$$ Therefore, we can assign an index to each symmetric source i as in (33) or (34), depending on the value of p. Since a source with a higher index implies that the threshold policy is willing to take a higher communication cost for an update than the other sources, we can develop an index policy that updates the source with the highest index to minimize the long-run average error sum over all the sources. Remark 4.1 (MAP): From Remarks 3.8 and 3.9, the indexiablity for the MAP estimator can be shown. Further, for [TeX:] $$p \leq 0.5 \text {, }$$ the Whittle’s index for the MAP estimator is the same as the first and second cases in (33). For [TeX:] $$p>0.5,$$ the Whittle’s index is the same as (33) by replacing [TeX:] $$q=1-p$$ instead of p. ## V. SIMULATION RESULTS In this section, we show the performance of Whittle’s index policy through simulation results. We first compare five policies: dynamic programming (DP) over finite time horizon (for UAH and MAP estimators), Whittle’s index policy (for Fig. 3. Average error performance with two sources: [TeX:] $$\text { (a) } p^{(1)}=0.1 \text { and } p^{(2)}=\alpha \text { and (b) } p^{(1)}=0.5 \text { and } p^{(2)}=\alpha \text {. }$$ UAH and MAP estimators), and myopic policy. Recall that the myopic policy updates source [TeX:] $$i=\arg \max _{i} e_{i}(t)$$ at each time t. Although the DP is an optimal policy, its high computational complexity prohibits its practical implementation [30]. Thus, in this simulation, we consider two symmetric sources over 1000 time-slots. In particular, the transition probability [TeX:] $$p^{(1)}$$ of source 1 is fixed at 0:1 or 0:5 while the transition probability [TeX:] $$p^{(2)}$$ of source 2 varies from 0:1 to 0:9. We recall that the communication channel is a noiseless channel and has no transmission delay. Fig. 3 shows the average error of five policies with two symmetric sources when [TeX:] $$p^{(1)}=0.1$$ (Fig. 3(a)) or [TeX:] $$p^{(1)}=0.5$$ (Fig. 3(b)) while [TeX:] $$p^{(2)}=\alpha$$ varying from 0.1 to 0.9. The results are averaged over [TeX:] $$10^{4}$$ repetitions. As seen in Fig. 3, the Whittle’s index policy achieves near-optimal performance. Further, since the evolution of estimated error under MAP estimator for Markov chains with transition probabilities of p and 1 - p is the same, the performance of MAP estimator is symmetric around [TeX:] $$\alpha=0.5.$$ As seen in Fig. 3(a), when [TeX:] $$p^{(2)}=\alpha \geq 0.5,$$ the belief error of source 1 is no larger than 0.5. Thus, the myopic policy always choose source 2, and the average error (per source) becomes 0.25. The same holds for [TeX:] $$\alpha \leq 0.5$$ in Fig. 3(b). We observed that the performance gap between the Whittle’s index policies and the dynamic programming is insignificant, which confirms to the previous results of the optimality of the Whittle’s index policy with many sources/arms [10]. Thus, we conjecture that the Whittle’s index policy proposed in this paper performs very close to the optimal policy (DP). Next, we compare the performance of Whittle’s index policy and the myopic policy with a larger number of symmetric sources. The number of sources changes from 1 to 10, and the transition probabilities [TeX:] $$p^{(i)} \in(0,1)$$ are chosen uniformly at random. The total time-slot is [TeX:] $$10^{5}$$ and the result is averaged over 100 repetitions. As seen in Fig. 4, Whittle’s index policy outperforms the myopic policy for all the cases. ## VI. CONCLUSION In this paper, we considered a status update system with multiple sources updating a common remote estimation and investigated the centralized scheduling policies to minimize the expected average estimation error over infinite time horizon. We showed that an optimal solution to the problem of single source with communication cost is of threshold type. Fig. 4. Average error performance with multiple sources. For multi-source scenario, we modeled the error minimization problem as a restless multi-armed bandits problem and showed that our problem with symmetric sources is indexable. Under indexability, we developed low-complexity algorithm by obtaining a closed-form expression of the Whittle’s index. Through numerical simulations, we showed that our algorithm is near-optimal by comparing finite-time dynamic programming. ## References - 1 J. Shi, J. Wan, H. Yan, H. Suo, "A survey of cyber-physical systems," in
*Proc. IEEE WCSP*, 2011;custom:[[[-]]] - 2 A. Kosta, N. Pappas, V. Angelakis, "Age of information: A new concept, metric, and tool,"
*Found. Trends Netw.*, vol. 12, no. 3, pp. 162259-162259, 2017.doi:[[[10.1561/1300000060]]] - 3 Y. Sun, B. Cyr, "Sampling for data freshness optimization: Nonlinear age functions,"
*J. Commun. Netw.*, vol. 21, no. 3, 2019.custom:[[[-]]] - 4 Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, N. B. Shroff, "Update or wait: How to keep your data fresh,"
*IEEE Trans. Inf. Theory*, vol. 63, no. 11, pp. 7492-7508, 2017.doi:[[[10.1109/TIT.2017.2735804]]] - 5 R. Talak, S. Karaman, E. Modiano, "Optimizing information freshness in wireless networks under general interference constraints,"
*IEEE /ACM Trans. Netw.*, vol. 28, no. 1, pp. 15-28, 2019.doi:[[[10.1109/tnet.2019.2946481]]] - 6 C. Joo, A. Eryilmaz, "Wireless scheduling for information freshness and synchrony: Drift-based design and heavy-traffic analysis," in
*IEEE /ACM Trans. Netw.*, 2018;vol. 26, no. 6, pp. 2556-2568. custom:[[[-]]] - 7 G. Stamatakis, N. Pappas, A. Traganitis, "Control of status updates for energy harvesting devices that monitor processes with alarms," in
*Proc. IEEE GLOBECOM*, 2019;custom:[[[-]]] - 8 Y.-P. Hsu, "Age of information: Whittle index for scheduling stochastic arrivals," in
*Proc. IEEE ISIT*, 2018;custom:[[[-]]] - 9 I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, E. Modiano, "Scheduling policies for minimizing age of information in broadcast wireless networks,"
*IEEE /ACM Trans. Netw.*, vol. 26, no. 6, pp. 26372650-26372650, 2018.doi:[[[10.1109/tnet.2018.2873606]]] - 10 A. Maatouk, S. Kriouile, M. Assaad, A. Ephremides, "On the optimality of the whittle’s index policy for minimizing the age of information,"
*IEEE Trans. Wireless Commun.*, vol. 26, no. 2, pp. 12631277-12631277, 2020.doi:[[[10.1109/twc.2020.3032237]]] - 11 C. Kam, S. Kompella, G. D. Nguyen, J. E. Wieselthier, A. Ephremides, "Towards an effective age of information: Remote estimation of a markov source," in
*Proc. IEEE INFOCOM*, 2018;custom:[[[-]]] - 12 Z. Jiang, S. Zhou, Z. Niu, C. Yu, "A unified sampling and scheduling approach for status update in multiaccess wireless networks," in
*Proc. IEEE INFOCOM*, 2019;custom:[[[-]]] - 13 G. Chen, S. C. Liew, Y. Shao, "Uncertainty-of-information scheduling: A restless multi-armed bandit framework,"
*arXiv preprint arXiv:2102.06384*, 2021.custom:[[[-]]] - 14 Y. Sun, Y. Polyanskiy, E. Uysal-Biyikoglu, "Remote estimation of the wiener process over a channel with random delay," in
*Proc. IEEE ISIT*, 2017;custom:[[[-]]] - 15 T. Z. Ornee, Y. Sun, "Sampling for remote estimation through queues: Age of information and beyond," in
*Proc. IEEE WiOpt*, 2019;custom:[[[-]]] - 16 X. Gao, E. Akyol, T. Bas ¸ar, "Optimal sensor scheduling and remote estimation over an additive noise channel," in
*Proc. IEEE ACC*, 2015;custom:[[[-]]] - 17 X. Gao, E. Akyol, T. Ba´ sar, "On remote estimation with multiple communication channels," in
*Proc. IEEE ACC*, 2016;custom:[[[-]]] - 18 J. Yun, C. Joo, A. Eryilmaz, "Optimal real-time monitoring of an information source under communication costs," in
*Proc. IEEE CDC*, 2018;custom:[[[-]]] - 19 J. Gittins, K. Glazebrook, R. Weber,
*Multi-armed bandit allocation indices*, John Wiley & Sons, 2011.custom:[[[-]]] - 20 C. H. Papadimitriou, J. N. Tsitsiklis, "The complexity of optimal queuing network control,"
*Math. Operations Research*, vol. 24, no. 2, pp. 293-305, 1999.doi:[[[10.1287/moor.24.2.293]]] - 21 P. Whittle, "Restless bandits: Activity allocation in a changing world,"
*J. Appl. Probability*, vol. 25, no. A, pp. 287-298, 1988.doi:[[[10.2307/3214163]]] - 22 N. Akbarzadeh, A. Mahajan, "Restless bandits with controlled restarts: Indexability and computation of whittle index," in
*Proc. IEEE CDC*, 2019;custom:[[[-]]] - 23 R. R. Weber, G. Weiss, "On an index policy for restless bandits,"
*J. Appl. Probability*, vol. 27, no. 3, pp. 637-648, 1990.doi:[[[10.2307/3214547]]] - 24 K. Liu, R. Weber, Q. Zhao, "Indexability and whittle index for restless bandit problems involving reset processes," in
*Proc. IEEE CDE*, 2011;custom:[[[-]]] - 25 P. Ansell, K. D. Glazebrook, J. Ni˜ no-Mora, M. O’Keeffe, "Whittle’s index policy for a multi-class queueing system with convex holding costs,"
*Mathematical Methods of Operations Research*, vol. 57, no. 1, pp. 21-39, 2003.custom:[[[-]]] - 26 M. Larranaga, U. Ayesta, I. Verloop, "Stochastic and fluid index policies for resource allocation problems," in
*Proc. IEEE INFOCOM*, 2015;custom:[[[-]]] - 27 R. Meshram, D. Manjunath, A. Gopalan, "On the whittle index for restless multiarmed hidden markov bandits,"
*IEEE Trans. Automat. Control*, vol. 63, no. 9, pp. 3046-3053, 2018.doi:[[[10.1109/TAC.2018.2799521]]] - 28 K. Liu, Q. Zhao, "Indexability of restless bandit problems and optimality of whittle index for dynamic multichannel access,"
*IEEE Trans. Inf. Theory*, vol. 56, no. 11, pp. 5547-5567, 2010.doi:[[[10.1109/TIT.2010.2068950]]] - 29 T. He, A. Anandkumar, D. Agrawal, "Index-based sampling policies for tracking dynamic networks under sampling constraints," in
*Proc. IEEE INFOCOM*, 2011;custom:[[[-]]] - 30 D. P. Bertsekas,
*Dynamic programming and optimal control*, Athena scientific Belmont, MA no.3, vol. 1, no. 3, 2005.custom:[[[-]]] - 31 E. J. Sondik, "The optimal control of partially observable markov processes over the infinite horizon: Discounted costs,"
*Operations research*, vol. 26, no. 2, pp. 282-304, 1978.doi:[[[10.1287/opre.26.2.282]]] - 32 I. Kadota, A. Sinha, E. Modiano, "Optimizing age of information in wireless networks with throughput constraints," in
*Proc. IEEE INFOCOM*, 2018;custom:[[[-]]] - 33 D. P. Bertsekas,
*Dynamic Programming and Optimal Control*, Vol. II 3rd ed. Athena Scientific, 2007.custom:[[[-]]] - 34 O. L. d. V. Costa, F. Dufour, "The vanishing discount approach for the average continuous control of piecewise deterministic markov processes,"
*J. Appli. Probability*, vol. 46, no. 4, pp. 1157-1183, 2009.doi:[[[10.1239/jap/1261670695]]] - 35 J. R. Norris,
*Markov chains*, Cambridge university press, no.2, no. 2, 1998.custom:[[[-]]] - 36 P. K. Dutta, "What do discounted optima converge to?: A theory of discount rate asymptotics in economic models,"
*J. Econ. Theory*, vol. 55, no. 1, pp. 64-94, 1991.doi:[[[10.1016/0022-0531(91)90059-d]]] |