Tracking with Cost Constraints

This post came about after a discussion with Odysseas Kanavetas during his Seminar++ lecture at CWI, and a further discussion with Vincent van der Goes.

The setting is as follows. We have $K$ actions with associated positive costs $c_1, \ldots, c_K$. Furthermore, we fix a per-round cost budget $C$. Each round, we are given a target weight distribution $\boldsymbol w_n \in \triangle_K$ that is affordable, in the sense that \begin{align} \label{eq:w.feasible} \boldsymbol c^\intercal\boldsymbol w_n ~\le~ C.\end{align} It is our job to sequentially pick actions $k_1, k_2, \ldots$ deterministically from $[K]$. We require the chosen action sequence to be feasible, meaning that at any time $n$ the accumulated cost remains within the cost budget, i.e.

\begin{equation}\label{eq:feasible} \sum_{i=1}^n c_{k_i} ~\le~ n C, \end{equation} On top of that, we desire our action counts to track the cumulative target $\sum_{i=1}^n \boldsymbol w_i$, i.e. \begin{equation}\label{eq:tracking} \sum_{i=1}^n \boldsymbol e_{k_i} ~\approx~ \sum_{i=1}^n \boldsymbol w_i . \end{equation}

Algorithm

Since we do assume that the costs $\boldsymbol c$ are fixed over rounds, there is a globally cheapest action $k^* = \operatorname*{arg\,min}_k c_k$ (we can break ties here arbitrarily). After $n$ rounds, let us define two vectors: we denote by $\boldsymbol N(n) = \sum_{i=1}^n \boldsymbol e_{k_i}$ the cumulative action counts, and by $\boldsymbol W(n) = \sum_{i=1}^n \boldsymbol w_i$ the cumulative target. We say that action $k$ is open for round $n$ if \[N_k(n-1)+1 ~\le~ W_k(n)\] In words, an action is open if it remains below its target if we select it. Observe that whether $k$ is open for round $n$ does not depend on the action $k_n$ selected in round $n$. It only depends on $W_k(n)$ and $N_k(n-1)$, which are known beforehand. With that, we can define our algorithm:

Algorithm In round $n$, if there are open actions for round $n$, select any. Otherwise select the cost minimiser $k^*$.

Analysis

Here are the two prongs of our analysis. By construction our algorithm maintains the invariant \begin{equation}\label{eq:invariant} N_k(n) ~\le~ W_k(n) \qquad \text{for all $k \neq k^*$}. \end{equation} and hence in particular we respect the cumulative cost constraint $\eqref{eq:feasible}$, because \begin{align*} \sum_k N_k(n) c_k &~=~ n c_{k^*} + \sum_{k \neq k^*} N_k(n) (c_k - c_{k^*}) \\ &~\stackrel{\eqref{eq:invariant}}{\le}~ n c_{k^*} + \sum_{k \neq k^*} W_k(n) (c_k - c_{k^*}) \\ &~=~ \sum_{k} W_k(n) c_k \\ &~\stackrel{\eqref{eq:w.feasible}}{\le}~ n C \end{align*} To show we track closely, making precise $\eqref{eq:tracking}$, we follow (Garivier and Kaufmann 2016, Lemma 15). Let us constrain $N_{k^*}(n)$ for the critical rounds where that count increases, i.e. $k_n = k^*$. In such a round all other actions $k \neq k^*$ are closed, i.e. $N_k(n-1)+1 ~>~ W_k(n)$. We conclude \begin{align*} N_{k^*}(n) &~=~ n - \sum_{k \neq k^*} N_{k}(n) \\ &~=~ n - \sum_{k \neq k^*} (N_{k}(n-1)+1) + K-1 \\ &~<~ n - \sum_{k \neq k^*} W_{k}(n) + K-1 \\ &~=~ W_{k^*}(n) + K-1 \end{align*} Now $k^*$ being at most $K-1$ ahead of its target, means that the other actions can be at most $K-1$ behind their targets. We conclude \[N_k(n) ~\in~ \begin{cases} \left[ W_k, W_k + K-1\right] & \text{if $k = k^*$} \\ \left[ W_k - (K-1), W_k\right] & \text{o.w.} \end{cases}\] A slightly weaker but more consise statement is that for all $n$ \[\left\|\boldsymbol N(n) - \boldsymbol W(n)\right\|_\infty ~\le~ K-1 .\]

Conclusion

We looked at tracking with costs, and showed that one can stay within a cost budget, while tracking any feasible weights at constant additive error.

Let us conclude with a question. Our tracking analysis extends the argument of (Garivier and Kaufmann 2016, Lemma 15) to the case with costs. A tighter analysis (replacing proximity of order $K$ by order $\ln K$) was recently proposed in (Degenne, Shao, and Koolen 2020, Theorem 6). Would that also have an upgrade with costs?

Degenne, Rémy, Han Shao, and Wouter M. Koolen. 2020. “Structure Adaptive Algorithms for Stochastic Bandits.” In Proceedings of the 37th International Conference on Machine Learning (ICML). http://proceedings.mlr.press/v119/degenne20b/degenne20b-supp.pdf.

Garivier, A., and E. Kaufmann. 2016. “Optimal Best Arm Identification with Fixed Confidence.” In Proceedings of the 29th Conference on Learning Theory (COLT).

Introduction

Algorithm

Analysis

Conclusion