HOW TO MAXIMIZE REWARD RATE ON TWO VARIABLE-INTERVAL PARADIGMS

Abstract
Without assuming any constraints on behavior, we derive the policy that maximizes overall reward rate on two variable-interval paradigms. The first paradigm is concurrent variable time-variable time with a changeover delay. We show that, for nearly all parameter values, a switch to the schedule with the longer interval should be followed immediately by a switch back to the schedule with the shorter interval. The matching law does not hold at the optimum and does not uniquely specify the obtained reward rate. The second paradigm is discrete-trial concurrent variable interval-variable interval. For any given schedule parameters, the optimal policy is a cycle consisting of a fixed number of choices of the schedule with the shorter interval followed by one choice of the schedule with the longer interval. Molecular maximization sometimes results in optimal behavior.
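To make the cyclic policy for the discrete-trial paradigm concrete, the following is a minimal sketch, not the paper's derivation. It assumes a simplified model: on every trial each schedule independently arms with a constant probability (`p_short` for the shorter-interval schedule, `p_long` for the longer one), an armed reward is held until that schedule is next chosen, and each choice collects at most one reward. Under these assumptions a schedule last chosen k trials ago is armed with probability 1 - (1 - p)^k. The function names `armed_prob`, `cycle_rate` and `best_cycle` are hypothetical, introduced only for illustration. The code scores the cycle "n short choices, then one long choice" by expected rewards per trial and searches for the best n.

```python
def armed_prob(p: float, k: int) -> float:
    """Probability that a schedule arming with probability p per trial
    (hypothetical model) holds a reward k trials after it was last chosen."""
    return 1.0 - (1.0 - p) ** k


def cycle_rate(n: int, p_short: float, p_long: float) -> float:
    """Expected rewards per trial for the repeating cycle:
    n choices of the shorter-interval schedule, then 1 of the longer one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # The first short choice in each cycle comes 2 trials after the previous
    # short choice (the long choice intervenes); the other n - 1 are consecutive.
    short = armed_prob(p_short, 2) + (n - 1) * armed_prob(p_short, 1)
    # The longer-interval schedule is revisited once every n + 1 trials.
    long_ = armed_prob(p_long, n + 1)
    return (short + long_) / (n + 1)


def best_cycle(p_short: float, p_long: float, n_max: int = 1000) -> int:
    """Cycle length n (1..n_max) with the highest expected reward rate."""
    return max(range(1, n_max + 1), key=lambda n: cycle_rate(n, p_short, p_long))


if __name__ == "__main__":
    n = best_cycle(0.3, 0.1)
    print(n, round(cycle_rate(n, 0.3, 0.1), 4))
```

With `p_short = 0.3` and `p_long = 0.1`, `best_cycle(0.3, 0.1)` returns 4, so in this toy model the best policy is four choices of the shorter-interval schedule followed by one of the longer. Expanding the sum gives rate = p_short + (1 - (1 - p_long)^(n+1) - p_short^2)/(n + 1), which shows why the occasional long choice pays off: the cycle beats choosing only the shorter-interval schedule (rate p_short) exactly when the long schedule's holding probability exceeds p_short^2.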