This paper presents a theoretical framework for modeling human visual attention. The framework's core claim is that attention is driven by three mechanisms: selection, which picks out an item for further processing; engagement, which tags a selected item as relevant or irrelevant to the current task; and enhancement, which increases sensitivity to task-relevant items and decreases sensitivity to task-irrelevant ones. Building on these mechanisms, the framework explains human performance on attentionally demanding tasks such as visual search and multiple object tracking, and it yields a broad range of predictions about how such tasks interact.