Perceptual models can be used to predict the benefit of hearing aids in terms of speech intelligibility and listening effort. To predict clinical study results meaningfully, simulations must account for the real hearing aid hardware, individual hearing loss, acoustic coupling, and the room characteristics of the study site. We used this approach to compare speech reception threshold (SRT) benefits measured in clinical studies with predictions from the intelligibility-weighted signal-to-noise ratio (iSNR), the Speech Intelligibility Index (SII), and the Hearing Aid Speech Perception Index version 2 (HASPIv2). To predict listening effort (LE), we used the LEAP model (Listening Effort prediction from Acoustic Parameters). The acceptance criterion was that the prediction error lie within the standard deviation of the corresponding test. The iSNR and SII models underestimated the unaided SRTs and overestimated the aided SRTs, thus overestimating the aided benefit. HASPIv2 correctly predicted the unaided and aided SRTs, and the predicted aided benefit did not differ statistically from the study data. The LE predictions also met the acceptance criterion. Using realistic setups and clearly defined criteria, models can be evaluated for their value in predicting hearing aid benefit and can thus support the development of hearing aid features.
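As a rough illustration of the simplest model and the evaluation logic named above, the following is a minimal sketch, not the authors' implementation. It computes an intelligibility-weighted SNR as a band-importance-weighted average of per-band SNRs. The sketch assumes SII-style clipping of each band SNR to the range of -15 to +15 dB. It then expresses the aided benefit as the SRT improvement and applies the acceptance criterion, which requires the absolute prediction error to be no larger than the test's standard deviation. The function names and all numbers are hypothetical placeholders, not values from the study.

```python
from typing import Sequence


def isnr(band_snrs_db: Sequence[float], importance: Sequence[float]) -> float:
    """Intelligibility-weighted SNR in dB.

    Assumption: each band SNR is clipped to [-15, 15] dB (SII-style) and
    the weights are normalized so they sum to 1.
    """
    if len(band_snrs_db) != len(importance):
        raise ValueError("band_snrs_db and importance must have equal length")
    total = sum(importance)
    if total <= 0:
        raise ValueError("importance weights must sum to a positive value")
    clipped = (min(max(s, -15.0), 15.0) for s in band_snrs_db)
    return sum(w * s for w, s in zip(importance, clipped)) / total


def aided_benefit(srt_unaided_db: float, srt_aided_db: float) -> float:
    """SRT benefit in dB: a lower SRT is better, so benefit = unaided - aided."""
    return srt_unaided_db - srt_aided_db


def within_acceptance(predicted: float, measured: float, sd: float) -> bool:
    """Acceptance criterion: the absolute prediction error lies within one SD."""
    return abs(predicted - measured) <= sd


if __name__ == "__main__":
    # Hypothetical four-band example with equal band importance.
    print(isnr([0.0, 5.0, 10.0, 20.0], [0.25, 0.25, 0.25, 0.25]))  # 20 dB is clipped to 15
    print(aided_benefit(-2.0, -6.5))
    print(within_acceptance(predicted=-4.0, measured=-5.2, sd=1.5))
```

Real SII or HASPIv2 predictions require standardized band-importance functions, hearing-loss modelling, and auditory-model processing that this sketch omits. The sketch only shows how a prediction is compared against study data once both numbers exist.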