The aim was to assess the value of predictive scores in the diagnosis of acute appendicitis. This was a multicentre evaluation using a prospective database of 1254 patients with acute abdominal pain from six departments of surgery in Germany. The performance of 10 scores was measured on one database using standardised criteria, and the results were compared with published data. The main outcome measure was the ability of a score to fulfil the standardised criteria: an initial negative appendicectomy rate of 15% or less, a potential perforation rate of 35% or less, an initial missed perforation rate of 15% or less, and a missed appendicitis rate of 5% or less. Re-evaluation of the published data showed that the Alvarado score fulfilled all four criteria and that the Lindberg, Fenyö and Christian scores fulfilled two criteria each. When applied to our database (acute abdominal pain, suspected appendicitis), none of the scores fulfilled any of the given criteria, even when the cut-off point was varied systematically. There were significant differences among the scores. The original published data seemed to comply with our standardised criteria, but evaluation of the scores on our database resulted in poor performance for all of them. Published data seem to be optimistically biased, whereas our evaluation gives more realistic estimates of routine performance in different clinical environments. Further well-designed, large-scale trials are needed to investigate the clinical benefit of diagnostic scoring in acute appendicitis.
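
The four standardised criteria above amount to a simple threshold check, which the following minimal Python sketch illustrates. The thresholds are those stated in the abstract; the function name, the dictionary keys, and the example rates are hypothetical and are not taken from the study. How each rate is derived from patient data is not specified here, so the rates are assumed to be supplied as precomputed fractions.

```python
# Maximum acceptable rates quoted in the abstract, expressed as fractions.
# The key names are illustrative labels, not terminology fixed by the study.
CRITERIA = {
    "initial_negative_appendicectomy_rate": 0.15,
    "potential_perforation_rate": 0.35,
    "initial_missed_perforation_rate": 0.15,
    "missed_appendicitis_rate": 0.05,
}


def fulfilled_criteria(rates):
    """Return the names of the criteria whose rate is at or below its threshold."""
    missing = set(CRITERIA) - set(rates)
    if missing:
        raise ValueError(f"missing rates: {sorted(missing)}")
    return [name for name, limit in CRITERIA.items() if rates[name] <= limit]


# Made-up rates for illustration only; these are NOT results from the study.
example_rates = {
    "initial_negative_appendicectomy_rate": 0.12,
    "potential_perforation_rate": 0.40,
    "initial_missed_perforation_rate": 0.10,
    "missed_appendicitis_rate": 0.08,
}

met = fulfilled_criteria(example_rates)
print(f"{len(met)} of {len(CRITERIA)} criteria fulfilled: {met}")
```

With these invented rates, the hypothetical score would meet two of the four criteria. A score would count as fulfilling all criteria only if the returned list contained all four names.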