To evaluate the reliability of binocular vision measurements used in the classification of convergence insufficiency.
In a school setting, two examiners tested 20 fifth and sixth graders who had passed a screening of visual acuity, refraction, and binocularity. The tests, conducted using a standard protocol, consisted of von Graefe near heterophoria (NH), phorometric positive fusional vergence (PFV), nearpoint of convergence (NPC), and monocular pushup accommodative amplitude (AA). Each examiner measured each child three consecutive times for each test on each of two occasions spaced approximately 1 week apart. Intraexaminer and interexaminer agreement were assessed using intraclass correlation coefficients (ICC), the median absolute difference (MAD), and the coefficient of repeatability (COR).
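As an illustration of the three agreement statistics named above, the following minimal sketch computes them for paired measurements from two sessions. It rests on assumptions the abstract does not state: the ICC is taken to be the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), and the COR is taken to be 1.96 times the standard deviation of the paired differences. The function names and the sample values are hypothetical and are not study data.

```python
from statistics import mean, median, stdev


def median_absolute_difference(first, second):
    """Median of |first - second| across subjects (paired measurements)."""
    return median(abs(a - b) for a, b in zip(first, second))


def coefficient_of_repeatability(first, second):
    """1.96 x SD of the paired differences (assumed definition of COR)."""
    diffs = [a - b for a, b in zip(first, second)]
    return 1.96 * stdev(diffs)


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `ratings` has one row per subject and one column per session or
    examiner. This ICC form is an assumption; the abstract does not say
    which form was used.
    """
    n = len(ratings)      # number of subjects
    k = len(ratings[0])   # number of sessions or examiners
    grand = mean(v for row in ratings for v in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical PFV break values in prism diopters for five children.
session_1 = [10, 12, 14, 16, 18]
session_2 = [11, 12, 13, 18, 17]

mad = median_absolute_difference(session_1, session_2)
cor = coefficient_of_repeatability(session_1, session_2)
icc = icc_2_1([list(pair) for pair in zip(session_1, session_2)])
print(f"MAD = {mad}, COR = {cor:.2f}, ICC(2,1) = {icc:.3f}")
```

Under these definitions the MAD describes the typical test-retest difference, whereas the COR bounds the difference expected in about 95% of repeated measurements, which is why the COR can be several times larger than the MAD for the same data.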
The within-session reliability of the NH (ICC: 0.95 to 0.99), NPC (ICC: 0.94 to 0.98), and AA (ICC: 0.88 to 0.95) was good, whereas that of the PFV was lower (ICC: 0.71 to 0.94). The between-session intraexaminer reliability was good for the NPC (ICC: 0.92 and 0.89), lower for the NH (ICC: 0.81 and 0.81) and AA (ICC: 0.89 and 0.69), and much lower for the PFV break (ICC: 0.59 and 0.53). Typical between-session PFV differences (MAD) were between 3 and 4 Δ, whereas the COR indicated differences as large as 12 Δ.
Three of the four measures often used in the classification of convergence insufficiency (NH, NPC, and AA) generally have good within-session and between-session reliability. The PFV break was found to have only fair reliability, with clinically significant differences between sessions. The large potential test-retest differences found could complicate clinical decision-making with regard to diagnosis and treatment.