A novel rater agreement methodology for language transcriptions: evidence from a nonhuman speaker
Quality & Quantity
The ability to measure agreement between two independent observers is vital to any observational study. We use a unique situation, the calculation of inter-rater reliability for transcriptions of a parrot’s speech, to present a novel method of dealing with inter-rater reliability which we believe can be applied to situations in which speech from human subjects may be difficult to transcribe. Challenges encountered included (1) a sparse original agreement matrix which yielded an omnibus measure of inter-rater reliability, (2) “lopsided” 2×2 matrices (i.e., subsets) from the overall matrix, and (3) categories used by the transcribers which could not be pre-determined. Our novel approach involved calculating reliability on two levels: that of the corpus and that of the above-mentioned smaller subsets of data. Specifically, the technique included the “reverse engineering” of categories, the use of a “null” category when one rater observed a behavior and the other did not, and the use of Fisher’s Exact Test to calculate r-equivalent for the smaller paired subset comparisons. We hope this technique will be useful to those working in similar situations where speech may be difficult to transcribe, such as with small children.
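As a rough illustration of the last step named in the abstract, the sketch below computes a one-tailed Fisher's Exact Test p-value for a single 2×2 agreement table and converts it to r-equivalent. It assumes the usual definition of r-equivalent: find the Student's t value whose one-tailed p, with N − 2 degrees of freedom, matches the exact p, then take r = sqrt(t² / (t² + df)). The function names, the example counts, the numerical integration, and the handling of p above 0.5 are assumptions of this sketch and are not taken from the article.

```python
import math


def fisher_one_tailed(a, b, c, d):
    """One-tailed Fisher exact p for the table [[a, b], [c, d]]:
    probability of a top-left count >= a with all margins fixed."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2
    total = math.comb(n, c1)
    hits = sum(
        math.comb(r1, x) * math.comb(r2, c1 - x)
        for x in range(a, min(r1, c1) + 1)
    )
    return hits / total


def t_upper_tail(t, df, steps=2000):
    """P(T >= t) for Student's t with t >= 0, via Simpson's rule.
    The substitution t = sqrt(df) * tan(theta) turns the tail into an
    integral of cos(theta) ** (df - 1) over [theta, pi / 2]."""
    lo = math.atan(t / math.sqrt(df))
    h = (math.pi / 2 - lo) / steps  # steps must be even for Simpson's rule
    total = math.cos(lo) ** (df - 1) + math.cos(math.pi / 2) ** (df - 1)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * math.cos(lo + i * h) ** (df - 1)
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2))
    return const / math.sqrt(math.pi) * total * h / 3


def r_equivalent(p_one_tailed, n):
    """r-equivalent from a one-tailed p and total sample size n (n >= 3)."""
    df = n - 2
    # Assumption of this sketch: p > 0.5 is mirrored and given a negative sign.
    sign, tail = (1.0, p_one_tailed) if p_one_tailed <= 0.5 else (-1.0, 1 - p_one_tailed)
    if tail <= 0:
        return sign
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > tail:  # widen the bracket until it contains t
        hi *= 2
    for _ in range(60):  # bisection on t
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > tail:
            lo = mid
        else:
            hi = mid
    t = (lo + hi) / 2
    return sign * t / math.sqrt(t * t + df)


# Hypothetical 2x2 table: rows are rater 1 (category present / absent),
# columns are rater 2 (category present / absent).
p = fisher_one_tailed(3, 1, 1, 3)
r = r_equivalent(p, n=8)
print(f"one-tailed p = {p:.4f}, r-equivalent = {r:.3f}")
```

Because the exact test conditions on the table margins, it stays usable for the small, lopsided subsets the abstract describes, where a chi-square approximation would be unreliable.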
Kaufman, A. B., E. N. Colbert-White, and R. Rosenthal. "A Novel Rater Agreement Methodology for Language Transcriptions: Evidence from a Nonhuman Speaker." Quality & Quantity 48.4 (2014): 2329-2339. Print.