For example, in our Facebook study, we want to know both.

The more uniform your measurement, the higher reliability will be.

A Pearson correlation can be a valid estimator of interrater reliability, but only when you have meaningful pairings between two and only two raters. When you have more raters than that, or raters who differ from case to case, this is where an intraclass correlation (ICC) comes in. ICC can be a useful estimate of interrater reliability on quantitative data because it is highly flexible (note that if you have qualitative data, such as categories or ranks, you would not use ICC).

Unfortunately, this flexibility makes ICC a little more complicated than many estimators of reliability. While you can often just throw items into SPSS to compute a coefficient alpha on a scale measure, there are several additional questions one must ask when computing an ICC, and one restriction.

The restriction is straightforward: you must have the same number of ratings for every case rated. The questions are more complicated, and their answers depend on how you identified your raters and what you ultimately want to do with your reliability estimate.

For example, in one of my lab's current studies, we are collecting copies of Facebook profiles from research participants, after which a team of lab assistants looks them over and makes ratings based upon their content. Because the research assistants are creating the data, their ratings are my scale, not the original data. This means they 1) make mistakes and 2) vary in their ability to make those ratings.

ICC(1) assumes a random effects model for raters. In other words, while a particular rater might rate Ratee 1 high and Ratee 2 low, it should all even out across many raters.

ICC(2), like ICC(1), assumes a random effects model for raters, but it explicitly models this effect; you can sort of think of it as "controlling for rater effects" when producing an estimate of reliability. SPSS calls this model “Two-Way Random” because it 1) models both an effect of rater and an effect of ratee (i.e. two effects) and 2) assumes both are drawn randomly from larger populations, which makes it a random effects model.

If your answer to Question 1 is yes and your answer to Question 2 is “population”, you need ICC(3). In SPSS, this is called “Two-Way Mixed.” This ICC makes the same assumptions as ICC(2), but instead of treating rater effects as random, it treats them as fixed.
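To make the three models concrete, here is a minimal Python sketch (standard library only) that computes each ICC from ANOVA mean squares using the formulas in Shrout and Fleiss (1979). The function name `icc_table` and its dictionary keys are illustrative choices, not an SPSS or library API. The sketch assumes a complete matrix with one row per ratee and the same number of ratings in every row, which is the restriction described above. Following Shrout and Fleiss, ICC(2) here is an absolute-agreement index and ICC(3) a consistency index; SPSS lets you choose either type for its two-way models, so its output should match only when those settings line up.

```python
from statistics import fmean


def icc_table(ratings):
    """Shrout & Fleiss (1979) ICCs from a complete n x k ratings matrix.

    ratings[i][j] is the j-th rating of ratee i. Every row must contain the
    same number of ratings. For ICC(1) the columns need not be the same
    raters across ratees; for ICC(2) and ICC(3) column j is one rater.
    """
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratees")
    k = len(ratings[0])
    if k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("every ratee needs the same number (>= 2) of ratings")

    grand = fmean(x for row in ratings for x in row)
    row_means = [fmean(row) for row in ratings]
    col_means = [fmean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares for a ratee x rater decomposition.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between ratees
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_within = ss_total - ss_rows                          # one-way residual
    ss_error = ss_within - ss_cols                          # two-way residual

    msr = ss_rows / (n - 1)
    msw = ss_within / (n * (k - 1))
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))

    return {
        # ICC(1), one-way random: rater effects are folded into the error term.
        "ICC(1,1)": (msr - msw) / (msr + (k - 1) * msw),
        "ICC(1,k)": (msr - msw) / msr,
        # ICC(2), two-way random: rater variance is modeled (absolute agreement).
        "ICC(2,1)": (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n),
        "ICC(2,k)": (msr - mse) / (msr + (msc - mse) / n),
        # ICC(3), two-way mixed: raters treated as fixed (consistency).
        "ICC(3,1)": (msr - mse) / (msr + (k - 1) * mse),
        "ICC(3,k)": (msr - mse) / msr,
    }


# The 6-ratee, 4-rater example dataset from Shrout and Fleiss (1979).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
for name, value in icc_table(ratings).items():
    print(f"{name}: {value:.2f}")
# ICC(1,1): 0.17
# ICC(1,k): 0.44
# ICC(2,1): 0.29
# ICC(2,k): 0.62
# ICC(3,1): 0.71
# ICC(3,k): 0.91
```

The `,1` entries estimate the reliability of a single rater, while the `,k` entries estimate the reliability of the mean of all k raters; SPSS reports these as “Single Measures” and “Average Measures.” Note how much ICC(3) exceeds ICC(2) on this data: the raters differ a lot in their average leniency, which ICC(2) counts against agreement but ICC(3) ignores.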