Wednesday, December 12, 2007

Where Do Tests Come From?


I'm writing this post as a follow-up to my previous post, "How This Psychiatrist Thinks About Psychological Tests." In that post I wrote about the different types of psychological tests and why psychiatrists and psychologists use them. In this post I'm going to talk about how psychological tests get invented.

I've always thought it would be a great gig: invent a psychological test, copyright it, make sure it's good for something, then set up nationwide seminars to train and certify people to use it, and sell the test to them. Talk about a self-made entrepreneur!

But there's a reason everybody isn't doing this: inventing a test (I mean one that is actually meaningful and useful) is quite hard. Drug companies spend loads of money developing new drugs only to have them go down in flames in clinical trials; the same thing happens with psychological tests.

To illustrate the process, let's imagine that we are going to invent a test that would be useful to the blog. We want a tool that will measure the degree to which a post (or blogger, or podcast guest) will entertain a reader or listener. Let's call it the Shrink Rap Silliness Inventory (SRSI).

The first thing you do is scour the literature for existing tests that are supposed to do what you want. In our case, nothing already in use measures silliness; if we did find such a test, we'd look at the research behind it to see what is presently known about the silliness-measuring business. Even without an existing test, a review of the broader literature might tell us that various characteristics are indicators of silliness: a tendency to wear big floppy shoes, to talk in a funny voice, to be a Monty Python fan, or to be named Roy (sorry Roy, couldn't resist). We'd use this information to put together the items for the SRSI. The items might be questions the subject/patient has to answer (e.g., "Is your name Roy?") or observations the test administrator makes (e.g., "On a scale of 1 to 7, how big and floppy are this subject's shoes?"). Once you have a set of experimental test items, you're ready to take your SRSI for a test run (pardon the pun) to see how well it works.

The first thing you have to figure out is whether or not the test actually measures what you want it to measure---this is known as validity. We want the SRSI to measure silliness when it's present and to rule out subjects who aren't silly. In order to do this you have to give your test to groups of people known to be silly and others who aren't, and compare their scores. If SRSI scores are high for known silly folks (say, students at the local clown college or improv group) and low for non-silly folks (maybe your local newscasters) then this suggests your test is valid because it can distinguish between groups. This is analogous to using a medical laboratory test to distinguish between diseased and healthy people. There are other ways of proving test validity, but this is the usual starting point.
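To make the known-groups comparison a little more concrete, here is a minimal sketch of how you might quantify the gap between clown-college students and newscasters. Everything in it is an assumption for illustration: the SRSI scores are invented, and Cohen's d (a standard effect-size measure) is my choice of statistic, not something the SRSI or any real validation protocol prescribes. A real study would use much larger samples and formal significance testing.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference between two groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical SRSI totals, invented purely for illustration
clown_college = [52, 47, 58, 61, 49, 55]
newscasters = [18, 22, 15, 25, 20, 17]

print(f"clown college mean: {mean(clown_college):.1f}")
print(f"newscaster mean:    {mean(newscasters):.1f}")
print(f"Cohen's d:          {cohens_d(clown_college, newscasters):.2f}")
```

A large positive d says the "known silly" group scores well above the "known non-silly" group, which is the kind of separation that suggests the test is picking up what it claims to.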

The second thing you have to establish is test reliability: whether you can trust the test to measure things consistently. We want the SRSI to work every time, like a car that starts in cold weather. One check is to give the test repeatedly to the same person or group of people over time and compare their scores. Since we're assuming silliness is a stable trait, we want SRSI scores to be stable too; this is known as test-retest reliability. We also want lots of different people to be able to use the test and have it work well for all of them. So we train a number of raters to use the SRSI and have each of them rate the same subject. If their scores largely agree, we know our test has good inter-rater reliability.
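Both kinds of reliability are usually summarized with a correlation. The sketch below is a simplified illustration under my own assumptions: the scores are made up, test-retest reliability is the Pearson correlation between two sessions, and inter-rater reliability is approximated by the average pairwise Pearson correlation between raters. In practice, statistics like the intraclass correlation or Cohen's kappa are often preferred for rater agreement, because plain correlation only checks that raters rank people the same way, not that they give the same numbers.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_pairwise_r(ratings_by_rater):
    """Average correlation over every pair of raters scoring the same subjects."""
    pairs = combinations(ratings_by_rater, 2)
    return mean([pearson_r(a, b) for a, b in pairs])


# Hypothetical data: the same five people take the SRSI twice, a month apart
first_session = [40, 22, 55, 31, 48]
second_session = [42, 20, 53, 33, 47]
print(f"test-retest r: {pearson_r(first_session, second_session):.2f}")

# Hypothetical data: three raters each score the same five subjects
raters = [
    [40, 22, 55, 31, 48],
    [38, 25, 57, 30, 45],
    [41, 21, 52, 34, 50],
]
print(f"mean inter-rater r: {mean_pairwise_r(raters):.2f}")
```

Values close to 1 mean the scores hang onto the same ordering across sessions or across raters; values near 0 mean the SRSI is about as dependable as that car in a January cold snap.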

Finally, you want to know how likely it is that a given test score will be wrong. There are two ways a score can be wrong: if a silly person gets a low SRSI score, that error is a false negative; if a non-silly person gets a high score, that's a false positive. We would have to look at our test data and figure out the percentage of times the SRSI gives a wrong score, either false positive or false negative.
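Here is a rough sketch of that bookkeeping. It assumes things the post leaves open: a hypothetical cutoff of 35 (scores at or above it count as "silly"), a made-up validation sample, and that we already know who is truly silly. The false negative rate is computed among the truly silly and the false positive rate among the truly non-silly; one minus each gives the familiar sensitivity and specificity.

```python
def error_rates(scores, is_silly, cutoff):
    """Misclassification rates when scores >= cutoff are called 'silly'."""
    fp = fn = 0
    for score, silly in zip(scores, is_silly):
        predicted_silly = score >= cutoff
        if predicted_silly and not silly:
            fp += 1  # non-silly person scored high: false positive
        elif silly and not predicted_silly:
            fn += 1  # silly person scored low: false negative
    n_silly = sum(is_silly)
    n_not_silly = len(is_silly) - n_silly
    return {
        "false_positive_rate": fp / n_not_silly,
        "false_negative_rate": fn / n_silly,
        "overall_error_rate": (fp + fn) / len(scores),
    }


# Hypothetical validation sample, invented for illustration
scores = [52, 47, 30, 61, 18, 38, 22, 15, 25, 20]
is_silly = [True, True, True, True, False, False, False, False, False, False]

rates = error_rates(scores, is_silly, cutoff=35)
for name, value in rates.items():
    print(f"{name}: {value:.0%}")
```

Moving the cutoff trades one kind of error for the other: lower it and you catch more silly people but flag more newscasters; raise it and the reverse happens.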

This is just a portion of the research that goes into inventing a good psychological or medical test. If we manage to clear all these hurdles, we'd go on to do research to see whether the test actually gives us useful information: whether podcast guests with high SRSI scores bring us better iTunes ratings and more downloads, or higher visitor counts on the days they guest blog. We could even have SRSI scores for each of us Shrink Rappers! But I guess that brings me back to my original issue with psychological tests: I don't need a test to tell me that Roy would be the silliest.

*****************************************

(Alright, you have to admit that inkblot looks like a pelvis. I can't be the only one seeing body parts here.)

21 comments:

PG said...

Had to take a break from my thesis (where I am creating/trying to validate two scales among differing groups) to concur on the process. The validation process can take years and years. Between classical (CFA) and modern (IRT) test theory approaches, you can spend a lifetime validating and defending a scale.

FooFoo5 said...

Pelvis it is.

Gerbil said...

The first thing you have to figure out is whether or not the test actually measures what you want it to measure---this is known as validity.

Actually, this is a specific type of validity (construct, to be precise). You could also examine the face validity of your test by figuring out whether your subjects know what you are trying to measure.

And then there are so many types of reliability to consider... test-retest and interrater among them. There's also internal consistency (how well the items "hang together" to measure the same construct).

And then one would be wise to conduct additional validation studies with other populations. Does the SRSI identify the silliness of Yugos? Turducken? Speedos?

And then, if you're not too tired and you really want to get a nerdy Gerbil excited, you can give the SRSI to a huge (N > 300) sample and submit the data to a taxometric analysis, to see if there is a silliness taxon (a category of people whose silliness is mathematically distinct from everyone else's) within the general population. (This, btw, was my dissertation--albeit with deliberate self-injury and not silliness.)

PS: The pregnant lady did not see a pelvis. I saw a carnival mask instead. Huh.

Alison Cummins said...

A scarab for me!

And don't inkblot tests have poor validity?

NeoNurseChic said...

To me it looks like some sort of scary clown face yelling at me.... What does that say about my issues? LOL I am afraid of clowns and of people yelling at me, after all! haha!

No really - when I first looked at it, I saw a pelvis, too...

HP said...

..and that's only the start of the validation process.

No pelvis here. Lady at the races in her fancy hat. Hmm.

Gerbil said...

Alison--it depends on whom you ask and what kind of validity you're talking about. The Rorschach has terrible face validity, which isn't actually a bad thing (as people can't fake good or bad if they don't know what the "expected" response is). But as for construct validity or external validity... yeah, it depends on whom you ask and how you phrase it.

The inter-rater reliability of the Rorschach is pretty good, but the test-retest reliability is weird. There are some data that are considered situational and that, appropriately, have poor test-retest reliability; and some data that are considered stable traits and therefore have high test-retest reliability.

But then again they say you can only take the Rorschach once, because (unless you have some kind of neurological damage) the blots won't be novel the second time around, and therefore the meaning of its test-retest reliability is suspect.

Roy said...

OK, I'll play. I see two fox heads looking at each other and playing pattycake with their feet. While I do not see a pelvis, I am surprised that I missed an opportunity to do so.

Anonymous said...

I see two toupee-wearing eels or seahorses with trigger-tongues, two wolves with feet touching, hair extensions for skunks, a pair of pliers/wire cutters seen at an angle, a mining drill, a yeti crab, pincers, an octopus or octopus-shaped potted plant, an arm and hand reaching down, Medusa, a funny frog or an ugly bird with a strange beak, a tornado personified, annnnd....two black rat-chicks with stalked eyes. Sorry, no pelvis. Perhaps a birth canal.

Ladyk73 said...

"test-retest reliability"

Oh don't forget that previous testing can be a threat to internal validity! And perhaps Roy is actually so silly that he becomes a Statistical Regression Threat...

Okay, you guys are killing me, I just took this exam!

I think I see Evil eating love in the weird ass seahorse fox dancing picture.

Anonymous said...

Roy silly? I would think big hair would be a better indicator of silliness and I'm not going near that inkblot thing. It needs color.

Anonymous said...

moustaches are silly too. another silly thing is enlarging font size to make it look like you wrote more than one line. teachers know this trick.

Midwife with a Knife said...

No, the inkblot is totally a pelvis. I think it's a male pelvis, because it's kinda narrow, but it could also be an android female pelvis. Either way, a setup for CPD.

Anonymous said...

I've been a psychologist for 35 years. I have worked in a community mental health clinic and in private practice. The last time I used the Rorschach was in grad school. And outside of intelligence testing, which I did for a while as a school consultant, I haven't done any testing at all. I'm not atypical. Could you maybe have a skewed notion of what psychologists do? Kind of like the skewed notion some psychologists have of psychiatrists? Just wondering.

Anonymous said...

Man with a beard

Anonymous said...

I saw a crying horse.
Weird.

Assrot said...

The inkblot looks like a pair of seahorses kissing to me. Do seahorses kiss? Aren't they asexual?

Anonymous said...

I first thought the inkblot was a pelvis, but then I thought, "Is it supposed to be two scary fighting dragons who are going to burn down Seaview, Kansas?" P.S. I think enimy *may* be crazy, but who am I to judge?

Sarebear said...

I saw a female pelvis, with an alien monster there (around it, within, sort of in the same space at the same time, and not).

Oh, hey, once upon a year, I was noodling around the net and ran across Tickle.com.

They have an ink-blot test, but not to diagnose mental illness, just to tell various traits among a normal population. It looked free, so I took it. It was very interesting!

At the end they tell you one or two teaser things, and then you have to pay a fee to get your "results." Which, since I was bored and overspending, I did.

I've got them around here someplace, I'll have to dig 'em up.

One thing I liked was that it said, in various psychological language, that at my core I'm a kind person. Or something.

Now, not that I need a test to tell me what I am, but the way it talked about it was very me. The things it said surprised me at first, and then I realized they surprised me because I don't really show a lot to people, so the insight was surprising coming from an external source.

It says the test is just for fun, and NOT for diagnosis.

I'm curious how you three shrinks would turn out!

I'll dig mine up and post a bit of what it told me, maybe.

I did feel sort of like I'd been "had," though, when I got to the end and they wanted money. That's ok, go take the free test, see what weird things you see in the blots.