IQ discourse is increasingly unhinged
Myth-busting becomes necessary
Erik Hoel is a neuroscientist and writer.
If you were to take a person blissfully offline for the last few years, and have them log on to a major social media platform (especially the one formerly known as Twitter), their initial reaction would likely include: Whoa, that’s a lot of IQ talk! Even the most popular meme formats are now about IQ:
IQ research’s increasing popularity is due to its status as a battleground, in that it is often—not always, but often—used in an attempt to shift the needle politically. The supposed logic goes that if you think that humans are all just “blank slates” then you’re going to support different policies than if you think that intelligence is completely genetically determined from the moment of conception.
As usual with a battleground, when you see people whacking away at each other in the mud, it is difficult to keep in mind that both sides might be wrong. For two things can be true at once. First, it can be true that there is a habit of reflexively censoring and smearing people, often researchers or scientists, for daring to talk about IQ at all, in any context, no matter how benign their intentions. Look no further than the recent controversy where The New Yorker explicitly said (as a tangent in a review about Jonathan Haidt’s work on the rise of teenage depression) that Haidt maintaining IQ was heritable at all was politically charged and “widely dismissed” by the scientific community. But while different researchers give extremely wide estimates of the heritability of IQ, all the way from low single-digit percentages to upwards of half the variance, The New Yorker’s editorial stance that there is nothing at all in your genes that contributes to intelligence to any degree whatsoever doesn’t even seem like the kind of thing they themselves could possibly believe. The New Yorker’s editor goes home and her daughter babbles her first word and the husband says “Oh, she’s so smart, just like her mama!” and the editor smiles. Readers, in turn, know when they’re being lied to.
Yet it can also be true, at the exact same time, that such criticisms have had a reverse psychology effect, making it seem like intelligence being genetically determined from conception is some sort of golden glowing scientific truth being covered up by the harshest censors. Many people who harp on things like psychology’s ongoing replication crisis will turn a blind eye to the same issues within IQ research, simply because they feel it is unfairly maligned for political purposes.
One reason the debate is unavoidable is that we are a society that sorts teenagers into colleges via what is effectively an IQ test. For scores on standardized tests like the SAT correlate to IQ scores almost as well as the scores on different IQ tests do to each other (in fact, the SAT grew out of an army IQ test). This is what’s behind the standard pro-SAT argument that GPAs and impressive extracurriculars are easier for the rich to game than the SATs are, so standardized tests provide an opportunity for smart poor kids to get into colleges. We can pretend as a society we don’t administer the cousins of IQ tests at a mass scale, but personally I think we should be honest about both its downsides and upsides (I’ve written before about how I likely owe my career to the SAT/GREs).
Within the academic field of psychology, IQ remains the most popular and applicable measure of intelligence—for researchers, it is the canonical “best measure.” But the problem is that when laypeople hear it’s the “best measure” they think it therefore must be a good measure. Instead of what it really is, which is a noisy and weak signal. This makes sense within the context: psychology is a noisy and weak signal discipline with all sorts of problems around reported effect sizes. IQ being “the best” by those standards means a lot less than what outsiders might think. If medical tests, like tests for cancer, had the same reliability as IQ tests, they’d be throwing up false signals to the point of being unusable.
IQ tests are also studiable, just like the SATs, although the degree is up for debate. Schooling may even increase IQ scores by a couple points each year. It’s likely that people with insanely high IQs, e.g., people like Chris Langan (a bouncer with a supposed IQ of 200, who was once described by Esquire as “the smartest man in the world”) are simply IQ obsessives who study and practice IQ tests in the same way that contestants do for Jeopardy.
Because they are highly variable and also studiable, IQ tests and their cousins like the SATs are mostly good for sorting large populations we have no other way to sort: they, like democracy, are the best of a bunch of bad options.
Plenty of academic researchers in the field acknowledge these issues (although they disagree on how important they are). But places like The New Yorker treating IQ research as inherently dangerous has had the predictable effect. Such criticisms have given rise to a countering force, a growing overly-confident pop-understanding of the IQ literature, one that believes the underlying research is as solid as granite. For whenever I briefly wade into the IQ discourse, like when I pointed out that there’s actually no good evidence that historical geniuses like Einstein or Feynman had bank-breaking IQs (in fact, that the difference between merely high IQ vs. super-high IQ doesn’t seem to have real-world consequences at all), I am deluged by comments along the lines of “Just look at the Study of Mathematically Precocious Youth!” or “Just look at the Terman Study of the Gifted!” or “What about Anne Roe’s work on Nobel Prize winners?”
While there is plenty of more modern IQ research being done with better methodologies, these famous older studies and their results form a halo of mythos; not coincidentally, they are usually what those using IQ research for political purposes regularly cite. They are also, as a rule, rife with problems.
Let’s consider one of these central references in IQ lore to demonstrate that it is not built on granite, but clay. It’s a reference that crops up inevitably in IQ discourse. It’s fascinating in its own right as a tale of how research gets warped, as well as a tale of how gifted education actually takes place. I had been aware of it for years but never investigated it: the Study of Mathematically Precocious Youth (it’s popular enough to have its own Wikipedia page).
Here are some of the main results it supposedly shows. (The quote is taken from the blog of Steve Hsu, who gives the clearest summary of its common takeaways.)
• Prodigies destined for eminent careers can be identified as early as age 13.
• There is no plateau of ability; even within the top 1%, variations in mathematical, spatial, and verbal abilities profoundly impact educational, occupational, and creative outcomes.
These are quite strong conclusions. And certainly, the study is evocative; it’s a longitudinal study that started in the late 1960s and tracked very young students who performed at incredible levels on the SAT math section of the time (they took it much earlier than the normal age, so even a middling score is amazing). Some famous names you might recognize were part of it, like Mark Zuckerberg, Terence Tao, and even Lady Gaga.
The study is currently most associated with researcher David Lubinski, a professor of psychology at Vanderbilt University who has tens of thousands of academic citations. Drawing on the study’s old data, he writes articles with titles like “Who rises to the top? Early indicators,” tracking the later achievements of the gifted members. Here is the kind of image that comes out of his analysis of the Study of Mathematically Precocious Youth (often abbreviated as SMPY). The graph correlates original SAT scores to later life successes like doctorates, patents, income, etc.
Pretty stark, right? It’s the genre of graph that crops up regularly in IQ discourse, and I understand why—those at the very tippy top of scores, like 700, look to be radically outperforming in life achievements even the (also very bright) students who were scoring 500 on the math SAT at 13. The takeaway is obvious: if you aren’t scoring like a genius at 13, if instead you’re “merely” bright, you’ve already lost and are destined to rise to, well, not the top.
Except that, as is often the case, once you start looking into the details of the SMPY, you realize that you can’t possibly draw that conclusion. For Lubinski is not the person who started the SMPY. He took over the longitudinal tracking of its members’ accomplishments and some other programmatic aspects (it has a bunch of offshoots). The founder of SMPY was actually Julian Stanley, a psychologist at Johns Hopkins University. Here is an account of its origins according to Nature:
On a summer day in 1968, professor Julian Stanley met a brilliant but bored 12-year-old named Joseph Bates. The Baltimore student was so far ahead of his classmates in mathematics… his computer instructor introduced him to Stanley, a researcher well known for his work in psychometrics—the study of cognitive performance. To discover more about the young prodigy’s talent, Stanley gave Bates a battery of tests that included the SAT college-admissions exam, normally taken by university-bound 16- to 18-year-olds in the United States. Bates’s score was well above the threshold for admission to Johns Hopkins, and prompted Stanley to search for a local high school that would let the child take advanced mathematics and science classes. When that plan failed, Stanley convinced a dean at Johns Hopkins to let Bates, then 13, enroll as an undergraduate.
Stanley would affectionately refer to Bates as “student zero” of his Study of Mathematically Precocious Youth (SMPY), which would transform how gifted children are identified and supported by the US education system.
If you don’t see what is already suspiciously different from the popular telling of the study, I’ll direct you to the description from Stanley’s own protégé, highlighting the relevant parts:
“What Julian wanted to know was, how do you find the kids with the highest potential for excellence in what we now call STEM, and how do you boost the chance that they’ll reach that potential,” says Camilla Benbow, a protégé of Stanley’s who is now dean of education and human development at Vanderbilt University in Nashville, Tennessee. But Stanley wasn’t interested in just studying bright children; he wanted to nurture their intellect and enhance the odds that they would change the world.
That is, the original purpose of the SMPY was not to track a cohort and then make causal claims about who rises to the top by sheer dint of their intelligence. Its purpose was to interfere. It was to encourage, to promote, to help, in any way they could, the kids who they identified as gifted.
That’s great! I have no problems with the SMPY as it actually occurred. I do have a problem with the mythology of the SMPY. Because such interference was the entire point. Here’s from Julian Stanley, the actual founder of the SMPY, who wrote an in-depth retrospective of the experience:
By June of 1972, it had become clear that we needed to do something for the children identified as mathematically precocious. Thus, in haste, we decided to create a special, fast-paced mathematics class for the mathematically ablest young students we had found… Mr. Wolfson, a physicist by training who after obtaining his Master's degree from the University of Chicago… worked expertly with about 20 boys and girls, most of whom had just completed the sixth grade. All were of top 1 percent ability mathematically and also verbally or in nonverbal reasoning. The class was a huge success and was followed by a string of successful classes.
As we continued to conduct talent searches for ever-larger numbers in 1972, 1973, 1974, 1976, 1978, and 1979, we experimented incessantly with many different ways of speeding up the learning of mathematics from algebra through Advanced Placement Program Level BC calculus… as well as the learning of biology, chemistry, and physics. This led to refinement and extension of our procedures. We also experimented with other forms of acceleration, or curricular flexibility (what came to be our preferred term), to develop what we called the SMPY smorgasbord of accelerative opportunities.
It was, indeed, a smorgasbord. Eventually there was an entire department at Johns Hopkins devoted to it, which…
… took off like a rocket in January of 1980 with an expanded talent search, now including verbal and general ability. The first residential program of fast-paced courses followed that summer. CTY [Center for Talented Youth] has expanded ever since, now serving over 60,000 young boys and girls each year in its talent searches and over 4,000 in its summer programs, which offer a great variety of courses.
The efforts to help gifted students just continued and continued:
Besides the test-based systematic talent-search concept, wherein each student's abilities are carefully assessed, perhaps the virtually unique contribution of SMPY has been its emphasis on acceleration in its many forms and on fast-paced academic courses. In the latter, students are individually and quickly paced by a mentor through a standard high school subject, such as first-year algebra, biology, chemistry, or physics. Appropriately gifted students can master a whole year of high school subject matter in three intensive summer weeks.
With this level of aid in mind, the SMPY looks a lot less like a well-controlled study of “Who rises to the top?” based on their early SAT scores, and more like what its founder described it as: a “smorgasbord of accelerative opportunities” such as summer programs and individualized tutoring to assist gifted kids.
After the early scores, did all members receive exactly equal treatment? Definitely not. It’s likely higher scorers merited more effort by the researchers, as would be natural given their aims. It’s also quite likely that higher scorers, by dint of sheer impressiveness, were more likely to go to the summer programs and have more individualized tutoring. These were kids spread out across the country. Every case was unique and hugely dependent on the parents. As Stanley says about the early part of the process:
SMPY has sent the SAT scores directly to the examinee, who then could work with his or her parents in the local school situation and community to secure needed curricular adjustments and other opportunities to move ahead faster and better in academic areas of his or her greatest precocity.
It’s obvious that parental efforts to promote education and get their kids into those special SMPY courses or intervene at their local schools likely correlated in degree to their child’s initial scores, either as cause or effect.
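To make the confounding worry concrete, here is a toy simulation. It is not SMPY data, and every number in it is a hypothetical I picked for illustration. It assumes a world where the early score has no direct effect on later outcomes at all, while extra support (tutoring, acceleration) both helps outcomes and is handed out in proportion to the early score. A naive analysis that ignores support still finds that scores "predict" outcomes.

```python
import random


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def naive_score_slope(targeting, n=20000, direct_effect=0.0,
                      support_effect=1.0, seed=0):
    """Slope of outcome on score when support is left out of the analysis.

    All parameters are hypothetical, chosen only for illustration:
    targeting      -- how strongly extra support tracks the early score
    direct_effect  -- effect of the score itself on the outcome (zero here)
    support_effect -- effect of tutoring/acceleration on the outcome
    """
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.gauss(0, 1)  # standardized early test score
        # Support = a share that follows the score + everything else (noise).
        support = targeting * score + rng.gauss(0, 1)
        outcome = (direct_effect * score
                   + support_effect * support
                   + rng.gauss(0, 1))
        scores.append(score)
        outcomes.append(outcome)
    return ols_slope(scores, outcomes)


if __name__ == "__main__":
    print("support tracks score: ", round(naive_score_slope(0.5), 2))
    print("support ignores score:", round(naive_score_slope(0.0), 2))
```

In expectation the naive slope equals `direct_effect + support_effect * targeting`. Under these made-up settings the first run lands near 0.5 and the second near 0, even though the score itself does nothing in either world. A score-versus-achievement graph cannot tell these two worlds apart unless the support each child received is documented.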
What’s weird is that none of the SMPY’s extensive efforts in helping the kids is mentioned in later retrospective papers like “Who rises to the top?” On my initial read of that paper, it seemed for all the world like the research merely tracked students without interference, impartially noting their future amazing accomplishments. Yet as far as I can tell, the SMPY’s efforts at tutoring, courses, outreach, working with administrators and local schools, all continued quite late into the program, all the way to the 90s, covering all the cohorts. I had to go back to a 2001 paper by Lubinski and co-authors to find this acknowledged:
Before characterizing various outcomes, a distinctive finding on the educational experiences of these participants (and how they felt about them) should be noted. An overwhelming majority of participants (95%) took advantage of various forms of academic acceleration in high school or earlier to tailor their education to create a better match with their needs.
The methodology is therefore, frankly, too heterogeneous and poorly documented to draw straightforward causal conclusions, such as the claim that there’s no plateau of ability. The real takeaways of the SMPY are instead things like:
On average, participants indicated that their acceleration made no detectable difference in their social life or in their ability to get along with their age peers.
If anything, the SMPY supports the importance of education for high-impact individuals, arguably even what I’ve called “aristocratic tutoring”—the extensive (and expensive) methods that European aristocrats used to educate their children via one-on-one expert teaching, which led to so many of the great historical geniuses and remains the most elite tier of accelerated education.
The SMPY showcases analogous paths in our own era, like skipping grades, exposure to advanced placement classes, individual tutoring, and so on. Again, according to Nature:
SMPY researchers say that even modest interventions—for example, access to challenging material such as college-level Advanced Placement courses—have a demonstrable effect. Among students with high ability, those who were given a richer density of advanced precollegiate educational opportunities in STEM went on to publish more academic papers, earn more patents and pursue higher-level careers than their equally smart peers who didn’t have these opportunities.
Yet this is not the telling of it in the pop-IQ discourse. Instead it is used to support a determinism that imagines you can test two 13-year-olds, both incredibly bright already, and by dint of a marginally more exceptional SAT score you can assign one significantly greater odds of achievement.
The SMPY being a bad example of stark IQ determinism is not some exception. In pop-IQ discourse, reach almost always exceeds grasp.
Despite the fact that I usually end up on the more skeptical side of intelligence research, I personally don’t begrudge anyone who is interested in IQ—human intelligence is fascinating, and we don’t have many ways of even thinking about it, and IQ and its cousins like the SATs are the measurables we can assess. It certainly has its uses, perhaps best showcased in the actual workings of programs like the SMPY: to do initial identification of talent at a mass scale, and then apply targeted education practices that are indeed effective (as the study itself shows) at raising later achievements.
But it should never be forgotten that IQ research is not physics. It is not chemistry. It is a soft science like psychology or sociology. More modern papers have lots of fancy statistics, but that does nothing to lessen its plushy, silky, softness. It is not based on the hardness of some known neuroscientific mechanism we’ve pinpointed that underlies intelligence. There is no causal account we can give for the biological basis of IQ that has any serious tensile strength. Most research consists of paper tests administered for an hour or two, sometimes decades ago, followed by contortion after contortion to account for the complexities of the world.
Build your house atop such foundations, build your political movements atop such foundations, and be warned they may collapse with large thunder.
The article you link as showing a researcher supporting single-digit-percentage heritability doesn't actually show what you claim it shows. The article says that a polygenic score for patients with neuroimaging data (with only 27k samples) explained 7.6% of the variance in g.
PGS variance explained != heritability! That's like reporting benchmark results for your machine learning model before it's finished training.
The low IQ heritability estimates you do find in the literature such as https://pmc.ncbi.nlm.nih.gov/articles/PMC6411041/ all seem to have the same issue: they estimate SNP heritability based on UK Biobank's fluid intelligence test, but they fail to account for the fact that the test sucks! Gold standard IQ tests have a test-retest correlation of >0.9. UK Biobank's is short, so the test-retest correlation is 0.61. This is massively deflating estimates of SNP heritability, and thus broad sense heritability!
They calculate SNP heritability of 0.19-0.22 when actual SNP heritability (after adjusting for the crappy test) is about 0.3.
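The adjustment is not spelled out above, so here is a minimal sketch of one common way to do it, assuming the classical test theory correction: treat the test-retest correlation as the share of score variance that is not measurement error, and divide the observed heritability by it. The function name `disattenuate` is made up for this example, and the published papers may use a different procedure.

```python
def disattenuate(observed_h2: float, reliability: float) -> float:
    """Correct an observed heritability for test unreliability.

    Assumes classical test theory: observed variance = true variance
    plus measurement error, with reliability = true / observed.
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return observed_h2 / reliability


RELIABILITY = 0.61  # test-retest correlation quoted above

for observed in (0.19, 0.22):
    corrected = disattenuate(observed, RELIABILITY)
    print(f"observed {observed:.2f} -> corrected {corrected:.2f}")
# prints:
# observed 0.19 -> corrected 0.31
# observed 0.22 -> corrected 0.36
```

Under this simple correction the range works out to roughly 0.31 to 0.36, in line with the "about 0.3" figure.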
But that's just SNP heritability. A good portion of the variance in IQ comes from rare variants (population frequency <1%), and about 20% of it comes from non-linear effects that are going to be very hard to capture without much larger sample sizes.
I have yet to find a credible IQ heritability estimate that's lower than 0.5.
The biggest thing I took away from my Instrument Development class in grad school was that an instrument is only valid when used for its intended purpose. The IQ test was originally created to classify French orphans with cognitive delays into categories that could be used to group their educational needs by the program supporting them. Taking IQ measurements outside of this context automatically makes them invalid; the context has changed and the instrument for deciding what an individual's IQ is is no longer applicable. Sure, we can acknowledge that different instruments have been designed and used over the years to try to quantify intelligence (we certainly aren't using the same test the French orphans were given), but we've completely shifted the purpose of IQ from grouping students with similar abilities to receive education suited to them, to making IQ a predictive measure of mental ability and success! And how can a body of research be considered solid on such a shifting foundation?