Indian IQ in the GSS

A brief data post

Aug 30, 2025

Recently, Emil Kirkegaard published an article on Substack titled “Indian Average Intelligence: Not a Mystery,” in which he cited two studies of Indians living in America. The first, a representative sample of eleven-year-olds, showed an average IQ of 102.42 and an SD of 15.47 relative to a white mean of 100 and SD of 15. The second showed, according to Kirkegaard, “an average of about 101 IQ,” and, according to the abstract, had a sample size of 38 (I could not access the full text of the paper). Finally, a large study I found had a sample of almost 19,000 Indian seventh graders in California, who, relative to whites, had an IQ of 101.65 and an SD of 15.59. All of these results agree that Indians living in the U.S. score slightly higher than the average white person, but only by 1 to 2 points.

After Kirkegaard’s post on the topic, I got interested and tried to replicate these results using the General Social Survey (GSS). Because item-level data were available, I also tested for measurement invariance. However, I was made aware that Kirkegaard and an anon are currently working on a meta-analysis and will therefore replicate this analysis anyway, so I only tested for measurement invariance in a very simple way; the more difficult task is left to them.

IQ Measure

The GSS is a semi-regular survey of Americans about a variety of topics: personal wellbeing, political views, etc. In many waves, participants are also given an IQ test referred to as the Wordsum. This is a short, ten-question vocabulary test that, at least according to very old data, correlates about as well with the WAIS as IQ tests tend to correlate with each other (~0.7).

Data Extraction

Using the GSS Data Explorer, I downloaded the following variables in an Excel file:

year
id
born
age
race
ethnic
worda through wordj (the ten Wordsum items)
wordsum
ballot

I adjusted for age and survey year, and also separated foreign-born and U.S.-born Indians. All results were calculated relative to a native-born white mean of 100 and SD of 15. The data span 1978 to 2024. My R code is available at the bottom of this post.
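As a rough illustration of that scaling step, here is a minimal sketch on made-up data. The toy data frame, the to_iq_scale helper, and every value in it are hypothetical and stand in for the GSS extract. It residualizes Wordsum on age and survey year, then expresses the residuals relative to the reference group so that group has a mean of 100 and an SD of 15.

```r
# Minimal sketch on simulated data; the values below are invented for illustration.
set.seed(1)
n_ref <- 200
n_focal <- 50
toy <- data.frame(
  group   = rep(c("reference", "focal"), times = c(n_ref, n_focal)),
  age     = sample(18:88, n_ref + n_focal, replace = TRUE),
  year    = sample(1978:2024, n_ref + n_focal, replace = TRUE),
  wordsum = sample(0:10, n_ref + n_focal, replace = TRUE)
)

# Hypothetical helper: rescale scores so the reference group has mean 100, SD 15
to_iq_scale <- function(score, is_ref) {
  100 + 15 * (score - mean(score[is_ref])) / sd(score[is_ref])
}

# Step 1: remove the linear effects of age and survey year
fit <- lm(wordsum ~ age + year, data = toy)
toy$resid <- residuals(fit)

# Step 2: anchor the residuals to the reference group
toy$iq <- to_iq_scale(toy$resid, toy$group == "reference")

# Group means and SDs on the IQ scale
iq_summary <- aggregate(iq ~ group, data = toy,
                        FUN = function(x) c(mean = mean(x), sd = sd(x)))
print(iq_summary)
```

By construction the reference group lands at exactly 100 and 15, so any other group's mean and SD can be read directly as an offset from it.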

Sample Size

In total, there were 188 Indians with Wordsum scores and no invalid inputs for any of the demographic/sample variables. Of these, 46 were born in the U.S., while 142 were first-generation immigrants. This is a much larger sample than in the two studies cited by Kirkegaard, but of course it is dwarfed by the California data.

Results

After adjusting for age and survey year, the average Indian IQ in the sample was 95.6, with an SD of 17.25. Split by birthplace, the mean IQ was 93.4 for U.S.-born Indians and 96.1 for first-generation Indian immigrants. The difference between these two groups was not statistically significant (t = 0.95, p > .34). All of the studies above sampled exclusively, or at least almost exclusively, U.S.-born Indians. The largest sample by far reported a mean of 101.65, which is significantly higher than found in the GSS (t = 3.39, p = .001). The 95% CIs are [93.22, 98.98] for the foreign-born group and [88.63, 98.17] for U.S.-born Indians. Comparing waves from 2000 onward with those before 2000 did not show any major, or statistically significant, differences.
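The test against the California study and the intervals above can be approximated from summary statistics alone. The sketch below is not the code used for the post. It assumes the comparison used the U.S.-born GSS group (mean 93.4, SD 16.5, n = 46), takes the California sample size as 19,000 (the text only says "almost 19,000"), and uses a normal-approximation interval. The function names are placeholders.

```r
# Welch's t-test computed from group means, SDs, and sample sizes
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1
  v2 <- s2^2 / n2
  se <- sqrt(v1 + v2)
  t  <- (m1 - m2) / se
  # Welch-Satterthwaite degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  p  <- 2 * pt(-abs(t), df)
  c(t = t, df = df, p = p)
}

# Normal-approximation confidence interval for a mean
mean_ci <- function(m, s, n, level = 0.95) {
  half <- qnorm(1 - (1 - level) / 2) * s / sqrt(n)
  c(lower = m - half, upper = m + half)
}

# California sample (n assumed to be 19000) vs. U.S.-born GSS Indians
res <- welch_from_summary(101.65, 15.59, 19000, 93.4, 16.5, 46)
ci  <- mean_ci(93.4, 16.5, 46)

print(round(res, 3))
print(round(ci, 2))
```

Under those assumptions the Welch t statistic comes out at about 3.39 and the interval for the U.S.-born group at roughly [88.63, 98.17], in line with the figures reported above. The California term contributes almost nothing to the standard error, so the exact value assumed for its sample size barely matters.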

I used a simple test of measurement invariance: the correlation between the proportion of whites getting a given item right and the proportion of Indians getting the same item right. The result was a correlation of 0.99, so there is no evidence that the items measured something different for each group.
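To get a feel for what this kind of check can and cannot pick up, here is a toy sketch with invented pass rates (none of these numbers come from the GSS). It compares a reference group with two hypothetical focal groups: one that is uniformly lower on every item, and one where a single item is much harder than the rest of the pattern would predict.

```r
# Invented pass rates for ten items in a reference group
p_ref <- c(0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.25)

# Case 1: same difficulty ordering, focal group 0.10 lower on every item
p_shift <- p_ref - 0.10

# Case 2: identical pass rates except item 3, which is far harder for the focal group
p_one_item <- p_ref
p_one_item[3] <- 0.55

# A uniform shift leaves the correlation at 1
r_shift <- cor(p_ref, p_shift)

# A single deviant item lowers the correlation, but it stays high
r_one_item <- cor(p_ref, p_one_item)

print(c(uniform_shift = r_shift, one_deviant_item = r_one_item))
```

The correlation mainly tracks whether the items are ordered by difficulty in the same way in both groups, which is why a more formal model-based test is the natural follow-up.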

Conclusion

The best estimate from prior studies of the IQ of Indians living in the U.S. is 101-102. Using the GSS, I found a lower average of 95.6 for Indians overall and, for the more comparable group, those born in the U.S., an average of 93.4. The standard deviations found here were also slightly larger: 17.25 for Indians overall and 16.5 for U.S.-born Indians, relative to 15.59 in the large California study. My test of measurement invariance did not find any evidence that the test measured something different for Indians than for whites. However, the test I used was relatively weak, so the result should be replicated with MGCFA.

Code

Feel free to use this code, of course; most of it was created by Microsoft Copilot anyway.

library(readxl)
library(tidyverse)  #attaches dplyr, stringr, purrr, and readr (for parse_number)
library(janitor)

gss_raw <- read_excel("Gss.xlsx") %>% clean_names()
#loaded the data with standardized column names

gss_clean <- gss_raw %>%
  mutate(race_clean = str_to_lower(str_trim(as.character(race))),
         born_us = case_when(
           born %in% c("YES", "Yes", "yes") ~ 1,
           born %in% c("NO", "No", "no")    ~ 0,
           TRUE ~ NA_real_
         ))
#created a U.S.-born dummy (1 = yes, 0 = no) and a lowercase, trimmed race label

invalid_ethnic <- c(".n: No Answer", ".u: Uncodable", "Other",
                    ".d: Do not Know/Cannot Choose", ".s: Skipped on Web", ".i: Inapplicable")
invalid_age <- c(".n: No answer", ".i: Inapplicable", "89 or older")
gss_clean <- gss_clean %>%
  filter(!(ethnic %in% invalid_ethnic)) %>%
  filter(!(as.character(age) %in% invalid_age)) %>%
  mutate(age = suppressWarnings(as.numeric(age)),
         year = suppressWarnings(as.numeric(year)),
         wordsum = suppressWarnings(as.numeric(wordsum))) %>%
  filter(wordsum %in% 1:10)
#purged all invalid values for ethnic, age, and wordsum
#changed age, year, and wordsum to numeric (any remaining non-numeric labels become NA)
#note: 1:10 keeps scores of 1 to 10 only, so respondents who scored 0 are excluded; 0:10 would keep them


comparison_data <- gss_clean %>%
  filter((race_clean == "white" & born_us == 1) | ethnic == "India")
#created dataset for comparison between native-born whites and all Indians

regression_sample <- comparison_data %>%
  filter(!is.na(wordsum) & !is.na(age) & !is.na(year))
model <- lm(wordsum ~ age + year, data = regression_sample)
#fit regression model on cases with no missing values

comparison_data <- comparison_data %>%
  mutate(pred_wordsum = predict(model, newdata = comparison_data),
         wordsum_resid = ifelse(!is.na(wordsum) & !is.na(pred_wordsum),
                                wordsum - pred_wordsum, NA_real_))
#assigned residuals (observed minus predicted Wordsum)

ref_stats <- comparison_data %>%
  filter(race_clean == "white", born_us == 1) %>%
  summarize(ref_mean = mean(wordsum_resid, na.rm = TRUE),
            ref_sd   = sd(wordsum_resid, na.rm = TRUE))
#calculated reference group (native-born white) statistics

comparison_data <- comparison_data %>%
  mutate(z_wordsum_resid = (wordsum_resid - ref_stats$ref_mean) / ref_stats$ref_sd)
#anchored comparison_data to the reference group (z-scores relative to native-born whites)

word_items <- paste0("word", letters[1:10])

recode_item <- function(x) {
  x_chr <- tolower(trimws(as.character(x)))
  #treat missing-value labels as NA
  is_na <- grepl("^\\.i:|^\\.n:|^\\.d:|^\\.s:|uncodable|don't know|dont know|dk|refused|inapplicable|missing", x_chr)
  x_chr[is_na] <- NA
  out <- rep(NA_real_, length(x_chr))
  #"incorrect" also matches "correct", so the 0 assignment has to come second
  out[grepl("correct|right", x_chr)]   <- 1
  out[grepl("incorrect|wrong", x_chr)] <- 0
  #fall back to numeric codes for anything not matched by label
  num <- suppressWarnings(parse_number(x_chr))
  is_01 <- is.na(out) & !is.na(num) & num %in% c(0, 1)
  out[is_01] <- num[is_01]
  #in a 1/2 coding, 1 is treated as correct and 2 as incorrect
  idx_12 <- which(is.na(out) & !is.na(num) & num %in% c(1, 2))
  if (length(idx_12) > 0) out[idx_12] <- ifelse(num[idx_12] == 1, 1, 0)
  #numeric missing-value codes
  out[!is.na(num) & num %in% c(7, 8, 9, 97, 98, 99)] <- NA_real_
  out
}
#recode item scores to 0/1

white_means <- comparison_data %>%
  filter(race_clean == "white", born_us == 1) %>%
  summarise(across(all_of(word_items), ~ mean(recode_item(.x), na.rm = TRUE))) %>%
  unlist(use.names = FALSE)

indian_means <- comparison_data %>%
  filter(ethnic == "India") %>%
  summarise(across(all_of(word_items), ~ mean(recode_item(.x), na.rm = TRUE))) %>%
  unlist(use.names = FALSE)
#calculate item means (proportion correct) for Indians and whites

cor(white_means, indian_means, use = "complete.obs")
#calculates the correlation coefficient--0.99!

forumposter123@protonmail.com

Aug 31 (edited)

I’m generally in agreement on low average Indian IQ, and I think even Brahmins as a whole probably come in somewhere in the 90s.

But the overwhelming issue is that Indians aren’t a proper race. They are a caste-ridden society on a really unprecedented scale. They are many, many separate breeding pools occupying the subcontinent. Saying someone will “revert to the mean” isn’t meaningful without knowing what their mean is. Copts don’t revert to the Egyptian mean, for instance.

I suspect there are subgroups in India that are “elite”. I just think it’s really small. Smaller than Brahmins as a whole even.

This matches with experience from immigration. If you keep the numbers low you get relatively high performers. If you ramp it up, even with all sorts of skill requirements, quality degrades (see Canada, etc).

Since all migration is inevitably chain migration, I expect no country will ever do better than something in the 90s over time if they bring in significant Indian immigration, and it can easily turn out worse than that.


Eremetic

Sep 5

Hey, love reading your work. I liked it when you guys refuted Veritasium.

I recently came across this smug race denialist

https://youtu.be/CntemqZAvEs?si=8YOCYBsz4k65ZpRm

Could you make a post debunking him, it would be beneficial for future race realists imo, thanks.
