Today the science subreddit is having a discussion on racial bias in science in conjunction with Science Magazine publishing “Doing Science while Black,” by Dr. Ed Smith. This discussion has inspired me to share some research that I did a year and a half ago on the “pipeline problem” in tech, and STEM more broadly. That problem being: Tech companies can’t hire minorities if there aren’t minorities who are trained for the job.
My experience, and the research in the reddit discussion, shows that the pipeline problem isn’t the only thing keeping minorities out of tech. I’m going to share that experience and I’m going to show that my experience is not unique.
Let’s talk about me
It’s been over 3 years since I graduated with my PhD in statistics (specializing in machine learning) from UIC. I thought things would be fairly easy given that I have the hottest degree in the tech industry. Not so. Being half-Mexican with an identifiably Hispanic name (this is an employer’s first impression upon seeing my resume), but appearing to be a vaguely ethnic White person, my employment in industry has felt like a social science experiment for which I never volunteered. Nonetheless, I feel an obligation to use my white privilege to highlight injustice, much like the woman in this story:
I previously wrote about the lawsuit I filed against Civis Analytics. That post also covered the statistics on racial pay discrepancies in tech. In short, I was the only machine learning (PhD) or optimization (masters) degree holder at a data science company. I was treated like an idiot while training supposed peers and simultaneously getting paid less than them. From optimization to linear models, anything beyond the first month of an undergrad level course required me explaining it to members of the data science team (that I wasn’t initially allowed to join) and its management.
I’ve since withdrawn my lawsuit. It was not a good use of my mental energy. I had the CFO and a co-founder being quoted as saying racist things. It didn’t matter. I was the one under-represented minority (URM) and there are an infinite number of variables in hiring.
If you were to file a lawsuit, your employer would just need to find one thing you don’t have that someone else does and claim that as their reasoning for paying you less. In this respect, racist hiring practices actually serve to insulate companies from litigation.
Political Aside – I’ll admit it’s a little frustrating to hear that this company, whose C-level executive refers to “speaking Spanish” as “speaking poor” and whose hiring practices were consistent with this attitude, is still receiving business from the Democratic National Committee and Hillary’s Super Pac. That Hillary’s campaign is right now attacking Donald Trump for referring to Miss Universe as “Miss Housekeeping”, in addition to the effect of NAFTA on Mexican campesinos, her previous “super-predator” statements, and Obama’s failure to prosecute bankers that were actually super-predators of middle-class minority households, makes it feel like minority political power is non-existent. But that’s for another post.
Highlights since leaving Civis include: being yelled at by a subordinate repeatedly, being lied to repeatedly by my boss, providing physical proof of him lying to HR who proceeded to yell at me, moving to a new job to only have my new boss harass me by saying things like, “What kind of Mexican are you? A drug dealer or a rapist?”, proceeding to contact HR and then have him still be my boss and go right back a week later to saying inappropriate things. Thanks Vivaki and Sears!
For those URMs looking for jobs in tech, here’s my best advice: Find a place where other URMs are employed. Find a place where management doesn’t primarily consist of a population of tall, attractive, white or Asian males. This indicates that they aren’t hiring and promoting based on ability, but instead are promoting based on their gut-feeling of who looks like a “leader”… like they’re selling jeans!
Hiring Discrepancies in Tech
All that advice presumes that you have multiple offers or enough savings to remain unemployed. I certainly didn’t. It took me 5 months to find a job, including a 4 month interview process with Civis, and a month waiting tables. I wonder how many other machine learning PhDs have waited tables? I would guess very few… unless you’re a URM! Because that’s what the data says.
In my last post on this topic I used some data from a USA Today article on pay discrepancies. That data was solid. They found that the discount for being Hispanic in tech is 16%, but only 4% for Blacks. This was consistent with my pay gap of approximately 20% relative to my comparable coworkers at Civis.
But again, there is only a pay gap if you get hired! Just 3 days after that article was posted another one came out that talked about hiring, Tech jobs: Minorities have degrees, but don’t get hired. It tries to address the defense that many tech CEOs make, that there aren’t enough minorities in the pipeline.
Unfortunately, this piece was less impressive. Their conclusion about URMs (Hispanics and Blacks) was approximately correct: the Hispanic and Black pipelines are respectively two and four times the size of Hispanic and Black employment in tech. But the methodology could be improved. The data presented also doesn’t address what would seem to be the real headline: Asians are employed at twice the rate that they are graduating.
Below, I attempt to come up with better estimates of the tech pipeline using the available data from the article and open data sets. I’m going to try to unpack this work using some basic assumptions, R programming, and Rmarkdown. My hope was that this would clarify these numbers so that they are a little more convincing. I think I succeeded, but let me know what you think.
Here’s the initial USA Today data:
library(xtable)

# Percent of tech staff vs. percent of graduates, by race (USA Today)
usatoday.mat <- matrix(c(47.7, 60.6,
                         43.4, 18.8,
                          3.2,  6.5,
                          1.8,  4.5),
                       nrow = 4, byrow = TRUE)
colnames(usatoday.mat) <- c("Staff", "Graduates")
rownames(usatoday.mat) <- c("White", "Asian", "Hispanic", "Black")

print(xtable(usatoday.mat, caption = 'USA Today Data'),
      type = "html",
      html.table.attributes = c("align=center, border=1px solid black, padding=5px"),
      caption.placement = "top")
barplot(t(usatoday.mat), beside = T, legend.text = T, main = "USA Today Data", xlab = "Race", ylab = "Percentage of Staff")
The employment statistics in the article come from a third party whose data I couldn’t find/access.
The biggest issues that I can address deal with the pipeline. The authors don’t take into account the globalized workforce within the US tech industry, nor do they take into account the fact that the great majority of tech employees entering the workforce (the pipeline) have a post-graduate degree. In the article, only the domestic bachelors degree population statistics were used for the pipeline.
The first thing to do is to unpack the racial demographics with some assumptions. We’re going to assume that the “nonresident alien” (NRAliens) race demographics follow that of the world. Comparing the national and global population demographics, we would expect that the pipeline of tech workers would naturally skew towards a more Asian demographic than the typical US population.
library(xtable)

# Population demographics in percent; the matrix is filled column by column
pop.mat <- matrix(c(45.0,  5.5, 28.9, 32.9,   # Chicago
                    71.5,  4.6, 15.8, 14.5,   # Illinois
                    77.7,  5.3, 17.1, 13.2,   # US
                    16.7, 60.6,  8.5, 14.2),  # World
                  nrow = 4)
colnames(pop.mat) <- c("Chicago", "Illinois", "US", "World")
rownames(pop.mat) <- c("White", "Asian", "Hispanic", "Black")
The Undergrad/Postgrad Pipeline
We’re also going to assume that the pipeline of CS graduates is proportional to the numbers reported in the CRA report. In 2013, the sampled institutions awarded 1991 PhDs, 10326 masters degrees, and 15087 bachelors degrees. There were also 9875 new masters students, so we can expect only 5212 (= 15087 - 9875) of the bachelors students to go into the workforce. There were 2728 new PhD students, so we can expect 7598 (= 10326 - 2728) masters students to go into the workforce. This is also going to give us a significantly different picture of the CS/tech pipeline.
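The subtraction above can be sketched in a few lines of R. This is only a sketch: it assumes, as the paragraph does, that every new masters student comes out of that year's bachelors graduates and every new PhD student out of that year's masters graduates. The names `awarded`, `continuing`, and `entering` are made up for illustration.

```r
# Degrees awarded in 2013 by the sampled institutions (CRA figures quoted above)
awarded <- c(Bachelor = 15087, Master = 10326, PhD = 1991)

# New enrollments that keep graduates in school for another degree
# (assumption: new masters students come from the bachelors pool,
#  new PhD students come from the masters pool)
continuing <- c(Bachelor = 9875, Master = 2728, PhD = 0)

# Graduates expected to enter the workforce at each degree level
entering <- awarded - continuing

# Share of the workforce-bound pipeline held by each degree level
entering.prop <- entering / sum(entering)

print(entering)
print(round(entering.prop, 3))
```

Under these assumptions, masters graduates make up the largest slice of the workforce-bound pipeline, which is why using only bachelors statistics can be misleading.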
# Graduates entering the workforce at each degree level
degree.count <- c(5212, 7598, 1991)
degree.prop <- degree.count / sum(degree.count)
degree.tbl <- t(degree.prop)
colnames(degree.tbl) <- c("Bachelor", "Master", "PhD")

# Percent of degrees awarded to each group, by degree level
pipeline.mat <- matrix(0, nrow = 5, ncol = 3)
rownames(pipeline.mat) <- c("White", "Asian", "Hispanic", "Black", "NRAliens")
colnames(pipeline.mat) <- c("Bachelors", "Masters", "PhD")
pipeline.mat["White", ]    <- c(60.6, 28.9, 29.0)
pipeline.mat["Asian", ]    <- c(18.8,  9.0,  9.5)
pipeline.mat["Hispanic", ] <- c( 6.5,  1.8,  1.4)
pipeline.mat["Black", ]    <- c( 4.5,  2.0,  1.4)
pipeline.mat["NRAliens", ] <- c( 7.6, 57.1, 58.3)
Putting it together
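So what does the current flow of CS graduates look like? First we have to allocate the students going into/getting out of grad school and then we have to allocate the NRAliens in the pipeline.
# Weight each degree column by its share of workforce-bound graduates
real.pipeline.mat <- t(t(pipeline.mat) * degree.prop)
# Spread the NRAliens row across the four groups using world demographics
nra.pipeline <- outer(pop.mat[, "World"] / 100, real.pipeline.mat["NRAliens", ])
# Add the allocated NRAliens to the domestic rows (row 5 is NRAliens)
globalized.pipeline <- real.pipeline.mat[-5, ] + nra.pipeline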
Summing this matrix across the rows will give us the race percentage in the tech pipeline.
globalized.pipeline.race <- apply(globalized.pipeline, 1, sum)
gpr.tbl <- t(globalized.pipeline.race)
colnames(gpr.tbl) <- c("White", "Asian", "Hispanic", "Black")
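One way to sanity-check the NRAliens allocation is to confirm that it only moves people between rows and does not change the pipeline total. The sketch below is self-contained and re-enters the same numbers used in this post. It swaps in `sweep()` and `rowSums()` for the transpose idiom, and the names `world`, `pipeline`, `weighted`, and `race.share` are made up for illustration.

```r
# World race shares (percent) used to allocate nonresident aliens
world <- c(White = 16.7, Asian = 60.6, Hispanic = 8.5, Black = 14.2)

# Percent of degrees by group and degree level (same values as pipeline.mat)
pipeline <- rbind(
  White    = c(60.6, 28.9, 29.0),
  Asian    = c(18.8,  9.0,  9.5),
  Hispanic = c( 6.5,  1.8,  1.4),
  Black    = c( 4.5,  2.0,  1.4),
  NRAliens = c( 7.6, 57.1, 58.3)
)
colnames(pipeline) <- c("Bachelors", "Masters", "PhD")

# Workforce-bound share of each degree level
degree.count <- c(5212, 7598, 1991)
degree.share <- degree.count / sum(degree.count)

# Weight each degree column by its share of the workforce-bound pipeline
weighted <- sweep(pipeline, 2, degree.share, `*`)

# Spread the NRAliens row over the four groups using world shares
nra <- outer(world / 100, weighted["NRAliens", ])
globalized <- weighted[rownames(pipeline) != "NRAliens", ] + nra

# Race share of the globalized pipeline (percent)
race.share <- rowSums(globalized)

# The world shares sum to 100, so reallocation should preserve the total
stopifnot(isTRUE(all.equal(sum(race.share), sum(weighted))))
print(round(race.share, 1))
```

If the world shares did not sum to 100, the check would fail, which is a cheap guard against a typo in the demographic table.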
Globalized Digital Divide
Using the numbers calculated above, we can see a better estimate of the racial demographics for the tech graduate pipeline corrected for the presence of grad school students and international students.
# Compare the globalized pipeline with current staffing
new.usatoday.mat <- cbind(globalized.pipeline.race, usatoday.mat[, "Staff"])
colnames(new.usatoday.mat) <- c("Pipeline", "Staff")
The drastic difference between the Asian pipeline and staffing in the USA Today data has been reduced. Unfortunately the underrepresented minority disparity still exists. With the data at hand (i.e. not having access to the USA Today Research survey), the best we can hope for is an expectation of how the racial makeup of the technology sector will be changing if the sector hires fairly going forward. This should involve a doubling in the number of Hispanic staff (3.2% to 6.8%) and a quadrupling in the number of Black staff (1.8% to 8.4%).
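The size of those jumps can be checked with a quick ratio. This sketch uses only the rounded percentages quoted in the paragraph above, and it assumes the overall size of the tech workforce stays fixed so that a change in share translates directly into a change in headcount. The names `staff`, `pipeline`, and `growth` are illustrative.

```r
# Current staff share and estimated pipeline share (percent), as quoted above
staff    <- c(Hispanic = 3.2, Black = 1.8)
pipeline <- c(Hispanic = 6.8, Black = 8.4)

# Factor by which each group's staffing share would need to grow
# to match its share of the pipeline
growth <- pipeline / staff
print(round(growth, 1))
```

The Hispanic factor comes out a little over two and the Black factor a little over four, which is where the "doubling" and "quadrupling" language comes from.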
Next we’re going to look at what an equitable globalized tech workforce should look like. Assuming college admission/graduation rates for NRAliens are stable and the distribution of students entering the workforce with varying levels of qualification remains stable, we can calculate what an equitable globalized work force would look like.
# Share of the workforce-bound pipeline made up of nonresident aliens
nr.alien.share <- apply(real.pipeline.mat, 1, sum)["NRAliens"] / 100
domestic.share <- 1 - nr.alien.share
us.international <- t(matrix(c(domestic.share, nr.alien.share)))
colnames(us.international) <- c("US", "World")
Taking a weighted average of US and World racial demographics tells us what an equitable distribution of tech college students should look like:
# Weight US and World demographics by the domestic and NRAliens shares
equitable.enrollment <- cbind(pop.mat[, "US"] * us.international[, "US"],
                              pop.mat[, "World"] * us.international[, "World"])
equitable.enrollment <- cbind(equitable.enrollment,
                              apply(equitable.enrollment, 1, sum))
colnames(equitable.enrollment) <- c("US", "World", "Total")
Finally, we put together the equitable pipeline, the actual pipeline, and current staffing, normalized to a 4-way race model.
final.mat <- cbind(Equity = equitable.enrollment[, "Total"], new.usatoday.mat)
# Normalize each column so the four groups sum to 1
final.mat <- t(t(final.mat) / apply(final.mat, 2, sum))
barplot(t(final.mat), beside = T, legend.text = T, xlab = "Race", ylab = "Percentage of Staff")
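The column normalization applied to final.mat uses a double transpose, which can be hard to read. As a sketch, base R's `prop.table()` with `margin = 2` should give the same result. The small matrix `m` below is hypothetical and only stands in for final.mat so that the snippet runs on its own.

```r
# Hypothetical 2 x 2 matrix standing in for final.mat (values are illustrative only)
m <- matrix(c(30, 10, 45, 5), nrow = 2,
            dimnames = list(c("A", "B"), c("Equity", "Staff")))

# The transpose idiom: divide each column by its column sum
by.transpose <- t(t(m) / apply(m, 2, sum))

# Base R alternative that normalizes over columns directly
by.prop.table <- prop.table(m, margin = 2)

# Both approaches agree, and every column now sums to 1
stopifnot(isTRUE(all.equal(by.transpose, by.prop.table)))
stopifnot(isTRUE(all.equal(unname(colSums(by.prop.table)), c(1, 1))))
```

Either form works; `prop.table()` just states the intent (proportions within each column) more directly.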
Relative to an equitable distribution, the White population is represented equitably in staffing, though it is slightly overrepresented relative to the pipeline. The Asian population is disproportionately represented twice over: in the academic pipeline relative to equity, and in industry relative to the pipeline. The underrepresented minorities are… underrepresented everywhere.
This more rigorous treatment of the data is not perfect, but it helps to show us two things. Not only do we have to make the pipeline of qualified tech workers more representative of the population overall, but we have a long way to go to make the workplace more representative of the pipeline as it is!!! This was the thesis of the original article, and it was approximately correct. Many in the tech world want to believe that their companies are meritocratic and that the lack of representation has nothing to do with their companies’ hiring practices. The cleaned data shows us that they can’t be let off the hook so easily.
Concerns have been raised that the pipeline has perhaps only recently become as diverse as it is. The thought is that this could be the reason for the discrepancy between degrees awarded and jobs filled. From the data available here this is unknowable, but anecdotal evidence suggests this isn’t the only problem. I’ve already collected enough horror stories to convince my friends and family, and STEM diversity initiatives are now decades old.
This means that not only do we need affirmative action in STEM academics, but we also need non-discrimination in STEM industry! The academic pipeline of URMs is being under-utilized in industry and this hurts everyone.
Finally, this also brings to mind one of my favorite economists, Ha Joon Chang, issuing one of the strongest cases for affirmative action that I’ve read (emphasis added):
Equality of opportunity is the starting point for a fair society. But it’s not enough. Of course, individuals should be rewarded for better performance, but the question is whether they are actually competing under the same conditions as their competitors. If a child does not perform well in school because he is hungry and cannot concentrate in class, it cannot be said that the child does not do well because he is inherently less capable. Fair competition can be achieved only when the child is given enough food – at home through family income support and at school through a free school meals programme. Unless there is some equality of outcome (i.e., the incomes of all the parents are above a certain minimum threshold, allowing their children not to go hungry), equal opportunities (i.e., free schooling) are not truly meaningful.