
Machine learning could be fundamentally unexplainable

I'm going to consider a fairly unpopular idea: most efforts towards "explainable AI" are essentially pointless. Useful as an academic pursuit and topic for philosophical debate, but not much else.

Consider this article a generator of interesting intuitions and viewpoints, rather than an authoritative take-down of explainability techniques.

That disclaimer aside:

What if almost every problem for which it is desirable to use machine learning is unexplainable?

At least unexplainable in an efficient-enough way to be worth explaining. Whether it is an algorithm or a human that is doing the explanation.

Let's define "explainable AI" in a semi-narrow sense, inspired by the DARPA definition, as an inference system that can answer the questions:

Why was that prediction made as a function of our inputs and their interactions?

Under what conditions would the outcome differ?

How confident can we be in this prediction and why is the confidence such?
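
To make the three questions concrete, here is a minimal sketch on a toy logistic model where all of them are trivial to answer. The weights, feature names and applicant are made up for illustration (they come from no real credit model). The point is the contrast: with two parameters each answer fits on one line, which is exactly what stops being true at scale.

```python
import math

# Hypothetical weights for a toy two-feature model (illustrative only).
WEIGHTS = {"income": 0.8, "debt": -1.2}
BIAS = -0.5


def predict_proba(x):
    """Probability of 'approve' under a logistic model with the toy weights."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-z))


def contributions(x):
    """Question 1: per-feature contribution to the logit (weight * value)."""
    return {name: WEIGHTS[name] * value for name, value in x.items()}


def counterfactual(x, feature):
    """Question 2: the value of `feature` at which the decision flips (p = 0.5)."""
    rest = BIAS + sum(WEIGHTS[n] * v for n, v in x.items() if n != feature)
    return -rest / WEIGHTS[feature]


applicant = {"income": 2.0, "debt": 0.5}
# Question 3: the model's own confidence. This is only meaningful as a
# confidence if the model is calibrated, which this toy does not check.
p = predict_proba(applicant)
parts = contributions(applicant)
flip_debt = counterfactual(applicant, "debt")
```

Here `parts` says income pushed the logit up and debt pushed it down, and `flip_debt` is the debt level at which the decision would change.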

Why might we be unable to answer the above questions in a satisfactory manner for most machine learning algorithms? I think I can name four chief reasons:

  1. Some problems are just too complex to explain. Often enough, these are perfect problems for machine learning: it's exactly their intractability to our brains that makes them ideal for equation-generating algorithms to solve.
  2. Some problems, while not that complex, are really boring and no human wants or should be forced to understand them.
  3. Some problems can be understood, but understanding in itself is different for every single one of us, and people's culture and background often influence what "understanding" means. So explainable for one person is not explainable for another.
  4. Even given an explanation that everyone agrees on, this usually puts us no closer to most of what we want to achieve with said explanation, things like gathering better data or removing "biases" from our models.

I - Unexplainable due to complexity

Let's say physicists take in 100 petabytes of experimental data, reduce it using equations, and claim with high probability that there exists this thing called a "Higgs Boson", with implications for how gravity works, among other things.

The resulting Boson can probably be defined within a few pages of text via things such as mass, the things it decays into, its charge, its spin, the various interactions it can have with other particles, and so on.

But if a luddite like myself asks the physicists:

Why did you predict this fundamental particle exists?

I will most likely get a "press conference answer", which carries no meaning other than providing a "satisfying" feeling and doesn't answer any of the above questions.

It doesn't tell me why the data shows the existence of the Higgs Boson, it doesn't tell me how the data could have been different in order for this not to be the case, and it doesn't tell me how confident they are in this inference and why.

If I press for an answer that roughly satisfies the explainability criteria I mentioned above, I will at best get them to say:

Look, the standard model is a fairly advanced concept in physics, so you first have to understand that and why it came to be. Then you have to understand the experimental statistics needed to interpret the kind of data we work with here. In the process, you'll obviously learn quantum mechanics, but to understand the significance of the Higgs boson specifically it's very important that you have an amazing grasp of general relativity, since part of the reason we defined it the way we did, and why it's so relevant, is that it might be a unifying link between the two theories. Depending on how smart you are this might take 6 to 20 years to wrap your head around; really, you won't even be the same person by the time you're done with this. And oh, once you get your Ph.D. and work with us for half a decade, there's a chance you'll disagree with our statistics and our model and might think that we are wrong, which is fine, but in that case, you will find the explanation unsatisfactory.

We are fine with this, since physics is bound to be complex, it earns its keep by being useful and making predictions about very specific things with very tight error margins, it's fundamental to all other areas of scientific inquiry.

When we say that we "understand" physics, what we really mean is that there are a few tens of thousands of blokes who spent half their lives turning their brains into hyper-optimized physics-thinking machines, and they assure us that they "understand" it.

For the rest of us, the edges of physics are a black box. I know physics works because Nvidia sells me GPUs with more VRAM each year and I'm able to watch videos of nuclear reactors glowing on YouTube while patients in the nearby oncology ward are getting treated with radiation therapy.

This is true for many complex areas, we "understand" them because a few specialists say they do, and the knowledge that trickles down from those specialists has results that are obvious to all. Or, more realistically, because a dozen-domain long chain of specialists combined, each relying on the other, is able to produce results that are obvious to all.

As long as there is a group of specialists that understands the field, as long as those specialists can prove to us that their discoveries can affect the real world (thus excluding groups of well-synchronized insane people), and as long as they can teach other people to understand the field... we claim that it's "understood".

But what about a credit risk analysis "AI" making a prediction that we should loan Steve at most 14,200$?

The model making this prediction might be operating with TBs worth of data about Steve, his browsing history, his transaction history, his music preferences, a video of him walking into the bank... each time he walked into the bank for the last 20 years, various things data aggregators tell us about him, from his preference about clothing to the likelihood he wants to buy an SUV, and of course, the actual stated purpose Steve gave us for the credit, both in text and as a video recording.

Not only that, but the "AI" has been trained on previous data from millions of people similar to Steve and the outcomes of the loans handed to them, thus working with petabytes of data in order to draw the one-line conclusion: "You should loan Steve, at most, 14,200$, if you want to probabilistically make a profit".

If we ask the AI:

Why is the maximum loan 14,200$? How did the various inputs and their interactions contribute to coming up with this number?

Well, the real answer is probably something like:

Look, I can explain this to you, but 314,667,344,401 parameters had a significant role in coming up with this number, and if you want to "truly" understand that then you'd have to understand my other 696,333,744,001 parameters and the ways they relate to each other in the equation. In order to do this, you have to gain an understanding of human gait analysis as well as how its progress over time relates to life-satisfaction, credit history analysis, shopping preference analysis, error theory behind the certainty of said shopping preferences, and about 100 other mini-models that end up coalescing into the broader model that gave this prediction. And the way they "coalesce" is even more complex than any of the individual models. You can probably do this given 10 or 20 years, but basically, you'd have to re-train your brain from scratch to be like an automated risk analyst, you'd only be able to explain this to other automated risk analysts, and the "you" "understanding" my decision will be quite different from the "you" that is currently asking.

And even the above is an optimistic take, assuming the "AI" is made of multiple modules that are somewhat explainable.

So, is the "AI" unexplainable here?

Well, not more so than the physicists are. Both of them can, in theory, explain the reasoning behind their choice. But in both cases the reasoning is not simple and there's no single data point that is crucial. If even a few inputs were to change slightly, the outcome might be completely different, but the input space is so vast that it's impossible to reason about all significant changes to it.

This is just the way things are in physics and it might be just the way things are in credit risk analysis. After all, there's no fundamental rule of the universe saying it should be easy to comprehend by the human mind. The reason this is more obvious in physics is simply that physicists have been gathering loads of data for a long time. But it might be equally true in all other fields of inquiry; based on current models, it probably is. It's just that, until recently, those other fields had neither enough data nor the intelligence required to grok it.

II - Some problems are boring

There is a class of problems that is complex, but not as complex as to be impenetrable to the vast majority of human minds.

To harken back to the physics example, think classical mechanics. Given the observations made by Galileo and some training in analysis, most of us could, in principle, understand classical mechanics.

But this is still difficult. It requires a lot of background knowledge (albeit fairly common and useful background knowledge) and a significant amount of time, ranging from, say, a day to several months depending on the person.

This is time well spent learning classical mechanics, but what if the problem domain was something else, say:

These are the kind of problems one might well use machine learning for, but they are also the kind of problems that, arguably, could lie well within the realm of human understanding.

The problem is not that they are really hard, they are just really **** boring. I can see the appeal of spending 20 years of your life training to better understand the fundamental laws of reality or the engines behind biological life. But who in their right mind wants to spend weeks or months studying sparkling water supply chains? Or learning how to observe subtle differences in shadow on a CT scan?

Yet, for all of these problems, we run into a similar issue as in section I: either we have a human specialist, or the decision of the algorithm we trained will not be explainable to anyone.

... Hopefully, you get the gist of it

III - Explainable to me but not to thee

This leads us to the third problem, which is who exactly are the understanding-capable agents the algorithms must explain themselves to.

Take as an example an epidemiological-psychology-generating algorithm that tries to find insight into the fundamentals of human nature by giving a few hundred people questionnaires on MTurk. After fine-tuning itself for a while, it finally manages to consistently produce interesting findings, one of which is something like:

People that like Japanese culture are likely to be introverts.

When asked to "explain" this finding it may come up with an explanation like:

Based on a 2-question survey we found that participants who enjoy the smell of natto are much more likely to paint a lot. Furthermore, there is a strong correlation between natto-enjoyment and affinity for Japanese culture[2,14], and between painting and introversion[3,45]. Thus we draw a tentative conclusion that introverts are likely drawn to Japanese culture (p~=0.003, n=311).

This requires only the obvious assumptions that the relation between our results and the null-hypothesis can be numerically modeled into a sticky-chewing-gum distribution and the God-ordained truth that human behavior has precisely 21 degrees of freedom (all of which we have controlled for). It also requires the validity of 26 other studies on which our references depend, but for the sake of convention, we won't consider the p-values of those findings when computing ours.

Replication and lab studies are required to confirm the finding, this is a preliminary paper meant only to be used as the core source material for front-page articles by The Independent, The NY Times, Vice, and other major media outlets.

Jokes aside, I could see an algorithm being designed to generate questionnaire-based studies... I'm not saying I'm designing one that looks promising, or that I'm looking for an academic fatalistic enough to risk his career for the sake of a practical joke (see my email in the header). But in principle, I think this is doable.

I also think that something like the explanation above (again, a bit of humor aside), would fly as the explanation for why the algorithm's finding is true. After all, that's basically the same explanation a human researcher would give.

A similar reference and statistical significance based explanation could feasibly be given as to why the algorithm converged on the questions and sample sizes it ended up with.
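
As an aside on why a skeptical reader might reject the chained-correlation argument in the mock abstract: correlation is not transitive. Below is a minimal sketch with made-up data. The variable names are only labels borrowed from the joke above, not real measurements. Natto-enjoyment correlates with painting, painting correlates with introversion, and yet natto-enjoyment and introversion are exactly uncorrelated.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Fabricated scores for four imaginary participants.
natto = [1, -1, 1, -1]
introversion = [1, 1, -1, -1]
# Painting is constructed as the sum of the other two, so it correlates
# with each of them even though they are unrelated to one another.
painting = [a + c for a, c in zip(natto, introversion)]

r_natto_painting = pearson(natto, painting)
r_painting_introversion = pearson(painting, introversion)
r_natto_introversion = pearson(natto, introversion)
```

Both links of the chain come out at roughly 0.71, while the end-to-end correlation is zero.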

But we could get wildly different reactions to that explanation:

In other words, even within cultural and geographic proximity, depending on the person a decision is explained to, an explanation might be satisfactory or unsatisfactory, might make or not make sense, and might prove the conclusion is true or the opposite.

And while the above example is tongue-in-cheek, this is very much the case when it comes to actual scientific findings.

One can define an anti-scientific world view, quite popular among religious people and philosophers, which either entirely denies the homogeneity needed for science to hold true, or deems scientific reductionism as too limited to provide knowledge regarding most objects and topics worth caring about epistemically. Arguably, every single religious person falls into this category at least a tiny bit, in that they disagree with falsifiability in a specific context (i.e. the existence of some supernatural entities or principles that can't be falsified) and even if they agree with homogeneity (which in turn allows scientific reductionism) in most scenarios, they believe edge cases exist (miracles, souls... etc).

To go one level down, you've got things like the anti-vaccination movements, which choose to distrust specific areas of science. This is not always for the same reason, and often not for a single reason. In Europe, the main reasons can be thought of as:

This combination of causes means that there's no single way to explain to an anti-vaxxer why they should vaccinate their kid against polio or hepatitis, or measles, or whatever new disease might come about or re-emerge in the future.

If we had "AI"-generated vaccines, with "AI"-generated clinical trial procedures and "AI"-written studies based on those trials, how does the "AI" answer an anti-vaxxer who asks, "why is this prediction true? why do you predict this vaccine will protect me against the disease and have negligible side effects?".

It could generate a 1000-page explanation that amounts to the history of skeptical philosophy and a collection of instances where the scientific method led to correct theories for otherwise near-impossible to solve problems. Couple that with some basic instructions on statistics, mathematics, epidemiology, and human biology.

Or it could try to generate a deep-fake video of their deceased mom and their local priest talking about how vaccines are good. Couple that with a video of a politician they endorsed getting vaccinated and maybe a papal speech about how we should trust doctors and a very handsome man with a perfect smile in a white coat talking about the lack of side effects.

And for some reason, the first seems like a much better "explanation" yet the latter is why 99% of people that do get vaccinated trust the science. Have you ever read any paper about a vaccine you got or gave to your kids?

I'm passionate about medicine and biology, and I only ever read two vaccine trial papers, both about JE vaccines, and only since they were made in poor Asian countries, and thus my "medical authority" heuristic wasn't able to bypass my rational mind (for reference, the recombinant DNA one from Thailand seems to be the best).

So which of the "explanations" should the algorithm provide? Should it discriminate between the person asking and provide a scientific explanation to some and a social persuasion based explanation to others?

Anti-vaccination is very much not a strawman: 10% of the US population believes giving the MMR vaccine to their kids is not worth the risk [pew], and 42% would not get an FDA-approved vaccine for COVID-19 [gallup].

The difference between people results in at least three issues:

  1. Some people might need further background knowledge to accept any explanation (collapses into I and II).
  2. Some people might accept some explanation but it's not what some of us think to be the "correct" explanation.
  3. Some people might never accept any explanation an algorithm provides, even though those same explanations would immediately click for others.

Going back to the "argument from authority" versus "careful reading of studies" approach to trusting an "AI-generated" vaccine study (or any vaccine study, really).

It seems clear to me that most of us made a choice to trust things like vaccines, or classical mechanics, or inorganic chemistry models, or the matrix-inverse solution to a linear regression, way before we "understood" them. We trusted them due to arguments from authority.
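
For reference, the "matrix-inverse solution" here is the normal equation, beta = (X^T X)^-1 X^T y. Below is a minimal sketch for the one-feature case, where X^T X is a 2x2 matrix that can be inverted by hand. The function name and data are mine, purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope via the normal equation.

    With design matrix X = [1, x], X^T X = [[n, sx], [sx, sxx]] and
    X^T y = [sy, sxy]; the 2x2 inverse is written out explicitly.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X; zero if all xs are equal
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope


# Points that lie exactly on y = 1 + 2x.
intercept, slope = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

Most of us used the output of something like this long before we could derive why it minimises squared error.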

This is not necessarily bad. After all, we would not have the time to gain a deeper understanding of everything; we'd just keep falling down levels of abstraction.

III.2 - Inaccessible truth and explainable lies

If a prediction is made with 99% confidence, but our system realizes you're one of "those people" that doesn't trust its authority, should it lie to you, in order to bias your trust more towards what it thinks is the real confidence?

Furthermore, if the algorithm determines nobody will trust a prediction it made, or if the human supervising it determines that same thing, should its choice be between:

a) Lying to us about the explanation.

b) Coming up with a more "explainable" decision.

Well, a) is fairly difficult, and will probably remain the realm of humans for quite some time, it also seems intuitively undesirable. So let's focus on option b), changing the decision process to one that is more explainable to people. Again, I'd like to start with a thought experiment:

Assume we have a disease-detecting CV algorithm that looks at microscope images of tissue for cancerous cells. Maybe there's a specific protein cluster (A) that shows up on the images which indicates a cancerous cell with 0.99 AUC. Maybe there's also another protein cluster (B) that shows up and only has 0.989 AUC, and A overlaps with B in 99.9999% of true positives. But B looks big and ugly and black and cancery to a human eye, while A looks perfectly normal; it's almost indistinguishable from perfectly benign protein clusters even to the most skilled oncologist.

For the pedantic among you, assume the AUC above is determined via k-fold cross-validation with a very large number of folds and that we don't mix samples from the same patient between folds.
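
A minimal sketch of what that evaluation setup means in practice, using only the standard library. The patient IDs, labels and scores are fabricated, and the scores stand in for the output of a model that would normally be fit on each fold's training split. Libraries such as scikit-learn offer grouped splitters for this; the hand-rolled version below is just to show the idea.

```python
def auc(labels, scores):
    """AUC as the chance a random positive outscores a random negative.

    Ties count as half a win (the normalised Mann-Whitney U statistic).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grouped_folds(patient_ids, k):
    """Yield (train_idx, test_idx) pairs where no patient spans both sides."""
    patients = sorted(set(patient_ids))
    for fold in range(k):
        held_out = set(patients[fold::k])  # every k-th patient goes to this fold
        test = [i for i, p in enumerate(patient_ids) if p in held_out]
        train = [i for i, p in enumerate(patient_ids) if p not in held_out]
        yield train, test


# Two tissue samples per (made-up) patient.
patient_ids = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]
scores = [0.9, 0.2, 0.8, 0.3, 0.4, 0.5, 0.7, 0.1]

fold_aucs = [
    auc([labels[i] for i in test], [scores[i] for i in test])
    for _train, test in grouped_folds(patient_ids, k=2)
]
```

Splitting by patient rather than by sample matters because two images from the same patient are far from independent, and letting them straddle train and test inflates the reported AUC.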

Now, both of these protein clumps factor into the final decision of cancer vs non-cancer. But the algorithm can be made "explainable" by investigating which features are necessary and/or sufficient for the decision (e.g. via an Anchor method). The CV algorithm can show A and B as having some contribution to its decision to mark a cell as cancerous. Say A is at 51% and B at 49%. But B looks much scarier, so what if a human marks that explanation as "wrong" and says "B should have a larger weight"?
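
To give a feel for what "necessary and/or sufficient" means here, below is a crude ablation check on a toy decision rule. This is not the Anchors algorithm, just the simplest thing that illustrates the two notions. The 51/49 weights are taken from the thought experiment, while the 0.5 threshold and the function names are my own assumptions.

```python
# Toy weights from the thought experiment; the threshold is an assumption.
WEIGHTS = {"A": 0.51, "B": 0.49}
THRESHOLD = 0.5


def is_cancerous(present):
    """Toy decision rule: weighted sum of the clusters present in the image."""
    return sum(WEIGHTS[f] for f in present) >= THRESHOLD


def necessary(feature, present):
    """True if removing `feature` flips a positive decision to negative."""
    return is_cancerous(present) and not is_cancerous(present - {feature})


def sufficient(feature):
    """True if `feature` on its own is enough for a positive decision."""
    return is_cancerous({feature})


cell = {"A", "B"}
# Maps each cluster to (is it necessary?, is it sufficient?).
report = {f: (necessary(f, cell), sufficient(f)) for f in sorted(cell)}
```

Under these made-up numbers A comes out both necessary and sufficient while B is neither, even though their weights are nearly equal, which is the kind of answer a human staring at the scary-looking B might refuse to accept.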

Well, we could tune the algorithm to put more weight on B. Both B and A are fairly accurate, and A overlaps with B whenever there is a TP. So in a worst-case scenario, we're now killing 0.x% fewer cancer cells than before or killing a few more healthy cells, not a huge deal.

So should we accept the more "explainable" algorithm in this scenario?

If your answer is "yes", if completely irrational human intuition is reason enough to opt for the worse model, I think our disagreement might be a very fundamental one. But if the answer is "no", then think of the following:

For any given ML algorithm we've got a certain amount of research time and a certain amount of compute that's feasible to spend. While in some cases explainability and accuracy can go hand in hand (see, e.g., a point I made about confidence determination networks that could improve the accuracy of the main network beyond what can be achieved with "normal" training), this is probably the exception.

As a rule of thumb, explainability is traded off for accuracy. It's another thing we waste compute and brain-power on that takes away from how much we can refine and for how long we can train our models.

This might not be an issue in cases where the model converges to a perfect solution fairly easily (perfect as in, based on existing data quality and current assumptions about future data there's no more room to improve accuracy, not perfect in the 100% accuracy sense), and there are plenty such problems, but we usually aren't able to tell they fall into this category.

The best way to figure out that an accuracy is "the best we can get" for a specific problem is to throw a lot of brainpower and compute at it and conclude that there's no better alternative. Unless we are overfitting (and even if we are overfitting) determining the perfect solution to a problem is usually impossible.

So if, in the above thought experiment, you wouldn't sacrifice even <0.01 AUC for the sake of what a human thinks is the "reasonable" explanation for a problem, then why sacrifice unknown amounts of accuracy for the sake of explainability? If truth takes precedence over explanations people agree with, then how can we justify the latter before we've perfected the former?

IV - I digress

I think it's worth expanding more on this last topic but from a different angle. I also listed a 4th reason in my taxonomy that I didn't have the time to get into. On the whole, I think exploring those two combined is broad enough to warrant a second article.

I kind of hand-wave in a very skeptical (in the Humean sense) worldview to make my stance, and I steamroll over a bunch of issues related to scientific truth. I'm open to debating those if you think they are the only weak points in this article, but I'm skeptical (no pun intended) about those conversations having a reasonable length or satisfactory conclusion.

As I said at the beginning, take this article more as an interesting perspective, rather than as a claim to absolute truth. Don't take it to say "we should stop doing any research into explainable ML" but rather "we should be aware of these pitfalls and try to overcome them when doing said research".

I should note, part of my day-job actually involves explainable models, 2 years of my work are staked in a product which has explainability as an important selling point, so I am somewhat up-to-date with this field and also all my incentives are aligned against this hypothesis. I very much think and want the above problems to be, to some degree, "fixable", I get no catharsis from pointing them out.

That being said, I think that challenging base assumptions about our work is useful, as a mechanism for reframing our problems as well as a lifeline to sanity. So don't take this as an authoritative final take on the topic, but rather as a shaky but interesting point of view worth pondering.

If you enjoyed this article, I'd recommend you read Jason Collins's humorous and insightful article, Principles for the Application of Human Intelligence, which does a fantastic job of illustrating the double standards we harbor regarding human versus algorithmic decision-makers.

I'd also self-plug some of my own work around the topic of machine learning interpretability, explainability, and limitations:

If you enjoy the epistemological framework used by this article, you might also enjoy:

Published on: 2020-12-16


