Pharma's complexity demands more than AI — it needs bilingual experts
A decade of AI work in drug development has taught the industry an unromantic lesson: the model is almost never the hard part. The hard part is framing the question in a way that is simultaneously scientifically meaningful, clinically usable, and regulatorily defensible — and then building the pipeline so the answer survives contact with each of those audiences. That skill has a name now. The industry calls it bilingual.
The potential of AI and ML to accelerate pharmaceutical R&D is real. Faster analysis, broader pattern recognition, better prediction — the tools are evolving quickly, and in the right hands they change what is possible. The catch is that "the right hands" turn out to be the bottleneck, not the GPUs.
Why pharma does not flatten into "big data"
Treating pharma as a data problem is the mistake that keeps underwriting expensive AI programmes that go nowhere. The domain is not big data. It is staggering, interconnected complexity, and any credible solution has to navigate all of it at once:
- Intricate biological systems. Pathways rarely act in isolation. Off-target effects, biological feedback loops, and patient-level variability introduce uncertainty that most datasets do not even try to capture.
- Multi-dimensional clinical trials. Trial design spans hundreds of interconnected parameters — inclusion criteria, endpoints, titration schedules, site logistics, ethical guardrails — each of which can invalidate the others if moved without thinking.
- Regulatory and compliance demands. Evolving requirements from the FDA, EMA, and other authorities add a layer of scrutiny that does not yield to clever modelling. The FDA's January 2025 draft guidance [1] makes this explicit: credibility is assessed against a context of use, not a model card. The context cannot be described, let alone defended, without domain fluency.
- Market and access dynamics. Even a breakthrough therapy has to justify its value across diverse payer landscapes, with real-world evidence increasingly required to secure reimbursement. A model that optimises for a statistical endpoint and ignores the payer story is a model that ships straight into a commercial wall.
In this environment, a technically brilliant algorithm built in isolation from the domain will reliably produce outputs that are statistically sound and strategically irrelevant. That is not a model quality problem. It is a framing problem.
What bilingual actually means
The word gets thrown around. Here is what I mean by it, concretely.
A bilingual practitioner can hold two mental models at once: the computational view — what the model is doing, what it needs, where it fails — and the domain view — what the science says, what the clinic will accept, what the regulator will tolerate. They move between the two in the same meeting. They can explain to a data scientist why an endpoint choice changes the target variable, and to a clinical lead why a sampling decision changes the inference they are allowed to draw.
Academic programmes have started naming the same gap [2], and reviews of AI in drug discovery [3] keep flagging domain expertise as the constraint that more compute does not fix. The technical stack tells the same story: the useful variants of general-purpose language models in this space, such as BioBERT and ClinicalBERT [4], are the ones that have absorbed the vocabulary and norms of the domain they serve. The lesson scales up from models to teams.
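The earlier claim that an endpoint choice changes the target variable can be shown in a few lines. This is a minimal sketch under stated assumptions: the patient values, the 7.0% threshold, and the 1.0-point reduction are illustrative numbers, and the class and function names are hypothetical. None of it is taken from a real trial.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    baseline_hba1c: float  # percent, hypothetical value
    final_hba1c: float     # percent, hypothetical value


def label_reached_target(p: Patient, threshold: float = 7.0) -> bool:
    """Endpoint A: did the patient finish below an absolute threshold?"""
    return p.final_hba1c < threshold


def label_meaningful_drop(p: Patient, min_drop: float = 1.0) -> bool:
    """Endpoint B: did the patient improve by at least min_drop points?"""
    return (p.baseline_hba1c - p.final_hba1c) >= min_drop


# Same three hypothetical patients, two different endpoint definitions.
cohort = [Patient(9.5, 8.0), Patient(7.2, 6.8), Patient(8.0, 7.5)]
labels_a = [label_reached_target(p) for p in cohort]
labels_b = [label_meaningful_drop(p) for p in cohort]

print(labels_a)  # [False, True, False]
print(labels_b)  # [True, False, False]
```

The measurements are identical, yet the two endpoints disagree on the first two patients. A model trained on one set of labels is answering a different question from a model trained on the other, which is why the endpoint has to be settled before the modelling starts.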
What bilingual people actually do
They are the translation layer between pharma's complexity and tractable AI problems. In practice:
- They frame the right questions. "Predict response" is not a question. "Predict HbA1c trajectory over 56 weeks in patients who would be eligible for a phase 3 trial under criteria X, benchmarked against the cluster structure from published real-world cohorts" is a question. The difference is not pedantry. It is the difference between a model that ships and a model that dies in review.
- They define success metrics that hold up on both sides. A metric that is defensible to a statistician and meaningful to a medical director is rare. Bilinguals are the people who can negotiate that metric before the work starts, not litigate it after.
- They design pipelines that reflect clinical nuance. Missingness in EHRs is not random. Adherence is not independent of demographics. Trial populations are not representative of label populations. Bilinguals build these facts into the data architecture instead of discovering them in the error analysis.
- They interpret outputs for regulatory plausibility, not just statistical validity. A p-value does not get a drug approved. A credible, documented, context-aware argument does. Bilinguals know the difference and can produce the second.
- They spot weak signals and push back on false confidence. They are the ones who raise a hand when the model is confidently extrapolating into a subgroup it never saw — the subgroup where overconfidence becomes a safety issue.
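The last point, catching extrapolation into a subgroup the model never saw, can be approximated with a simple support check. The following is a minimal sketch, not a validated safety control: the subgroup labels, the function names, and the `min_support` default of 30 are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def subgroup_support(train_subgroups: Iterable[str]) -> Counter:
    """Count training records per subgroup label."""
    return Counter(train_subgroups)


def flag_low_support(
    support: Counter,
    query_subgroups: Iterable[str],
    min_support: int = 30,  # illustrative threshold, not a recommendation
) -> List[bool]:
    """Return True for each query whose subgroup has fewer than
    min_support training records, including subgroups never seen."""
    # Counter returns 0 for missing keys, so unseen subgroups are flagged.
    return [support[g] < min_support for g in query_subgroups]


# Hypothetical training cohort: many adults, few elderly, no paediatric.
train = ["adult"] * 200 + ["elderly"] * 12
support = subgroup_support(train)
flags = flag_low_support(support, ["adult", "elderly", "paediatric"])

print(flags)  # [False, True, True]
```

A flagged prediction is a prompt for human review rather than an automatic rejection. What counts as adequate support for a given subgroup is a domain judgment, which is the part the code cannot supply.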
What the hiring market is getting wrong
Two patterns I keep seeing:
- "We'll hire a data scientist and train them on pharma." This produces someone who can describe the domain but cannot feel when a question is wrong. Domain feel is not an onboarding deliverable.
- "We'll hire a clinician and train them on ML." This produces someone who can vet a model's conclusions but cannot redesign its inputs. Without the computational side, they become reviewers, not builders.
The bilingual ones are usually people who have spent serious time — years, not months — on both sides, and who have been rewarded for standing at the seam rather than retreating to one side. They are rare. They are also disproportionately the reason any given AI-in-pharma programme actually ships something that matters.
The real goal
The goal of AI in drug development is not to automate processes. It is to elevate decision-making in a field where decisions have real human consequences. That demands more than good models — it demands context-aware, domain-embedded intelligence, held by people who can explain, defend, and revise the work on every axis that matters.
If your AI strategy assumes domain fluency will arrive on its own once the infrastructure is in place, the strategy is already failing. It just has not produced the evidence yet.
How is this need for bilingual expertise showing up in your world — and what is actually working to cultivate it, beyond the usual training-programme theatre?
References
- 1. U.S. FDA (January 2025). Considerations for the Use of Artificial Intelligence to Support Regulatory Decision-Making for Drug and Biological Products. Draft guidance, including the risk-based credibility framework keyed to context of use.
- 2. Ohio State Online. Training "bilingual" graduates fluent in both translational pharmacology and AI.
- 3. AI in Action: Redefining Drug Discovery and Development. PMC, 2025. Review flagging domain expertise as a persistent constraint.
- 4. AI-based language models powering drug discovery and development (BioBERT, ClinicalBERT, and the role of labelled biomedical data). PMC, 2021.