New York, March 4, 2026 — For the humanitarian sector, artificial intelligence arrives wrapped in paradox.
On the one hand, the promise is enormous: tools that can translate across languages instantly, support overwhelmed frontline workers, anticipate crises like climate shocks before they strike, and extend scarce resources further than ever before. On the other hand, when humanitarians adopt technologies built by third parties, where does responsibility lie for biases, access barriers, erroneous outputs, and inclusive design? The risks of irresponsible AI deployment have real-life consequences in contexts where there is no margin for error, to say nothing of AI's environmental burden, data privacy concerns, and more.
Over the past two years at the International Rescue Committee (IRC), our colleagues have been living inside this tension. With support from the Mastercard Center for Inclusive Growth and data.org through the Artificial Intelligence to Accelerate Inclusion (AI2AI) Challenge, and a broader coalition of philanthropic and private-sector partners, our teams have been testing what it actually takes to deploy AI responsibly in humanitarian settings. Not as a thought experiment. Not as a press release. But as working systems used by real people, in real contexts, under real constraints.
This work is iterative; it is never complete, only evolving. What follows is a set of hard-won, foundational lessons about how the promise of AI can begin to translate into humanitarian impact.
Learning One: In Humanitarian Contexts, “Responsible AI” Is an Operational Discipline, Not a Slogan
Much of the global conversation about responsible AI happens at the level of principles. Fairness. Transparency. Accountability. These matter—but in humanitarian work, responsibility shows up in far more concrete ways.
Can an AI system serve clients in low-bandwidth environments?
Can it operate safely and fairly in dozens of languages, including those underrepresented in training data?
Can it be audited, governed, and overridden by humans when necessary?
And critically: if AI initiatives fail, can we ensure that no one gets hurt along the way?
At the IRC, we quickly learned that responsible AI deployment depends on infrastructure, not just policies. Long before generative AI, Signpost had built a digital communications platform reaching communities in roughly thirty countries through web and social channels, serving more than 20 million people since 2015. The model centered on co-creating trusted, localized information with communities and maintaining two-way channels that connected people to IRC social workers for personalized support. Research showed that this personalized assistance drove impact—but was difficult to scale—prompting us to cautiously explore AI as a way to extend human capacity without sacrificing quality or trust.
Early pilots demonstrated both the potential and the limits of simple AI approaches. While the systems met quality standards roughly 75% of the time and increased staff productivity by nearly 70%, they still required human review before responses could be shared safely. This revealed the need for AI systems that could recognize their own limits, validate outputs for safety, and respond directly only when proven reliable.
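To make that requirement concrete, here is a minimal sketch, in Python, of what a confidence-gated, human-in-the-loop response flow can look like. The names, threshold, and flags are illustrative assumptions for this article, not the actual Signpost implementation.

```python
# Hypothetical sketch (not the actual Signpost implementation): a drafted
# answer is released directly only when automated safety and quality checks
# pass; otherwise it is routed to a human moderator for review.
from dataclasses import dataclass

QUALITY_THRESHOLD = 0.9  # illustrative bar; real thresholds are set per context


@dataclass
class Draft:
    question: str
    answer: str
    quality_score: float      # e.g. from an automated evaluation step
    safety_flags: list[str]   # e.g. self-harm, legal advice, out-of-scope topics


def route_response(draft: Draft) -> str:
    """Decide whether a drafted answer can be sent or needs human review."""
    if draft.safety_flags:
        # Any safety concern is escalated to a trained moderator.
        return "escalate_to_human"
    if draft.quality_score < QUALITY_THRESHOLD:
        # Below the quality bar, the system should recognize its limits.
        return "human_review_required"
    return "send_directly"


if __name__ == "__main__":
    draft = Draft(
        question="Where can I register for asylum support?",
        answer="You can register at the local reception office...",
        quality_score=0.82,
        safety_flags=[],
    )
    print(route_response(draft))  # -> "human_review_required"
```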
This realization led our teams to develop Signpost AI, an AI agent-building platform designed with the help of partners and supporters like Google.org, Zendesk, and Cisco to create, test, and safeguard AI agents for humanitarian use with greater precision and far stronger safety guardrails. Meeting these strict safety and performance requirements not only enabled us to create agents for Signpost projects; it unlocked the ability to create agents across sectors, from protection and education to refugee resettlement and anti–human trafficking. And it has led us to design our AI differently, in ways purpose-built for responders and communities living through unimaginable crises.
Rather than relying on generic chatbots, at the IRC we design our own agents. The process is much like the way one would design a job description: defining roles, responsibilities, safety protocols and guardrails, workflows, and knowledge boundaries. Every agent is purpose-built. Every deployment is monitored. And every use case is evaluated for safety, cost-effectiveness, and real-world benefit.
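As a rough illustration of the "job description" analogy, a purpose-built agent can be expressed as a declarative definition along these lines. Every field name and value below is a hypothetical example, not the Signpost AI schema.

```python
# Hypothetical "job description" for a purpose-built agent; fields and values
# are illustrative only.
agent_definition = {
    "role": "Information assistant for newly arrived refugees",
    "responsibilities": [
        "Answer questions using only the approved local knowledge base",
        "Point users to IRC social workers for personalized support",
    ],
    "guardrails": [
        "Never provide legal or medical advice",
        "Escalate any mention of immediate danger to a human moderator",
    ],
    "workflow": ["retrieve", "draft", "safety_check", "respond_or_escalate"],
    "knowledge_boundaries": {
        "sources": ["curated_local_articles"],  # verified content only
        "refuse_if_outside_sources": True,
    },
    "languages": ["en", "ar", "fa"],
}
```

Treating the definition as data in this way keeps every role, guardrail, and knowledge boundary visible and reviewable before an agent is ever deployed.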
This work has been slow by Silicon Valley standards—and intentionally so. But it has allowed us to move beyond abstract debates about AI risk toward a more useful question: under what conditions can AI reliably help people in crisis, and when should it not be used at all?
Learning Two: Scale Comes from Systems, Not Standalone Tools
A single tool or successful experiment might feel like a breakthrough. But we need to remember that it's just the beginning. All too often, tools are built, tested, written up—and then quietly abandoned because they cannot integrate into larger systems. At the IRC we are only at the beginning of this journey, and the jury is still out on our current programs: will they drive the impact we hope for, or will we chalk them up to lessons learned?
The IRC’s experience suggests that AI only begins to matter at scale when it is embedded into platforms that can support multiple applications over time.
This is where Signpost offers a useful example. Signpost has reached more than 20 million people worldwide, providing localized, verified, critical, sometimes even life-saving information to communities affected by crises. The question our teams faced was not whether AI could generate answers faster—it clearly could—but whether it could do so safely, accurately, and in ways that complemented human judgment rather than replacing it, all without overstepping its limits.
Today, Signpost AI–enabled agents have been piloted in eight countries, supporting both frontline staff and, in some contexts, direct engagement with users. These systems are being rigorously evaluated for efficiency gains, response quality, and unintended effects. Importantly, we are publishing what we learn—sharing guidance, tools, and failures with the broader sector—because no single organization should be figuring this out alone.
The same systems logic underpins ALMA, a virtual assistant designed to help refugees and other newcomers navigate life in the United States. ALMA helps users understand documentation requirements, prepare for employment opportunities, access services, and navigate complex bureaucracies—areas where confusion and delays can have lasting consequences. While still at an early stage, ALMA is teaching us how AI could be used to reduce administrative friction without undermining accountability or trust.
In both cases, the insight is the same: AI becomes useful not when it is clever, but when it is boringly reliable—and when it fits into the systems people already use.
Learning Three: The Most Promising AI Applications Are the Ones That Work Where Access Is Lowest
Perhaps the most encouraging lesson so far comes from aprendIA, an easy-to-use, AI-driven chatbot delivering personalized learning to teachers in crisis-affected settings through simple messaging applications they already use on devices they already own. aprendIA delivers both long-term skill-building through evidence-based micro-coursework and day-to-day classroom assistance through immediate responses. Teachers value aprendIA for guidance on classroom management and on teaching foundational skills such as literacy, numeracy, and social-emotional skills.
In Nigeria, aprendIA has moved from a small, controlled pilot to early regional scale in partnership with the Ministries of Education of Borno, Adamawa, and Yobe States in Northeast Nigeria, growing from an initial cohort of around 400 teachers to 3,200 today, with a user base of more than 22,000 expected before the end of 2026. What makes this expansion noteworthy is not the novelty of the technology, but its fit with context: low bandwidth, local languages, asynchronous use, and close collaboration with educators themselves.
aprendIA challenges a persistent assumption in AI discourse—that meaningful AI requires sophisticated infrastructure. In reality, some of the most impactful applications are those that meet people where they are, rather than where technologists wish they were.
This is not accidental. It reflects years of investment in human-centered design, local partnerships, and rigorous evaluation. And it underscores a broader point: AI’s value in humanitarian contexts will be determined less by model sophistication than by contextual intelligence.
Learning Four: Partnerships Are Not Optional—They Make the Difference
None of this work would be possible without an unusual coalition of partners spanning philanthropy, technology, and humanitarian practice.
The Mastercard–data.org AI2AI Challenge provided catalytic funding at a moment when experimentation was necessary but risky. In-kind support and seed capital from partners and supporters like Google.org, Zendesk, Cisco, the Patrick J. McGovern Foundation, and Anthropic provided compute, tooling, and technical collaboration that dramatically lowered barriers to responsible prototyping. All have supported the unglamorous but essential work of governance, evaluation, and knowledge sharing. Access to licenses for staff users has also proven genuinely empowering.
What matters here is not brand association. It is alignment. When partners share a commitment to open learning, safety, and real-world impact—rather than quick wins—AI can be explored in ways that serve the public good rather than undermine it.
From Admiring the Problem to Solving It
The humanitarian sector is rightly cautious about AI. The stakes are high. But caution should not become paralysis; after all, it is our duty to do the very best with whatever resources we have at our disposal.
At a moment when global humanitarian needs are rising and resources are shrinking, the question is not whether AI will shape the future of aid—it already is. The real question is whether we will shape it deliberately, responsibly, and in service of those who need it most.
What we are learning at the IRC is that doing so requires humility, patience, and a willingness to publish uncomfortable truths alongside successes. It requires treating safety as a design constraint, not an afterthought. And it requires moving from admiring the scale of the problem to engaging with the difficult work of solving it.