Voices Unheard: AI revolution is leaving 2,000 African languages behind
April 7, 2025
Joy Agwunobi
Artificial intelligence (AI) holds the promise of revolutionising key sectors such as education, healthcare, and access to justice. However, its benefits remain unevenly distributed, as linguistic underrepresentation, infrastructure deficits, and biased algorithms risk exacerbating existing inequalities, leaving marginalised communities behind.
These concerns were central to discussions at the Global AI Summit on Africa, where Bibliothèques Sans Frontières (Libraries Without Borders), Kajou, and the AI Lab Pleias advocated for ethical, sustainable, and culturally inclusive AI, particularly for low-resource countries.
As part of their advocacy, the organisations launched a white paper titled “Beyond the Hype: Building Equitable and Sustainable AI for Social Impact” under the Ideas AI initiative. The publication explores AI’s transformative potential while addressing the significant challenges that hinder its inclusive deployment.
The core concern is that AI could become an exclusive tool for wealthier nations, leaving underserved communities behind. At a time when AI risks deepening global inequalities, the white paper serves as a call to action to develop AI that is frugal, ethical, inclusive, and culturally relevant to ensure its benefits are accessible to all.
The white paper examines the severe underrepresentation of local languages in AI datasets and the structural barriers that prevent equitable AI development, particularly in Africa and other low-resource regions. The study stresses that multilingual data scarcity is not merely a technical issue but a significant human challenge, as AI tools often fail to serve communities that lack digital representation. It warns that, without intervention, AI deployment risks deepening social and economic inequalities rather than mitigating them.
According to the paper, the world has over 7,000 languages, with 2,000 spoken across Africa. However, AI models overwhelmingly prioritise English and European languages, dedicating over 90 percent of training data to English, with the remaining 10 percent covering other high-resource European languages such as Spanish, French, German, and Italian. This leaves less than 1 percent of AI training data for the thousands of languages spoken by billions, particularly in Africa, Asia, and indigenous communities worldwide.
This imbalance, it noted, is not just a matter of volume but a reflection of systemic biases in AI development. It added that historical policies favouring European languages have marginalised non-European datasets, limiting AI’s ability to process diverse linguistic structures, cultural contexts, and reasoning frameworks. As a result, many AI models underperform when dealing with local dialects, indigenous knowledge, and socio-cultural nuances.
The study stressed that, unless this gap is addressed, AI risks accelerating language loss (currently, one language is lost every three months) as dominant languages become more embedded in digital technologies. Furthermore, indigenous knowledge systems may be systematically excluded from AI-powered information platforms, depriving future generations of culturally rich and diverse insights.
The study warns that economic opportunities arising from AI will disproportionately benefit communities already advantaged by language representation in AI datasets.
While increasing data diversity is essential, the study emphasises that it is not enough. Even with more multilingual data, the Global South faces another critical challenge—the lack of infrastructure needed to train, deploy, and scale AI systems effectively.
The white paper highlights the infrastructure challenges that hinder AI adoption in emerging markets, particularly in Africa, Latin America, and parts of Asia. The global distribution of data centres reveals significant disparities: North America, home to just 4.7 percent of the global population, hosts 40 percent of the world’s data centres; Europe accounts for 30 percent of data centre capacity; and Africa, despite housing 17 percent of the world’s population, holds less than 1 percent of global data centre capacity.
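To make the scale of that disparity concrete, the quoted shares can be turned into a simple ratio of data-centre share to population share. The short Python sketch below is illustrative only: it uses the figures reported above, treats Africa's "less than 1 percent" as an upper bound of 1.0, and assumes the data-centre and capacity shares are comparable. The function name `representation_ratio` is hypothetical and does not come from the white paper.

```python
# Shares quoted in the article, as percentages of the world total.
# Africa's data-centre share is reported only as "less than 1 percent",
# so 1.0 is used as an upper bound (an assumption for illustration).
regions = {
    "North America": {"population": 4.7, "data_centres": 40.0},
    "Africa": {"population": 17.0, "data_centres": 1.0},
}


def representation_ratio(data_centre_share, population_share):
    """Data-centre share divided by population share; 1.0 means proportional."""
    return data_centre_share / population_share


ratios = {
    name: representation_ratio(r["data_centres"], r["population"])
    for name, r in regions.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x its population share")

# Because Africa's share is an upper bound, this gap is a lower bound.
gap = ratios["North America"] / ratios["Africa"]
print(f"Per-capita gap: at least {gap:.0f}x")
```

On these figures, North America holds roughly 8.5 times its population share of data centres, while Africa holds at most about 0.06 times its share, a per-capita gap of at least 145 to 1.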
The white paper also highlighted connectivity and digital access constraints. According to the paper, AI technologies require stable, high-speed internet connectivity, but Africa lags significantly behind in the following respects:
- Internet penetration: while North America and Western Europe have over 90 percent internet penetration, Africa averages just 37 percent.
- Fixed broadband access: less than 5 percent of African households have fixed broadband, compared to over 80 percent in developed markets.
- Average internet speeds: 150+ Mbps in North America (fixed broadband), 120+ Mbps in Western Europe, and 28 Mbps in Sub-Saharan Africa (with even lower speeds in rural areas, often below 10 Mbps).
While mobile networks have expanded, adoption remains low because of affordability issues. Although 4G covers 77 percent of Africa’s population, only 28 percent of people use it, owing to high costs and device limitations. Additionally, 5G adoption is in its early stages, covering less than 3 percent of Africa’s population compared to over 80 percent in leading markets.
Beyond infrastructure, the white paper noted that Africa also suffers from a lack of essential natural language processing (NLP) tools that are crucial for AI applications. The absence of language identification, classification, and tokenisation tools means that many local languages cannot be effectively processed by AI systems. This technical limitation restricts the ability of AI to understand diverse cultures, provide accurate translations, and develop locally relevant applications.
The study argues that truly inclusive AI requires a comprehensive approach that combines data collection methodologies to ensure diverse linguistic representation, evaluation frameworks to measure AI effectiveness across different cultures, and basic NLP tool development to enable AI applications in underserved languages.
Turning to infrastructure, the white paper emphasises that deploying AI in emerging markets cannot rely on a one-size-fits-all strategy. Instead, it calls for a balanced approach that aligns technological innovation with local realities: economic constraints, limited connectivity, and existing infrastructure gaps.
“No single approach can fully address the complex challenges involved,” the report states, “but complementary strategies can create viable pathways to meaningful AI participation despite infrastructure limitations.”
Key to the proposal is leveraging what is already available. The paper argues that AI systems must be designed for efficiency and accessibility, using methods such as edge computing, mobile-first deployment, and CPU optimisation to deliver high-impact outcomes without relying on high-end infrastructure. These “frugal” innovations are especially important in areas where internet access is limited or unreliable.
The publication further outlines six concrete recommendations aimed at ensuring that AI technologies benefit underserved populations without further entrenching inequality:
- Frugal, sustainable and offline AI: invest in offline, affordable, and resource efficient AI to foster equitable access.
- Linguistic and cultural inclusion: fund open-source NLP tools for underrepresented languages to ensure diversity and fairness.
- Ethical AI governance: mandate human oversight to respect rights and community values.
- Open-source transparency: prioritize transparent, open-source AI models and datasets for accountability and community adaptation.
- Inclusive evaluation metrics: adopt culturally relevant benchmarks prioritizing fairness and accessibility alongside accuracy.
- Build local capacity: support local AI education, entrepreneurship, and research in LMICs to sustain innovation.
Additionally, the white paper argues that physical infrastructure alone is not a silver bullet for AI deployment; building local knowledge, skills, and institutions is crucial too.
“Building human capacity alongside physical infrastructure development ensures that investments translate into sustainable capabilities rather than stranded assets,” the paper asserts. “By investing in education, training, and knowledge transfer, emerging markets can develop the expertise necessary to maintain, operate, and adapt AI infrastructure to local needs.”
This holistic focus on both human and technological readiness reflects the broader mission of Kajou and BSF. Known for deploying local microservers and microSD cards to deliver educational and health content in offline settings, both organisations bring practical, on-the-ground experience from projects across countries like Senegal, Ivory Coast, and the Democratic Republic of Congo.
Despite the considerable challenges, the white paper highlighted the current moment as an opportunity to build a more inclusive AI future, stating: “The challenges are substantial, but the opportunities are even greater. By embracing a holistic, context-aware approach, and working collaboratively across sectors, we can harness the transformative potential of AI to create more equitable and effective opportunities for all, regardless of their background or location. Let us seize this moment to build an AI-powered future that truly leaves no one behind.”