LLMs vs SLMs: How AI Models Impact Sustainability

By Charlie King

August 08, 2025

Are SLMs the future of sustainable AI? We dive into new models by Microsoft and IBM and ask – can AI be sustainable?

As AI becomes woven into modern business, the need for ever-greater computing power has driven rapid technological innovation, infrastructure development and skill adaptation.

Large language models (LLMs) from players like OpenAI, Anthropic and Google have captured the world’s attention with their ability to parse and generate natural language with an apparently encyclopedic knowledge, powering everything from enterprise chatbots to advanced analytics.

But these models’ appetite for resources – particularly energy and water – is immense. A single ChatGPT query can consume up to 10 times more electricity than a traditional Google search, while the data centres training these models use millions of gallons of water for cooling.

The scale is staggering: training GPT-3 consumed an estimated 1,287 MWh of electricity, equivalent to powering 120 US homes for a year, while Microsoft’s water consumption jumped 34% in 2022, largely attributed to AI operations.
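As a quick sanity check on that comparison, the short sketch below divides the quoted training energy by the quoted number of homes. Both inputs come from the figures above; the function name is illustrative only.

```python
# Figures quoted in the article.
TRAINING_ENERGY_MWH = 1_287  # estimated electricity used to train GPT-3
HOMES_POWERED = 120          # US homes said to be powered for one year


def implied_mwh_per_home(total_mwh: float, homes: int) -> float:
    """Annual electricity per home implied by the comparison."""
    return total_mwh / homes


per_home = implied_mwh_per_home(TRAINING_ENERGY_MWH, HOMES_POWERED)
print(f"Implied usage: {per_home:.3f} MWh per home per year")
```

The result, a little under 11 MWh per home per year, is broadly in line with commonly cited averages for US household electricity use, so the comparison holds together.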

In recent years, small language models (SLMs) – models capable of supporting many powerful use cases but with a distinctly leaner footprint – have emerged as a nimble alternative to LLMs.

So, how are some of the world’s largest organisations adopting SLMs at the heart of their sustainability and operational strategies?

Read the full story in the August 2025 edition of Sustainability Magazine.

What are SLMs?

While large models can wield hundreds of billions or even trillions of parameters, SLMs usually operate in the range of a few million up to about 10 billion parameters, and so require significantly less memory, processing power and storage.
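To give a rough sense of what parameter count means for memory, the sketch below estimates the space needed just to hold a model's weights. It assumes memory is simply parameters multiplied by bytes per parameter and ignores activations, caches and runtime overhead, so real deployments will need more. The model sizes are round illustrative numbers, not measurements of any specific product.

```python
# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Illustrative sizes: the upper end of the SLM range and a trillion-parameter LLM.
models = [("10B-parameter SLM", 10e9), ("1T-parameter LLM", 1e12)]

for name, params in models:
    for precision in ("fp16", "int4"):
        gb = weight_memory_gb(params, precision)
        print(f"{name} at {precision}: about {gb:,.0f} GB")
```

Under these assumptions, a 10-billion-parameter model at 16-bit precision needs about 20 GB for its weights, within reach of a single high-end accelerator, while a trillion-parameter model needs about 2,000 GB and must be spread across many devices.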

Technically, SLMs deploy the same transformer architectures as their larger siblings, but optimisation techniques such as knowledge distillation, pruning and quantisation allow them to retain high task-specific performance at a fraction of the resource cost. By using domain-specific training datasets, SLMs can excel at focused tasks – like company-specific email summarisation or call centre enquiry resolution – rather than the general-purpose omniscience claimed by LLMs.
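Of the three optimisation techniques, quantisation is the simplest to illustrate. The sketch below shows one basic variant, symmetric 8-bit quantisation of a handful of weights, in plain Python. It is a toy illustration of the idea rather than the method used by any particular model; the weight values and function names are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantisation: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # The scale is the float value represented by one integer step.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Made-up example weights.
weights = [0.82, -1.27, 0.003, 0.45, -0.09]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("int8 values:", quantized)
print(f"largest round-trip error: {max_error:.5f}")
```

Each weight now fits in one byte instead of the four used by 32-bit floats, at the cost of a rounding error no larger than half a quantisation step. Production toolchains use more refined schemes, but the trade-off is the same: less memory and compute in exchange for a small loss of precision.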

What are LLMs?

LLMs, such as GPT-4 and Gemini, are expansive neural networks trained on vast datasets encompassing much of the digitised world’s text. With up to trillions of learnable parameters, LLMs can exhibit remarkable fluency in language, reasoning, summarisation, code and more. Their strengths lie in adaptability and breadth – an LLM can handle everything from legal document analysis to poetry.

However, this scale carries costs. Training and operating LLMs demands immense computational power, orchestration across specialised hardware (GPUs, TPUs) and a continuous internet connection. This not only increases financial outlays, but also amplifies carbon footprints.

What makes SLMs more sustainable than LLMs?

The lower energy footprint of SLMs is a primary driver of their adoption in sustainability strategies. As each AI model scales down in size, its training and inferencing energy requirements plummet, allowing organisations to meet emissions targets while scaling their use of automation and intelligent services.

Unlike LLMs, SLMs can be deployed directly on edge devices or minimal on-premises infrastructure, further diminishing dependence on energy-intensive centralised data centres.

Green AI is the movement to prioritise efficiency, environmental responsibility and inclusivity in AI development – and SLMs naturally align with these goals.

The cost efficiency of SLMs is compelling for many companies boosting AI infrastructure, particularly for organisations seeking to democratise AI or deploy it at scale where cloud costs are a constraint. Smaller models mean lower infrastructure expenses, faster fine-tuning and minimal GPU requirements.

SLMs are not just greener and cheaper – they are also easier to audit and control. Their simpler structures make it possible for data scientists and compliance teams to explain, debug and mitigate risks faster than with the opaque, massive architectures of LLMs.

The same transparency is proving particularly valuable in regulated sectors such as healthcare and banking, where rapid model explainability is a legal necessity.

SLMs deliver operational flexibility central to modern IT strategies. Their compact size enables deployment on edge devices, private on-premises servers or cloud – wherever latency, privacy or compliance demand it.

SLMs empower organisations to choose the best-fit location for every AI workload, freeing them from the bandwidth and privacy limits that can compromise cloud-only approaches.

Phi-4: Microsoft’s latest SLM

“The energy intensity of advanced cloud and AI services has driven us to accelerate our efforts to drive efficiencies and energy reductions,” says Melanie Nakagawa, Microsoft’s Chief Sustainability Officer.

“As AI scenarios increase in complexity, we’re empowering developers to build and optimize AI models that can achieve similar outcomes while requiring fewer resources.”

Microsoft’s Phi-4 is the latest in its series of SLMs. Available through Azure AI Foundry, Hugging Face and the Nvidia API Catalog, the Phi-4 family includes Phi-4-multimodal and Phi-4-mini.

Phi-4 statistics

5.6B - Parameters in the Phi-4-multimodal model, fewer than most competing multimodal systems
6.14% - Word error rate on the Hugging Face OpenASR leaderboard, representing a new benchmark record
128,000 - Maximum token sequence length supported by the Phi-4-mini model, enabling processing of extensive text

Phi-4-multimodal handles speech, vision and text, and sets new standards in automatic speech recognition and translation, including a record word error rate of 6.14% on the Hugging Face OpenASR leaderboard.

“Phi-4-multimodal marks a new milestone in Microsoft’s AI development as our first multimodal language model,” says Weizhu Chen, Technical Fellow, CVP, Gen AI at Microsoft.

“By leveraging advanced cross-modal learning techniques, this model enables more natural and context-aware interactions, allowing devices to understand and reason across multiple input modalities simultaneously.

“Whether interpreting spoken language, analysing images, or processing textual information, it delivers highly efficient, low-latency inference – all while optimising for on-device execution and reduced computational overhead.”

Phi-4-mini is a 3.8B-parameter model specialised for fast, accurate text-based tasks such as reasoning, maths and code generation. It supports token sequences up to 128,000, making it adept at processing lengthy documents.

Both models offer high accuracy and scalability in a compact form, and their lower latency and cost make them ideal for analytical tasks in resource-constrained environments. Their structure not only improves sustainability but also enhances privacy and security by enabling local, on-device processing.

IBM’s Granite 3.2 models

IBM, another AI leader, is leveraging decades of AI innovation to offer the Granite 3.2 model family. These new models are designed specifically for business use, providing robust language capabilities without the overhead associated with larger competitors.

The Granite 3.2 series integrates advanced features such as “chain of thought” reasoning, enabling step-by-step problem solving. This reasoning capability can be toggled, allowing organisations to save on resources for simpler tasks while deploying advanced logic only when necessary.

“The next era of AI is about efficiency, integration and real-world impact – where enterprises can achieve powerful outcomes without excessive spend on compute,” says Sriram Raghavan, Vice President of IBM AI Research.

“IBM's latest Granite developments focus on open solutions demonstrate another step forward in making AI more accessible, cost-effective and valuable for modern enterprises.”

A highlight of the Granite 3.2 launch is the Granite Vision 3.2 2B, a compact vision-language model built for enterprise document processing. Trained on more than 85 million PDFs using IBM’s Docling toolkit, it rivals much larger models – including Meta’s Llama 3.2 11B – by efficiently extracting, classifying and reasoning over complex documents.

With additional innovations like the Granite Guardian 3.2 safety model (now 30% smaller but still highly effective) and the long-range TinyTimeMixers forecaster, IBM demonstrates that sustainable, high-performing AI is not only feasible but also accessible to modern enterprises.

The future of language models: infrastructure, scale and sustainability

As both LLMs and SLMs advance, the future of AI in language processing will be characterised by strategic hybridisation, model efficiency and intelligent infrastructure. Hybrid architectures will let organisations combine the broad strengths of a remote LLM with the pinpoint efficiency of local or edge-deployed SLMs – dynamically balancing sustainability, privacy and speed.
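One way to picture such a hybrid setup is as a small routing layer that decides, request by request, where a prompt should run. The sketch below is purely hypothetical: the `Request` fields, the routing rules and the `local-slm` and `remote-llm` labels are assumptions made for illustration, not a description of any vendor's product.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """A hypothetical AI request with the two traits the router looks at."""
    prompt: str
    contains_sensitive_data: bool = False
    needs_broad_knowledge: bool = False


def choose_model(request: Request) -> str:
    """Pick a deployment target using simple, assumed rules."""
    # Privacy first: sensitive data stays on local or edge hardware.
    if request.contains_sensitive_data:
        return "local-slm"
    # Open-ended, general-knowledge tasks go to the larger remote model.
    if request.needs_broad_knowledge:
        return "remote-llm"
    # Default to the lower-footprint option for focused, routine work.
    return "local-slm"


examples = [
    Request("Summarise this customer email", contains_sensitive_data=True),
    Request("Compare regulatory regimes across regions", needs_broad_knowledge=True),
    Request("Classify this support ticket"),
]

for example in examples:
    print(f"{example.prompt!r} -> {choose_model(example)}")
```

In practice the rules would be far richer, drawing on latency targets, cost budgets and compliance policy, but the principle is the same: send each workload to the smallest model that can handle it.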

While LLMs will continue to push boundaries in cognition and general reasoning, SLMs are poised to drive practical transformation across industries – delivering targeted, low-footprint AI in ways that align with the world’s growing commitment to climate responsibility and digital equity. The coming years will likely see AI success measured not just by what a model can do, but by how efficiently and responsibly it can do it.

As Nakagawa says: “Sustainability is good business. Sustainable business practices drive innovation.”