§ 01

Introduction and Motivation

In 2023, a widely circulated comparison claimed that generating a single response from a large language model consumed ten times more energy than a Google search query. The claim was technically narrow — it compared server-side GPU computation for an unoptimised, low-utilisation research deployment against a decade-mature search stack — but it lodged in public consciousness, shaped ESG discourse, and influenced early regulatory thinking on both sides of the Atlantic.

This paper makes a different comparison. Rather than asking 'how much energy does a server consume to answer one query?', we ask: 'how much energy does a user consume to satisfy one complex information need?' This reframing — from server-side computation to full-stack session — changes the answer substantially.

A search engine does not provide information; it provides a map to information hosted elsewhere.

The energy cost of navigating that map — downloading pages, rendering JavaScript, processing advertisement auctions, and spending time reading — is borne by the user's device, the telecommunications network, and a largely invisible ad-tech infrastructure. None of these costs appear on the data centre's meter.

The past eighteen months have also transformed the empirical landscape. Google published a detailed technical paper (arXiv:2508.15734) documenting that the median Gemini text prompt consumed 0.24 Wh in May 2025 — a 33-fold reduction from the same measure twelve months prior.2 OpenAI's CEO disclosed a comparable 0.34 Wh for ChatGPT. Meanwhile, the HTTP Archive 2025 Web Almanac recorded the median mobile page at 2.56 MB — a figure that, transmitted over a 4G network at Nokia's measured 0.17 kWh/GB, costs more in network energy alone than the entire LLM inference, before the user's device draws a single watt.

§ 02

Related Work and Analytical Gap

2.1 The Server-Centric Measurement Tradition

The benchmark for search-engine energy was established in 2009, when Google disclosed that one query consumed approximately 0.3 Wh, including indexing and retrieval. This figure proved remarkably stable over fifteen years, maintained through aggressive Power Usage Effectiveness (PUE) improvements. The stability was achieved, however, by optimising what sits inside the data centre, while the energy cost of what happens outside — traversing the network and rendering on the client — grew at an entirely different rate.

The early AI energy literature (Strubell et al., 2019; Patterson et al., 2021) correctly identified training costs as a major concern. As deployment scaled, Luccioni et al. (2023) conducted the first systematic inference energy measurement. Epoch AI (2025) synthesised available evidence to estimate ChatGPT at approximately 0.3 Wh per query, noting this was 'relatively pessimistic'.

2.2 The Emerging System-Level Perspective

The Green Software Foundation and related bodies have advocated for 'Software Carbon Intensity' metrics extending beyond the data centre. Morrison et al. (2025, ICLR) proposed holistic lifecycle evaluation of language model creation. The critical contribution of the present paper is to extend this system-level thinking across modalities — comparing LLM sessions against search sessions on a common functional-unit basis.

2.3 The Unexplored Gap

No published peer-reviewed study has, to our knowledge, quantitatively compared session-level energy for LLM versus search modalities while incorporating the programmatic advertising energy overhead. Scope3 (2023) documented advertising's campaign-level carbon footprint. Khan et al. (2024a, 2024b) measured ad-blocker impact on device power. These contributions have not been synthesised into a cross-modality comparative energy lifecycle assessment (CELCA) using a common functional unit — this paper provides that synthesis.

§ 03

The Energy Physics of LLM Inference in 2025

3.1 The Production Benchmark: Google arXiv:2508.15734

The most rigorous publicly available production measurement was published by Google in August 2025 (Elsworth et al., arXiv:2508.15734). The paper measures a comprehensive stack including active TPU/GPU power (0.14 Wh, 58%), host CPU and DRAM (0.06 Wh, 25%), idle machine provisioning (0.02 Wh, 10%), and data-centre PUE overhead (0.02 Wh, 8%), yielding a median of 0.24 Wh per Gemini Apps text prompt.1

Note on scope: The 0.24 Wh figure anchors the efficient end of the distribution for commercially optimised, production-scale standard-model deployments. Complex prompts, multi-turn conversations, and reasoning-mode queries will sit substantially above this median.

3.2 Corroborating Independent Evidence

Epoch AI (February 2025) estimated approximately 0.30 Wh per ChatGPT query. In June 2025, OpenAI CEO Sam Altman disclosed 0.34 Wh for a standard text query — consistent with Epoch AI and modestly above Google's figure. This convergence from independent sources provides reasonable confidence that 0.2–0.4 Wh captures typical production inference as of 2025.

3.3 The Reasoning Model Tier (Out of Scope)

Reasoning models — OpenAI o1/o3, DeepSeek R1, Gemini Thinking — generate extended chain-of-thought sequences. Muxup (January 2026) estimated DeepSeek R1 at 0.96–3.74 Wh per query. This tier is explicitly out of scope; for the class of query most comparable to web search, standard models are both adequate and preferred.

3.4 Amortised Training Energy: A Calculated Exclusion

For a frontier model at 50 GWh training energy, deployed over two years serving 500 million queries/day:

Training energy: 50 GWh = 5 × 10¹⁰ Wh
÷ (500M queries/day × 730 days = 365B total queries)
Amortised training cost per query ≈ 0.137 Wh ≈ 137 mWh

Under this deliberately conservative volume assumption, amortised training amounts to roughly 45% of the central operational inference estimate (0.30 Wh); at the 2 billion daily queries ChatGPT reportedly handles (§9.1), it falls to approximately 0.034 Wh, or about 11%. In either case the comparison is unaffected: search-index crawling and construction costs are likewise unmeasured, so both upstream costs are omitted on symmetric grounds.
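A minimal sketch of this amortisation arithmetic follows (the function name and parameters are ours; the ChatGPT-scale volume is the figure cited in §9.1, and all inputs are illustrative assumptions rather than measured constants):

```python
# Amortised training energy per query (Section 3.4 parameters).

def amortised_training_wh(training_gwh: float,
                          queries_per_day: float,
                          deployment_days: float) -> float:
    """Training energy (Wh) attributed to each query over the deployment window."""
    training_wh = training_gwh * 1e9                  # GWh -> Wh
    total_queries = queries_per_day * deployment_days
    return training_wh / total_queries

print(amortised_training_wh(50, 500e6, 730))  # conservative volume: ~0.137 Wh/query
print(amortised_training_wh(50, 2e9, 730))    # ChatGPT-scale volume: ~0.034 Wh/query
```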

§ 04

Anatomy of the Modern Search Session

4.1 Web Page Weight in 2025

The HTTP Archive 2025 Web Almanac documents the median mobile page reaching 2.56 MB, with the report noting that 'page size growth is accelerating, since October 2024 there has been a noticeable upward trend, in particular for mobile devices.'3 At the 90th percentile, pages reach approximately 6.9 MB on mobile.

A typical LLM synthesis response is a structured text payload of 2–10 KB. The network-transmission ratio between a 2.56 MB webpage and a 5 KB LLM response is approximately 500:1, before accounting for supplementary scripts, advertising payloads, and tracking pixels.

4.2 Mobile Network Energy Intensity

Nokia's engineering white paper on 5G energy efficiency measured a Finnish 4G network at 0.17 kWh/GB under representative average-load conditions.4 At this rate, downloading the median 2.56 MB mobile page consumes 0.44 Wh in network energy alone; across the 0.10–0.17 kWh/GB intensity range adopted in §7, a three-page search session carries 0.78–1.32 Wh in network energy, compared to effectively zero for a text-only LLM response.
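The conversion from payload size and network intensity to watt-hours is a one-line calculation; a sketch follows (the intensity figures are the document's: 0.17 kWh/GB for the Nokia 4G measurement, 0.006 kWh/GB for fixed Wi-Fi in §7.3):

```python
# Access-network energy for a given payload (decimal MB -> GB).

def network_wh(payload_mb: float, intensity_kwh_per_gb: float) -> float:
    """Energy (Wh) to move payload_mb across the access network."""
    return payload_mb / 1000 * intensity_kwh_per_gb * 1000  # kWh -> Wh

print(network_wh(2.56, 0.17))      # median mobile page over 4G: ~0.44 Wh
print(network_wh(3 * 2.56, 0.17))  # three-page session: ~1.31 Wh
print(network_wh(0.005, 0.17))     # ~5 KB LLM text response: ~0.0009 Wh
```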

4.3 Client Device Energy

Modern laptops draw 6–18 W during active browsing; flagship smartphones 2–4 W. The CHI 2025 experimental study by Spatharioti et al. found that LLM participants completed tasks more quickly with fewer queries than traditional search users,5 directly reducing total device energy through shorter screen-on time.

4.4 Zero-Click Asymmetry — and the Hidden Cost of AI-Augmented Search

Similarweb data from July 2025 reported that 69% of Google searches end without a click to any website.6 For these queries, search energy approximates the query cost alone (≈0.3 Wh), matching the LLM baseline. The efficiency advantage emerges for the ≈31% of queries requiring website visits.

Key insight: A growing share of SERP resolutions now occur via Google AI Overviews — which synthesise results using an LLM on top of the traditional search query. The canonical 0.3 Wh Google baseline therefore systematically understates the true energy cost of modern search for any query that triggers an Overview.7 Traditional search is quietly becoming Search + LLM Inference — making the pure-LLM model more competitive, not less, as search modernises.

§ 05

The Programmatic Advertising Energy Overhead

5.1 The Real-Time Bidding Mechanism

When a user lands on an ad-supported webpage, a programmatic auction initiates in parallel with content loading. The publisher's supply-side platform (SSP) broadcasts a Bid Request to dozens or hundreds of demand-side platforms (DSPs). Each DSP processes the request within a 100–300 ms deadline. Research has documented extreme cases of a single ad slot auctioned across thousands of intermediaries, with the vast majority of bid computations producing no output of value to the user.

5.2 The Quantified Client-Side Energy Tax

Khan et al. (2024a), published in the European Journal of Information Technologies and Computer Science, found that integrated ad-blockers such as Brave and LibreWolf reduced power consumption by up to 44% compared to conventional browsing, particularly on video-heavy and news sites. A companion study (Khan et al., 2024b) corroborated this with a 15% reduction across a broader browser comparison.

The implication: For the typical user on a typical content site, between 15% and 44% of device energy during a browsing session serves the advertising ecosystem rather than delivering informational content. An LLM interaction bypasses this overhead entirely.

5.3 Server-Side Ad-Tech Carbon Footprint

Scope3's Q1 2023 State of Sustainable Advertising report estimated 215,000 metric tonnes of CO₂ per month generated by programmatic advertising in five major economies. We do not apportion this server-side figure to individual page views, concentrating quantitative modelling on the directly measurable client-side ad-rendering burden.

5.4 Server-Side Ad-Tech Energy: A Quantified Estimate

The Ad Net Zero Global Media Sustainability Framework V1.2 (June 2025) now provides explicit formulas permitting quantitative allocation of server-side programmatic overhead. Using the framework's published defaults — server use-phase intensity of 3.41 × 10⁻⁷ kWh per ad opportunity, server factor 1.412, call factor 1.464, and average RTB payload of 3 KB — a standard ad-supported page with 3–5 ad slots generates approximately 0.05–0.12 Wh of server-side energy from RTB bidding alone. Adding creative delivery and network overhead brings the estimated total server-side ad-tech burden to 0.10–0.25 Wh per page load.
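The aggregation from per-opportunity intensity to a per-page figure is not spelled out in the excerpt above; the sketch below is our illustrative reconstruction, in which the bid fan-out per slot is an assumed free parameter (roughly 25–35 effective bids per slot reproduces the quoted 0.05–0.12 Wh range):

```python
# Illustrative reconstruction of the Section 5.4 server-side RTB estimate.
# Intensity and multipliers are the quoted Ad Net Zero defaults; the
# fan-out per slot is our assumption, not a framework value.

AD_INTENSITY_KWH = 3.41e-7   # kWh per ad opportunity (Ad Net Zero default)
SERVER_FACTOR = 1.412
CALL_FACTOR = 1.464

def rtb_server_wh(ad_slots: int, bids_per_slot: int) -> float:
    """Server-side RTB energy (Wh) for one page load under these assumptions."""
    opportunities = ad_slots * bids_per_slot
    kwh = opportunities * AD_INTENSITY_KWH * SERVER_FACTOR * CALL_FACTOR
    return kwh * 1000

print(rtb_server_wh(3, 25))   # ~0.05 Wh per page
print(rtb_server_wh(5, 35))   # ~0.12 Wh per page
```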

These figures are distinct from, and additive to, the client-side rendering overhead quantified in §5.2. For a three-page mobile search session (Scenario B), they represent a structural server-side overhead of approximately 0.30–0.75 Wh — energy entirely absent from an LLM session. Given the difficulty of precise allocation across the RTB supply chain, we adopt only the lower bound (0.30 Wh) in our central scenario estimates, treating this as a conservative floor. Complementary benchmarks from Scope3/Ebiquity (2025) report 0.67 gCO₂ per impression for display advertising, consistent with this order of magnitude at average EU grid intensity (~0.3 kgCO₂/kWh).

Note: These server-side costs do not appear in the session totals of Appendix A, which reflects only directly measurable client-side and network energy. Including the Ad Net Zero lower bound would increase the Scenario B search-session total from 2.48 Wh to approximately 2.78 Wh, widening the efficiency ratio from 5.5× to approximately 6.2×.

§ 06

Comparative Energy Lifecycle Assessment

6.1 Methodology and System Boundary

Included for both modalities: server-side computation (including data-centre PUE); core and last-mile network transmission; client-device energy during active task engagement; advertising payload rendering for search sessions. Excluded symmetrically: model training/index crawling (amortised — see §3.4); embodied carbon; idle device energy.

The assumption of 2–5 pages visited for complex synthesis tasks draws on three convergent 2025 sources: the CHI '25 randomised experiment (Spatharioti et al.) found participants in the traditional-search condition issued an average of 2.5 queries per task (95% CI [2.1, 3.0]); a December 2025 controlled study on high-involvement transactional searches (n = 52) recorded an average of 3.7 results consulted per session; and cross-industry benchmarks place research-oriented organic-search sessions at 5–7 pages per session (LuckyOrange, 2025; Databox, 2025). Our range (low: 2 / central: 3 / high: 5) is therefore conservative relative to observed behaviour.

Functional unit: the complete user session required to satisfy one complex information need, defined as a task requiring synthesis or comparison of information from multiple sources.

Scenario A

Simple Fact Query — Parity

"What is the current prime minister of Italy?" — single-answer factual query, SERP-resolved

▸ LLM Session
Inference .............. 0.24–0.34 Wh
Network (≈5 KB) .......... <0.001 Wh
Device (2 min × 2.5 W) .. 0.08–0.10 Wh
TOTAL .............. 0.32–0.44 Wh
≈ 1.1× Parity. Both modalities are energetically equivalent within measurement uncertainty. Note: if a Google AI Overview is triggered, search-session energy rises to an estimated 0.50 Wh.

Scenario B — Core Finding

Complex Synthesis Task on Mobile

"Compare the advantages and disadvantages of heat pumps versus gas boilers for a UK home, including installation cost, running cost, and government support schemes."

Search Session / Smartphone, 5G Mobile Data, Ad-Supported Pages

(1) Query processing 0.30 Wh
(2) Network: 3 pages × 2.56 MB = 7.68 MB × 0.15 kWh/GB 1.15 Wh
(3) Page rendering (CPU/GPU): 3 × 0.20 Wh 0.60 Wh
(4) Ad payload (30% of rendering, Khan et al. median) 0.18 Wh
(5) Device reading time: 6 min × 2.5 W 0.25 Wh
TOTAL SEARCH SESSION 2.48 Wh

LLM Session / Smartphone, Mobile Data

(1) Inference (extended synthesis response) 0.30–0.40 Wh
(2) Network: ≈5 KB text response <0.001 Wh
(3) Device reading time: 2.5 min × 2.5 W 0.10 Wh
TOTAL LLM SESSION 0.40–0.50 Wh
≈ 5.5× LLM session is approximately 5.5× more energy-efficient. Range: 4.2–7.1× across parameter bounds (see §7).
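The Scenario B bill of energy can be expressed compactly; the sketch below uses the central estimates from §6–§7, so it reproduces the totals above (the function names and default values are ours):

```python
# Scenario B session models; constants are the central estimates above.

def search_session_wh(pages=3, page_mb=2.56, net_kwh_gb=0.15,
                      render_wh=0.20, ad_share=0.30,
                      read_min=6.0, device_w=2.5) -> float:
    query = 0.30                                    # SERP query processing
    network = pages * page_mb / 1000 * net_kwh_gb * 1000
    render = pages * render_wh                      # CPU/GPU page rendering
    ads = ad_share * render                         # ad payload share of rendering
    reading = read_min / 60 * device_w              # screen-on reading time
    return query + network + render + ads + reading

def llm_session_wh(inference_wh=0.35, read_min=2.5, device_w=2.5) -> float:
    return inference_wh + read_min / 60 * device_w  # network ~0 for a ~5 KB reply

print(search_session_wh())                      # ~2.48 Wh
print(llm_session_wh())                         # ~0.45 Wh
print(search_session_wh() / llm_session_wh())   # ~5.5x
```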

Scenario C — Upper Bound

Extended Research Session on Laptop

"Summarise the comparative energy policies of the EU and China for a policy briefing." Five pages visited across mixed Wi-Fi and mobile data.

▸ LLM Session (Laptop)
Inference ........... 0.40–0.60 Wh
Network .............. <0.001 Wh
Reading (4 min × 10 W) .. 0.67 Wh
TOTAL ........ 1.07–1.27 Wh
≈ 4.9× LLM session is approximately 4.9× more energy-efficient. Range: 3.8–6.5×.

§ 07

Sensitivity Analysis

7.1 Parameter Ranges

Parameter                           Low           Central       High          Primary Source
LLM inference (standard)            0.15 Wh       0.30 Wh       0.55 Wh       arXiv:2508.15734; Epoch AI; Altman (2025)
Search query energy                 0.20 Wh       0.30 Wh       0.50 Wh       Google (2009); Epoch AI (2025)
Mobile network intensity            0.10 kWh/GB   0.15 kWh/GB   0.17 kWh/GB   Nokia WP; Andrae & Edler (2015)
Mobile page weight (median)         1.5 MB        2.56 MB       4.0 MB        HTTP Archive Web Almanac 2025
Page rendering energy               0.10 Wh       0.20 Wh       0.45 Wh       Pesari et al. (2023)
Ad payload (% of page energy)       15%           30%           44%           Khan et al. (2024a, 2024b)
Pages per synthesis session         2             3             5             Spatharioti et al. (2025, CHI '25)
Smartphone power draw               2.0 W         2.5 W         4.0 W         Manufacturer specifications
Task time saving (LLM vs search)    20%           40%           60%           Spatharioti et al. (2025, CHI '25)

Table 1: Parameter estimates, uncertainty ranges, and primary sources for CELCA scenarios.

7.2 Monte Carlo Sensitivity Results (Scenario B)

Drawing 10,000 Monte Carlo samples across uniform distributions over the ranges in Table 1 for Scenario B:

[Interactive parameter explorer omitted from this static version. At its default settings (LLM inference 0.30 Wh; page weight 2.5 MB; mobile network intensity 0.15 kWh/GB; ad overhead 30%), it reports an LLM session total of ≈0.40 Wh and an efficiency ratio of ≈6.1×. The accompanying Monte Carlo panel ran 10,000 iterations across all parameter bounds.]

The rare draws (≈0.3%) in which search is more efficient arise only when parameters sit at or beyond the edges of Table 1: near-Wi-Fi network intensity, a single page visited, minimal ad density, and inference costs approaching the reasoning-model tier. This is a narrow edge case.
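For reproducibility, a sketch of the sampling procedure follows, reusing the session models sketched under Scenario B. The integer-uniform treatment of page count is our assumption; note that single-page sessions and reasoning-tier inference costs, which produce the rare search wins, lie outside these bounds, so this restricted sketch itself yields no such draws.

```python
# Monte Carlo over the Table 1 bounds for Scenario B.
import random

def draw_ratio(rng: random.Random) -> float:
    search = search_session_wh(
        pages=rng.randint(2, 5),
        page_mb=rng.uniform(1.5, 4.0),
        net_kwh_gb=rng.uniform(0.10, 0.17),
        render_wh=rng.uniform(0.10, 0.45),
        ad_share=rng.uniform(0.15, 0.44),
        device_w=rng.uniform(2.0, 4.0),
    )
    llm = llm_session_wh(inference_wh=rng.uniform(0.15, 0.55),
                         device_w=rng.uniform(2.0, 4.0))
    return search / llm

rng = random.Random(42)
ratios = sorted(draw_ratio(rng) for _ in range(10_000))
print(ratios[len(ratios) // 2])                    # median efficiency ratio
print(sum(r < 1.0 for r in ratios) / len(ratios))  # share of draws where search wins
```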

7.3 The Wi-Fi Case

On fixed Wi-Fi (0.006 kWh/GB), the three-page search session network energy falls from 1.15 Wh to 0.046 Wh — nearly negligible. The LLM advantage narrows to approximately 1.5–2.5× for the median synthesis task on Wi-Fi, reaching parity for simple queries. Since ≈60% of global web traffic flows over cellular networks (GSMA 2025), the mobile scenario represents the majority of real-world usage.
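Using the network_wh sketch from §4.2, the comparison is direct (both intensity figures are the document's):

```python
# Same three-page payload over fixed Wi-Fi vs. cellular.
print(network_wh(7.68, 0.006))  # fixed Wi-Fi: ~0.046 Wh
print(network_wh(7.68, 0.15))   # cellular, central intensity: ~1.15 Wh
```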

§ 08

Behavioural Dynamics and the Time-on-Task Multiplier

Energy efficiency and time efficiency are coupled through device power draw. The CHI 2025 study by Spatharioti et al. used a randomised between-subjects design for product research tasks. Key findings: LLM participants completed tasks more quickly with fewer queries; the modal query count for LLM users was one versus two for search users; decision accuracy was comparable when LLM output was accurate.5

The 'pogo-sticking' behaviour documented in web usability research — clicking a result, finding it unsatisfactory, returning to the SERP, trying another — creates an energy penalty not captured in static page-count models. Each return-to-SERP adds approximately 0.30–0.60 Wh (mobile). LLM interfaces structurally eliminate this penalty by delivering a synthesised answer in a single interaction.
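As a rough model of that penalty (a sketch, assuming each return re-incurs one page's network and rendering cost at central Table 1 values):

```python
# Pogo-sticking penalty: each abandoned result re-incurs one page's
# network and rendering energy on mobile (central Table 1 values).

def pogo_penalty_wh(returns: int, page_mb=2.56,
                    net_kwh_gb=0.15, render_wh=0.20) -> float:
    per_return = page_mb / 1000 * net_kwh_gb * 1000 + render_wh
    return returns * per_return

print(pogo_penalty_wh(1))  # ~0.58 Wh, near the top of the quoted range
print(pogo_penalty_wh(2))  # two false starts: ~1.17 Wh
```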

§ 09

Counter-Arguments: A Rigorous Interrogation

9.1 The Jevons Paradox

Making information retrieval cheaper will induce more demand. ChatGPT reached 800 million weekly active users by late 2025, with 2 billion daily queries. If this represents new demand rather than substituted demand, aggregate energy grows regardless of per-session efficiency gains.

The scope clarification here is essential: this paper evaluates unit efficiency for a defined task, not aggregate societal energy consumption. The Jevons paradox validates rather than refutes the unit-efficiency argument — demand rises because efficiency improves. Policy responses at the aggregate level are legitimate and complementary, not contradictory.

9.2 The Hallucination Verification Penalty

If users must verify LLM outputs with a follow-up search, the session energy becomes additive. Even in a hybrid workflow with one verification search, total energy typically remains below the unstructured multi-page session:

Hybrid: LLM inference + 1 search + 1 page load + reading
= 0.40 + 0.30 + 0.45 + 0.08 = 1.23 Wh
vs. 3-page search session = 2.48 Wh
Hybrid LLM advantage even with verification ≈ 2.0×
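Expressed in the same style as the Scenario B models (a sketch using the §9.2 figures; the 0.45 Wh page-load figure is the document's):

```python
# Hybrid verification workflow: one LLM answer plus one confirming search.

def hybrid_session_wh() -> float:
    llm = 0.40      # extended synthesis inference
    search = 0.30   # one verification query
    page = 0.45     # one verification page load
    reading = 0.08  # extra reading time on a smartphone
    return llm + search + page + reading

print(hybrid_session_wh())         # 1.23 Wh
print(2.48 / hybrid_session_wh())  # ~2.0x advantage retained
```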

9.3 Scope Limitation: Agentic and Reasoning Workflows

The efficiency advantage applies specifically to standard non-reasoning LLM inference serving text synthesis queries on optimised commercial infrastructure. It does not apply to reasoning models (§3.3), agentic workflows combining LLM inference with programmatic web retrieval, image or video generation, or multi-turn conversations consuming reasoning tokens implicitly.

9.4 Asymmetric Embodied Carbon

GPU/TPU manufacturing (TSMC 3nm/4nm nodes) is energy-intensive. We flag this as a limitation and recommend a full Scope 3 lifecycle assessment for future work, noting that the web's continuously refreshed ad-tech server fleet also carries substantial embodied carbon.

9.5 Vendor Self-Interest in Energy Disclosures

The 0.24 Wh figure originates from a Google technical paper. We address this directly: the paper uses a comprehensive boundary that actually inflates the reported figure relative to hardware-only estimates (which would be ≈0.10 Wh). Independent estimates from Epoch AI (0.30 Wh) and Sam Altman's disclosure (0.34 Wh) bracket the Google figure from above. Even at 0.55 Wh, more than double the Google central estimate, the Scenario B efficiency advantage persists at approximately 3.2×.

§ 10

Policy Implications and Research Agenda

10.1 For Corporate Sustainability Officers

Organisations seeking to minimise their digital information-retrieval footprint should: (i) prioritise mobile-first LLM deployments for research and synthesis tasks over traditional search workflows on cellular connections; (ii) audit ad-tech footprint — browser-level ad blocking can reduce device energy by 15–44%; (iii) resist reasoning-model adoption for tasks that standard models handle adequately; (iv) incorporate session-level energy accounting into digital sustainability reporting.

10.2 For Regulators and Policy-Makers

The EU AI Act and emerging AI energy disclosure frameworks should carefully distinguish between aggregate supply-side energy demand (a legitimate large-scale concern) and per-task unit efficiency (where AI is frequently the lower-energy option). Regulatory frameworks that impose unit-energy taxes on LLM queries without weighing the full-stack energy of the alternative workflow risk creating perverse incentives.

10.3 Research Agenda

  1. Empirical hallucination rate data disaggregated by query type, with energy impact modelling for verification workflows.
  2. Independent, multi-provider inference energy benchmarks across production-realistic workloads with comprehensive system boundaries.
  3. Full Scope 3 lifecycle assessment for LLM and search infrastructure including embodied hardware carbon.
  4. Field measurement of cellular modem energy during LLM vs. search data payloads.
  5. Economic analysis of the content-creator/publisher externality: LLMs substituting for web visits reduce advertising revenue for publishers whose content trained the models.

§ 11

Conclusions

The 'AI energy crisis' is real at the level of data-centre infrastructure, grid load, and aggregate demand growth. It is not accurately described, however, by the claim that individual AI query interactions are systematically more energy-intensive than their web-search counterparts. For complex synthesis tasks performed on mobile devices, LLM sessions consume approximately 4–7× less energy than equivalent ad-supported web search sessions (§§6–7). This advantage is structurally driven by three compounding factors:

  • The high energy intensity of mobile cellular data transmission applied to the large payloads of modern webpages
  • The device energy overhead of the ad-tech supply chain consuming 15–44% of browsing power with zero informational value to the user
  • Reduced device screen-on time from faster task completion, validated experimentally at CHI 2025

Traditional web search is quietly becoming 'Search + LLM Inference' — making the 0.3 Wh baseline an increasingly outdated lower bound.

These advantages disappear or reverse on Wi-Fi for simple queries, for reasoning-model inference, and for agentic workflows. The Jevons paradox ensures that unit efficiency gains do not guarantee aggregate efficiency gains, and the rapid growth of AI query volume is a legitimate supply-side concern independent of unit efficiency.

The practical recommendation is precise: for individuals and institutions seeking to minimise the energy footprint of knowledge work on mobile devices, redirecting complex synthesis tasks to LLM interfaces represents a materially more efficient workflow than multi-page web search. This finding should inform corporate digital sustainability strategies, regulatory impact assessments of AI energy policy, and the emerging discipline of sustainable information retrieval.