1 Introduction and Motivation
In 2023, a widely circulated comparison claimed that generating a single response from a large language model consumed ten times more energy than a Google search query. The claim was technically narrow — it compared server-side GPU computation for an unoptimised, low-utilisation research deployment against a decade-mature search stack — but it lodged in public consciousness, shaped ESG discourse, and influenced early regulatory thinking on both sides of the Atlantic.
This paper makes a different comparison. Rather than asking 'how much energy does a server consume to answer one query?', we ask: 'how much energy does a user consume to satisfy one complex information need?' This reframing — from server-side computation to full-stack session — changes the answer substantially.
A search engine does not itself satisfy an information need; it provides a map to information hosted elsewhere.
The energy cost of navigating that map — downloading pages, rendering JavaScript, processing advertisement auctions, and spending time reading — is borne by the user's device, the telecommunications network, and a largely invisible ad-tech infrastructure. None of these costs appear on the data centre's meter.
The past eighteen months have also transformed the empirical landscape. Google published a technical paper (arXiv:2508.15734) documenting that the median Gemini text prompt consumed 0.24 Wh in May 2025 — a 33-fold reduction from the same measure twelve months prior.2 OpenAI's CEO disclosed a comparable 0.34 Wh for ChatGPT. Meanwhile, the HTTP Archive 2025 Web Almanac recorded the median mobile page at 2.56 MB — a figure that, transmitted over a 4G network at Nokia's measured 0.17 kWh/GB, costs more in network energy alone than the entire LLM inference, before the user's device draws a single watt.
2 Related Work and Analytical Gap
2.1 The Server-Centric Measurement Tradition
The benchmark for search-engine energy was established in 2009, when Google disclosed that one query consumed approximately 0.3 Wh, including indexing and retrieval. This figure proved remarkably stable over fifteen years, maintained through aggressive Power Usage Effectiveness (PUE) improvements. The stability was achieved, however, by optimising what sits inside the data centre, while the energy cost of what happens outside — traversing the network and rendering on the client — grew at an entirely different rate.
The early AI energy literature (Strubell et al., 2019; Patterson et al., 2021) correctly identified training costs as a major concern. As deployment scaled, Luccioni et al. (2023) conducted the first systematic inference energy measurement. Epoch AI (2025) synthesised available evidence to estimate ChatGPT at approximately 0.3 Wh per query, noting this was 'relatively pessimistic'.
2.2 The Emerging System-Level Perspective
The Green Software Foundation and related bodies have advocated for 'Software Carbon Intensity' metrics extending beyond the data centre. Morrison et al. (2025, ICLR) proposed holistic lifecycle evaluation of language model creation. The critical contribution of the present paper is to extend this system-level thinking across modalities — comparing LLM sessions against search sessions on a common functional-unit basis.
2.3 The Unexplored Gap
No published peer-reviewed study has, to our knowledge, quantitatively compared session-level energy for LLM versus search modalities while incorporating the programmatic advertising energy overhead. Scope3 (2023) documented advertising's campaign-level carbon footprint. Khan et al. (2024a, 2024b) measured ad-blocker impact on device power. These contributions have not been synthesised into a cross-modality comparative energy lifecycle assessment (CELCA) using a common functional unit — this paper provides that synthesis.
3 The Energy Physics of LLM Inference in 2025
3.1 The Production Benchmark: Google arXiv:2508.15734
The most rigorous publicly available production measurement was published by Google in August 2025 (Elsworth et al., arXiv:2508.15734). The paper measures a comprehensive stack including active TPU/GPU power (0.14 Wh, 58%), host CPU and DRAM (0.06 Wh, 25%), idle machine provisioning (0.02 Wh, 10%), and data-centre PUE overhead (0.02 Wh, 8%), yielding a median of 0.24 Wh per Gemini Apps text prompt.1
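The reported components sum to the headline median (percentages above are rounded):

$$
E_{\text{prompt}} = \underbrace{0.14}_{\text{TPU/GPU}} + \underbrace{0.06}_{\text{CPU + DRAM}} + \underbrace{0.02}_{\text{idle provisioning}} + \underbrace{0.02}_{\text{PUE overhead}} = 0.24\ \text{Wh}
$$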
3.2 Corroborating Independent Evidence
Epoch AI (February 2025) estimated approximately 0.30 Wh per ChatGPT query. In June 2025, OpenAI CEO Sam Altman disclosed 0.34 Wh for a standard text query — consistent with Epoch AI and modestly above Google's figure. This convergence from independent sources provides reasonable confidence that 0.2–0.4 Wh captures typical production inference as of 2025.
3.3 The Reasoning Model Tier (Out of Scope)
Reasoning models — OpenAI o1/o3, DeepSeek R1, Gemini Thinking — generate extended chain-of-thought sequences. Muxup (January 2026) estimated DeepSeek R1 at 0.96–3.74 Wh per query. This tier is explicitly out of scope; for the class of query most comparable to web search, standard models are both adequate and preferred.
3.4 Amortised Training Energy: A Calculated Exclusion
For a frontier model at 50 GWh training energy, deployed over two years serving 500 million queries/day:
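$$
E_{\text{train/query}} = \frac{50\ \text{GWh}}{(500 \times 10^{6}\ \text{queries/day}) \times 730\ \text{days}} = \frac{5.0 \times 10^{10}\ \text{Wh}}{3.65 \times 10^{11}\ \text{queries}} \approx 0.14\ \text{Wh per query}
$$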
Under the stated assumptions the amortised increment is roughly 0.14 Wh per query — of the same order as the inference energy itself rather than a negligible fraction. Search, however, carries an analogous and similarly hard-to-allocate cost in index crawling and refresh, and both training and indexing energies are therefore omitted on symmetric grounds.
4 Anatomy of the Modern Search Session
4.1 Web Page Weight in 2025
The HTTP Archive 2025 Web Almanac documents the median mobile page reaching 2.56 MB, with the report noting that 'page size growth is accelerating, since October 2024 there has been a noticeable upward trend, in particular for mobile devices.'3 At the 90th percentile, pages reach approximately 6.9 MB on mobile.
A typical LLM synthesis response is a structured text payload of 2–10 KB. The network-transmission ratio between a 2.56 MB webpage and a 5 KB LLM response is approximately 500:1, before accounting for supplementary scripts, advertising payloads, and tracking pixels.
4.2 Mobile Network Energy Intensity
Nokia's engineering white paper on 5G energy efficiency measured a Finnish 4G network at 0.17 kWh/GB at representative average conditions.4 At this rate, downloading the median 2.56 MB mobile page consumes 0.44 Wh in network energy alone. A three-page search session carries 0.78–1.32 Wh in network energy, compared to effectively zero for a text-only LLM response.
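To make the arithmetic explicit, the sketch below computes the transmission energies used in this section. The 5 KB LLM payload is the representative figure from §4.1, and decimal megabytes are used to match the 0.44 Wh per-page figure above.

```python
# Network transmission energy: median search page vs. LLM text response.
# Decimal units (1 GB = 1000 MB) to match the 0.44 Wh/page figure above.

KWH_PER_GB_4G = 0.17     # Nokia-measured Finnish 4G network (section 4.2)
KWH_PER_GB_WIFI = 0.006  # fixed Wi-Fi figure used in section 7.3
PAGE_MB = 2.56           # HTTP Archive 2025 median mobile page
LLM_RESPONSE_MB = 0.005  # ~5 KB synthesised text payload (section 4.1)

def network_wh(payload_mb: float, kwh_per_gb: float) -> float:
    """Transmission energy in Wh: MB / 1000 -> GB, then kWh * 1000 -> Wh."""
    return (payload_mb / 1000) * kwh_per_gb * 1000

print(f"one page, 4G:       {network_wh(PAGE_MB, KWH_PER_GB_4G):.2f} Wh")        # 0.44
print(f"three pages, 4G:    {3 * network_wh(PAGE_MB, KWH_PER_GB_4G):.2f} Wh")    # 1.31
print(f"three pages, Wi-Fi: {3 * network_wh(PAGE_MB, KWH_PER_GB_WIFI):.3f} Wh")  # 0.046
print(f"LLM response, 4G:   {network_wh(LLM_RESPONSE_MB, KWH_PER_GB_4G):.5f} Wh")  # 0.00085
```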
4.3 Client Device Energy
Modern laptops draw 6–18 W during active browsing; flagship smartphones 2–4 W. The CHI 2025 experimental study by Spatharioti et al. found that LLM participants completed tasks more quickly with fewer queries than traditional search users,5 directly reducing total device energy through shorter screen-on time.
4.4 Zero-Click Asymmetry — and the Hidden Cost of AI-Augmented Search
Similarweb data from July 2025 reported that 69% of Google searches end without a click to any website.6 For these queries, search energy approximates the query cost alone (≈0.3 Wh), matching the LLM baseline. The efficiency advantage emerges for the ≈31% of queries requiring website visits.
5 The Programmatic Advertising Energy Overhead
5.1 The Real-Time Bidding Mechanism
When a user lands on an ad-supported webpage, a programmatic auction initiates in parallel with content loading. The publisher's supply-side platform (SSP) broadcasts a Bid Request to dozens or hundreds of demand-side platforms (DSPs). Each DSP processes the request within a 100–300 ms deadline. Research has documented extreme cases of a single ad slot auctioned across thousands of intermediaries, with the vast majority of bid computations producing no output of value to the user.
5.2 The Quantified Client-Side Energy Tax
Khan et al. (2024a), published in the European Journal of Information Technologies and Computer Science, found that integrated ad-blockers such as Brave and LibreWolf reduced power consumption by up to 44% compared to conventional browsing, particularly on video-heavy and news sites. A companion study (Khan et al., 2024b) corroborated this with a 15% reduction across a broader browser comparison.
5.3 Server-Side Ad-Tech Carbon Footprint
Scope3's Q1 2023 State of Sustainable Advertising report estimated 215,000 metric tonnes of CO₂ per month generated by programmatic advertising in five major economies. We do not apportion this server-side figure to individual page views, concentrating quantitative modelling on the directly measurable client-side ad-rendering burden.
5.4 Server-Side Ad-Tech Energy: A Quantified Estimate
The Ad Net Zero Global Media Sustainability Framework V1.2 (June 2025) now provides explicit formulas permitting quantitative allocation of server-side programmatic overhead. Using the framework's published defaults — server use-phase intensity of 3.41 × 10⁻⁷ kWh per ad opportunity, server factor 1.412, call factor 1.464, and average RTB payload of 3 KB — a standard ad-supported page with 3–5 ad slots generates approximately 0.05–0.12 Wh of server-side energy from RTB bidding alone. Adding creative delivery and network overhead brings the estimated total server-side ad-tech burden to 0.10–0.25 Wh per page load.
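A minimal sketch of this allocation follows. The framework defaults above are taken as given; the per-slot DSP fan-out is our assumption (a few dozen bidders per slot, consistent with §5.1), not an Ad Net Zero parameter.

```python
# Server-side RTB energy per page load from the Ad Net Zero GMSF V1.2
# defaults quoted above. The DSP fan-out per slot is an assumption.

KWH_PER_AD_OPPORTUNITY = 3.41e-7  # server use-phase intensity per opportunity
SERVER_FACTOR = 1.412             # GMSF V1.2 default
CALL_FACTOR = 1.464               # GMSF V1.2 default

def rtb_server_wh(ad_slots: int, dsps_per_slot: int) -> float:
    """Wh of server-side bidding energy for one ad-supported page load."""
    opportunities = ad_slots * dsps_per_slot  # one bid evaluation each
    return opportunities * KWH_PER_AD_OPPORTUNITY * SERVER_FACTOR * CALL_FACTOR * 1000

print(f"3 slots x 25 DSPs: {rtb_server_wh(3, 25):.3f} Wh")  # ~0.05 Wh
print(f"5 slots x 35 DSPs: {rtb_server_wh(5, 35):.3f} Wh")  # ~0.12 Wh
```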
These figures are distinct from, and additive to, the client-side rendering overhead quantified in §5.2. For a three-page mobile search session (Scenario B), they represent a structural server-side overhead of approximately 0.30–0.75 Wh — energy entirely absent from an LLM session. Given the difficulty of precise allocation across the RTB supply chain, we adopt only the lower bound (0.30 Wh) in our central scenario estimates, treating this as a conservative floor. Complementary benchmarks from Scope3/Ebiquity (2025) report 0.67 gCO₂ per impression for display advertising, consistent with this order of magnitude at average EU grid intensity (~0.3 kgCO₂/kWh).
Note: These server-side costs do not appear in the session totals of Appendix A, which reflects only directly measurable client-side and network energy. Including the Ad Net Zero lower bound would increase the Scenario B search-session total from 2.48 Wh to approximately 2.78 Wh, widening the efficiency ratio from 5.5× to approximately 6.2×.
6 Comparative Energy Lifecycle Assessment
6.1 Methodology and System Boundary
Included for both modalities: server-side computation (including data-centre PUE); core and last-mile network transmission; client-device energy during active task engagement; advertising payload rendering for search sessions. Excluded symmetrically: model training/index crawling (amortised — see §3.4); embodied carbon; idle device energy.
The assumption of 2–5 pages visited for complex synthesis tasks draws on three convergent 2025 sources: the CHI '25 randomised experiment (Spatharioti et al.) found participants in the traditional-search condition issued an average of 2.5 queries per task (95% CI [2.1, 3.0]); a December 2025 controlled study on high-involvement transactional searches (n = 52) recorded an average of 3.7 results consulted per session; and cross-industry benchmarks place research-oriented organic-search sessions at 5–7 pages per session (LuckyOrange, 2025; Databox, 2025). Our range (low: 2 / central: 3 / high: 5) is therefore conservative relative to observed behaviour.
Functional unit: the complete user session required to satisfy one complex information need, defined as a task requiring synthesis or comparison of information from multiple sources.
6.2 Scenario Definitions
- Scenario B, search session: smartphone, 5G mobile data, ad-supported pages (central estimate 2.48 Wh; Appendix A).
- Scenario B, LLM session: smartphone, mobile data (central estimate ≈ 0.45 Wh).
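The session accounting behind these figures can be sketched as follows, using central values from Table 1 (§7.1). The two-query count follows the modal search behaviour reported in §8; the reading times are our illustrative assumptions, chosen so the sketch reproduces the Appendix A central totals.

```python
# Scenario B session accounting sketch (smartphone, cellular connection).
# Central values from Table 1; reading times are illustrative assumptions.

QUERY_WH = 0.30       # search query, server side
INFERENCE_WH = 0.30   # standard LLM inference
NET_WH_PER_PAGE = 2.56 * 0.14   # MB x kWh/GB = Wh (decimal units), ~0.36
RENDER_WH_PER_PAGE = 0.20       # client rendering, incl. ad payload share
PHONE_W = 2.5                   # smartphone active power draw

def search_session_wh(queries=2, pages=3, reading_min=5.0) -> float:
    """Ad-supported mobile search: queries + page loads + screen-on reading."""
    return (queries * QUERY_WH
            + pages * (NET_WH_PER_PAGE + RENDER_WH_PER_PAGE)
            + PHONE_W * reading_min / 60)

def llm_session_wh(prompts=1, reading_min=3.6) -> float:
    """LLM session: inference plus screen-on reading; network is negligible."""
    return prompts * INFERENCE_WH + PHONE_W * reading_min / 60

s, l = search_session_wh(), llm_session_wh()
print(f"search: {s:.2f} Wh, LLM: {l:.2f} Wh, ratio: {s / l:.1f}x")
# -> search: 2.48 Wh, LLM: 0.45 Wh, ratio: 5.5x
```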
7 Sensitivity Analysis
7.1 Parameter Ranges
| Parameter | Low | Central | High | Primary Source |
|---|---|---|---|---|
| LLM inference (standard) | 0.15 Wh | 0.30 Wh | 0.55 Wh | arXiv:2508.15734; Epoch AI; Altman (2025) |
| Search query energy | 0.20 Wh | 0.30 Wh | 0.50 Wh | Google (2009); Epoch AI (2025) |
| Mobile network intensity | 0.10 kWh/GB | 0.14 kWh/GB | 0.17 kWh/GB | Nokia WP; Andrae & Edler (2015) |
| Mobile page weight (median) | 1.5 MB | 2.56 MB | 4.0 MB | HTTP Archive Web Almanac 2025 |
| Page rendering energy | 0.10 Wh | 0.20 Wh | 0.45 Wh | Pesari et al. (2023) |
| Ad payload (% of page energy) | 15% | 30% | 44% | Khan et al. (2024a, 2024b) |
| Pages per synthesis session | 2 | 3 | 5 | Spatharioti et al. CHI'25 |
| Smartphone power draw | 2.0 W | 2.5 W | 4.0 W | Manufacturer specs |
| Task time saving (LLM vs search) | 20% | 40% | 60% | Spatharioti et al. (2025, CHI'25) |
Table 1: Parameter estimates, uncertainty ranges, and primary sources for CELCA scenarios.
7.2 Monte Carlo Sensitivity Results (Scenario B)
Drawing 10,000 Monte Carlo samples from uniform distributions over the Table 1 ranges for Scenario B, the search session consumed more energy than the LLM session in 99.7% of draws; the efficiency ratio clustered around the 5.5× central estimate, with the bulk of the distribution spanning approximately 4–9×.
The 0.3% of draws where search is more efficient are concentrated at: very low mobile network intensity (approaching Wi-Fi), ≤1 page visited, very low ad density, combined with reasoning-model inference costs. This is a narrow edge case.
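A minimal reproduction of the procedure, reusing the session sketch from §6, is given below. The fixed five-minute search reading time is again our assumption; this reproduces the method, not the exact published draws.

```python
# Monte Carlo over the Table 1 uniform ranges for Scenario B.
import random

def sample_ratio() -> float:
    inference = random.uniform(0.15, 0.55)   # LLM inference, Wh
    query = random.uniform(0.20, 0.50)       # search query, Wh
    intensity = random.uniform(0.10, 0.17)   # network, kWh/GB
    page_mb = random.uniform(1.5, 4.0)       # mobile page weight, MB
    render = random.uniform(0.10, 0.45)      # rendering, Wh per page
    pages = random.randint(2, 5)             # pages per synthesis session
    phone_w = random.uniform(2.0, 4.0)       # device draw, W
    saving = random.uniform(0.20, 0.60)      # LLM time saving vs. search

    reading_min = 5.0                        # search reading time (assumption)
    search = (2 * query
              + pages * (page_mb * intensity + render)  # MB x kWh/GB = Wh
              + phone_w * reading_min / 60)
    llm = inference + phone_w * reading_min * (1 - saving) / 60
    return search / llm

ratios = sorted(sample_ratio() for _ in range(10_000))
wins = sum(r < 1 for r in ratios)
print(f"median {ratios[5_000]:.1f}x, P5 {ratios[500]:.1f}x, P95 {ratios[9_500]:.1f}x, "
      f"search cheaper in {100 * wins / len(ratios):.1f}% of draws")
```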
7.3 The Wi-Fi Case
On fixed Wi-Fi (0.006 kWh/GB), the three-page search session network energy falls from 1.15 Wh to 0.046 Wh — nearly negligible. The LLM advantage narrows to approximately 1.5–2.5× for the median synthesis task on Wi-Fi, reaching parity for simple queries. Since ≈60% of global web traffic flows over cellular networks (GSMA 2025), the mobile scenario represents the majority of real-world usage.
8 Behavioural Dynamics and the Time-on-Task Multiplier
Energy efficiency and time efficiency are coupled through device power draw. The CHI 2025 study by Spatharioti et al. used a randomised between-subjects design for product research tasks. Key findings: LLM participants completed tasks more quickly with fewer queries; the modal query count for LLM users was one versus two for search users; decision accuracy was comparable when LLM output was accurate.5
The 'pogo-sticking' behaviour documented in web usability research — clicking a result, finding it unsatisfactory, returning to the SERP, trying another — creates an energy penalty not captured in static page-count models. Each return-to-SERP adds approximately 0.30–0.60 Wh (mobile). LLM interfaces structurally eliminate this penalty by delivering a synthesised answer in a single interaction.
9 Counter-Arguments: A Rigorous Interrogation
9.1 The Jevons Paradox
Making information retrieval cheaper will induce more demand. ChatGPT reached 800 million weekly active users by late 2025, with 2 billion daily queries. If this represents new demand rather than substituted demand, aggregate energy grows regardless of per-session efficiency gains.
The scope clarification here is essential: this paper evaluates unit efficiency for a defined task, not aggregate societal energy consumption. The Jevons paradox validates rather than refutes the unit-efficiency argument — demand rises because efficiency improves. Policy responses at the aggregate level are legitimate and complementary, not contradictory.
9.2 The Hallucination Verification Penalty
If users must verify LLM outputs with a follow-up search, the session energy becomes additive. Even in a hybrid workflow with one verification search, total energy typically remains below the unstructured multi-page session:
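An illustrative central-parameter calculation (LLM session ≈ 0.45 Wh from §6; one verification query at 0.30 Wh; one verified page at ≈ 0.36 Wh network plus 0.20 Wh rendering):

$$
E_{\text{hybrid}} \approx 0.45 + 0.30 + 0.36 + 0.20 \approx 1.3\ \text{Wh} < 2.48\ \text{Wh (Scenario B search session)}
$$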
9.3 Scope Limitation: Agentic and Reasoning Workflows
The efficiency advantage applies specifically to standard non-reasoning LLM inference serving text synthesis queries on optimised commercial infrastructure. It does not apply to reasoning models (§3.3), agentic workflows combining LLM inference with programmatic web retrieval, image or video generation, or multi-turn conversations consuming reasoning tokens implicitly.
9.4 Asymmetric Embodied Carbon
GPU/TPU manufacturing (TSMC 3nm/4nm nodes) is energy-intensive. We flag this as a limitation and recommend a full Scope 3 lifecycle assessment for future work, noting that the web's continuously refreshed ad-tech server fleet also carries substantial embodied carbon.
9.5 Vendor Self-Interest in Energy Disclosures
The 0.24 Wh figure originates from a Google technical paper. We address this directly: the paper uses a comprehensive boundary that actually inflates the reported figure relative to hardware-only estimates (which would be ≈0.10 Wh). Independent estimates from Epoch AI (0.30 Wh) and Sam Altman's disclosure (0.34 Wh) bracket the Google figure from above. Even at 0.55 Wh — double the Google central estimate — the Scenario B efficiency advantage persists at approximately 3.2×.
10 Policy Implications and Research Agenda
10.1 For Corporate Sustainability Officers
Organisations seeking to minimise their digital information-retrieval footprint should: (i) prioritise mobile-first LLM deployments for research and synthesis tasks over traditional search workflows on cellular connections; (ii) audit ad-tech footprint — browser-level ad blocking can reduce device energy by 15–44%; (iii) resist reasoning-model adoption for tasks that standard models handle adequately; (iv) incorporate session-level energy accounting into digital sustainability reporting.
10.2 For Regulators and Policy-Makers
The EU AI Act and emerging AI energy disclosure frameworks should carefully distinguish between aggregate supply-side energy demand (a legitimate large-scale concern) and per-task unit efficiency (where AI is frequently the lower-energy option). Regulatory frameworks that impose unit-energy taxes on LLM queries without weighing the full-stack energy of the alternative use case risk creating perverse incentives.
10.3 Research Agenda
- Empirical hallucination rate data disaggregated by query type, with energy impact modelling for verification workflows.
- Independent, multi-provider inference energy benchmarks across production-realistic workloads with comprehensive system boundaries.
- Full Scope 3 lifecycle assessment for LLM and search infrastructure including embodied hardware carbon.
- Field measurement of cellular modem energy during LLM vs. search data payloads.
- Economic analysis of the content-creator/publisher externality: LLMs substituting for web visits reduce advertising revenue for publishers whose content trained the models.
11 Conclusions
The 'AI energy crisis' is real at the level of data-centre infrastructure, grid load, and aggregate demand growth. It is not accurately described, however, by the claim that individual AI query interactions are systematically more energy-intensive than their web-search counterparts. For complex synthesis tasks performed on mobile devices, LLM sessions consume approximately 4–9× less energy than equivalent ad-supported web search sessions. This advantage is structurally driven by three compounding factors:
- The high energy intensity of mobile cellular data transmission applied to the large payloads of modern webpages
- The device energy overhead of the ad-tech supply chain consuming 15–44% of browsing power with zero informational value to the user
- Reduced device screen-on time from faster task completion, validated experimentally at CHI 2025
These advantages disappear or reverse on Wi-Fi for simple queries, for reasoning-model inference, and for agentic workflows. The Jevons paradox ensures that unit efficiency gains do not guarantee aggregate efficiency gains, and the rapid growth of AI query volume is a legitimate supply-side concern independent of unit efficiency.
The practical recommendation is precise: for individuals and institutions seeking to minimise the energy footprint of knowledge work on mobile devices, redirecting complex synthesis tasks to LLM interfaces represents a materially more efficient workflow than multi-page web search. This finding should inform corporate digital sustainability strategies, regulatory impact assessments of AI energy policy, and the emerging discipline of sustainable information retrieval.