When Google Research publishes on low-carbon computing with retired phones, most architects' instinct is to treat it as academic curiosity. I rarely make that mistake. After 16 years designing financial-grade platforms where every infrastructure decision carries capital cost, regulatory cost, and — increasingly, under ESG audits — carbon cost, I recognize the real technical signal here: the idea that hardware with already-paid embodied carbon can be repurposed as a substrate for heterogeneous edge compute, reducing both the marginal cost of capacity expansion and the operational carbon footprint. The problem is not the vision. The problem is the engineering that turns that vision into something that survives contact with production reality.
Embodied Carbon Is the Hidden Cost Most Architects Ignore
The sustainability metric that dominates cloud conversations is operational carbon — the emissions associated with energy consumption at runtime. But for hardware devices, especially smartphones, embodied carbon — emitted during manufacturing, materials mining, and logistics — represents between 70% and 80% of the device's total lifecycle emissions, according to lifecycle studies published by the semiconductor industry itself.
This changes the architectural calculus in a non-trivial way. When you provision a new c6g.xlarge EC2 instance for an edge workload, you are adding incremental demand that ultimately pressures the manufacturing of new chips. When you repurpose a retired smartphone that would otherwise go to recycling — or worse, landfill — you are amortizing an already-incurred carbon cost over more useful compute cycles.
In the context of financial platforms, this is starting to appear in Scope 3 carbon emissions reports, required by frameworks like TCFD and, in Brazil, by Central Bank guidelines in the context of climate risk. Architects who ignore this today will be redesigning systems tomorrow under regulatory pressure. I prefer to do the analysis now, calmly, rather than in a compliance sprint.
Edge Fleet Architecture with Retired Devices and AWS Orchestration
Full flow from retired device to cloud processing, showing orchestration, security, and observability layers.
📱 Edge Fleet — Retired Devices
- Retired Phone Greengrass Core v2 (edge)
- Retired Phone Local ML Inference (edge)
- Retired Phone Sensor Aggregator (edge)
🔒 Security Layer — Zero Trust Edge
- AWS IoT Core X.509 + mTLS (security)
- IoT Role Alias Least-privilege IAM (security)
🟧 AWS — Orchestration & Streaming
- Greengrass Fleet Provisioning + OTA (compute)
- Amazon MSK Kafka topic/partition (messaging)
- Lambda Event processor (compute)
📊 AWS — Observability & Storage
- CloudWatch SLO dashboards + alarms (data)
- S3 Raw telemetry + Parquet (storage)
- DynamoDB Device state + health (data)
Flows
- Retired phones (all three) -> AWS IoT Core: mTLS auth
- AWS IoT Core -> IoT Role Alias: assume role via alias
- IoT Role Alias -> Greengrass cloud service: authorization
- Greengrass cloud service -> Greengrass Core phone: OTA component deploy
- Greengrass Core phone -> Amazon MSK: publishes telemetry
- Local ML Inference phone -> Amazon MSK: publishes inference
- Amazon MSK -> Lambda: partition trigger
- Lambda -> S3: persists raw
- Lambda -> DynamoDB: updates state
- Lambda -> CloudWatch: custom metrics
- Greengrass cloud service -> CloudWatch: fleet health
How It Actually Works: Greengrass v2, Device Identity, and the Heterogeneity Problem
The technical backbone of any heterogeneous edge fleet in the AWS ecosystem is AWS IoT Greengrass v2. Unlike version 1, Greengrass v2 adopts a decoupled component model — each function (local inference, telemetry collection, shadow sync) is an independent component with its own lifecycle, restart policy, and dependencies declared in a recipe manifest. This is critical for retired smartphones because you do not control the hardware: a Snapdragon 660 and an Exynos 9810 have completely different memory capabilities, ML acceleration, and battery consumption profiles.
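Device identity provisioning is where most IoT projects fail at scale. For a fleet of retired devices, the appropriate mechanism is AWS IoT fleet provisioning, whose provisioning templates issue a unique X.509 certificate per device. (Just-in-time provisioning, JITP, is a separate mechanism that registers devices presenting certificates signed by your own registered CA.) Permissions are then split across two places and kept to the minimum: the AWS IoT policy attached to the certificate allows only iot:Publish on device-specific topics plus the Greengrass deployment actions the core device needs, and the IoT Role Alias maps to an IAM Role that allows only s3:GetObject on the component artifact bucket.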
The critical failure point here is certificate rotation. Edge devices with intermittent connectivity — and retired smartphones in locations with unstable Wi-Fi are the typical case — can miss the rotation window. You need a renewal policy with a validity overlap of at least 30 days and an automatic re-provisioning mechanism via MQTT retained message when the device reconnects after a prolonged offline period.
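The reconnect-time decision described above can be sketched as a small pure function. This is a minimal illustration, not a Greengrass or AWS IoT API: the function name `renewal_action` and its three return values are hypothetical, and the only figure taken from the text is the 30-day overlap.

```python
from datetime import datetime, timedelta, timezone

# Overlap window from the text: start renewing at least 30 days before expiry.
RENEWAL_OVERLAP = timedelta(days=30)


def renewal_action(not_after: datetime, now: datetime) -> str:
    """Decide what a device should do about its certificate when it reconnects.

    Returns one of three hypothetical labels:
      "reprovision" - the certificate has already expired (long offline period)
      "renew"       - still valid, but inside the 30-day overlap window
      "ok"          - nothing to do yet
    """
    if now >= not_after:
        return "reprovision"
    if not_after - now <= RENEWAL_OVERLAP:
        return "renew"
    return "ok"


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(renewal_action(now + timedelta(days=200), now))
print(renewal_action(now + timedelta(days=10), now))
print(renewal_action(now - timedelta(days=5), now))
```

Keeping the decision separate from the transport makes it easy to unit test against simulated offline periods before wiring it to whatever re-provisioning channel the fleet uses.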
The Real Cost Model: Carbon vs. Reliability vs. Operations
A retired smartphone has near-zero hardware cost, but the operational cost of managing a heterogeneous edge node (provisioning, monitoring, firmware updates, degraded battery replacement) can easily exceed the cost of a Graviton3 EC2 instance over 12 months. The real break-even depends on three variables: device density per operations engineer, hardware failure frequency, and the value of reduced local latency. For financial workloads where sub-50ms inference latency is a business requirement, edge justifies the operational cost. For batch processing that tolerates 500ms+, it does not.
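To make the three break-even variables concrete, here is a rough arithmetic sketch. Every number in the example call is a hypothetical placeholder, and the formula is a simplification I am assuming for illustration; a real model needs the organization's own cost data.

```python
def annual_device_ops_cost(
    engineer_cost_per_year: float,
    devices_per_engineer: int,
    failures_per_device_per_year: float,
    cost_per_failure: float,
) -> float:
    """Yearly operational cost of one edge node (hardware itself assumed free)."""
    engineering_share = engineer_cost_per_year / devices_per_engineer
    failure_cost = failures_per_device_per_year * cost_per_failure
    return engineering_share + failure_cost


def edge_is_justified(
    device_annual_cost: float,
    cloud_annual_cost: float,
    latency_value_per_year: float,
) -> bool:
    """Edge wins when its cost does not exceed cloud cost plus the value of lower latency."""
    return device_annual_cost <= cloud_annual_cost + latency_value_per_year


# Hypothetical inputs: one engineer per 500 devices, one failure every four years.
cost = annual_device_ops_cost(150_000, 500, 0.25, 40)
print(cost)
print(edge_is_justified(cost, cloud_annual_cost=250, latency_value_per_year=100))
print(edge_is_justified(cost, cloud_annual_cost=250, latency_value_per_year=0))
```

The last two calls mirror the argument in the text: with a latency requirement that has business value the edge node clears the bar, and for latency-tolerant batch work it does not.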
Failure Modes That Lab Experiments Do Not Reveal
The difference between a research paper and a production platform is the list of failure modes that only appear at scale and over time. For retired device fleets, I have identified four critical categories:
1. Non-linear battery degradation. Smartphones with 500+ charge cycles have battery capacity reduced by 20-40%. Under continuous compute load, this means unexpected shutdowns in 4-6 hours instead of 10-12. Greengrass v2 has no native battery state visibility — you need a custom component that reads /sys/class/power_supply/battery/capacity and publishes to an MQTT health topic, with a CloudWatch alarm when battery_level < 20% for more than 15 minutes.
2. Silent thermal throttling. Mobile SoCs automatically reduce clock speed under elevated temperature. An inference that takes 45ms at 25°C may take 120ms at 42°C — with no error, just degraded latency. Without temperature instrumentation via OpenTelemetry exporting to CloudWatch custom metrics, you will never correlate this behavior with peak usage hours.
3. OS and API fragmentation. Android 8, 9, 10, and 11 have different behaviors for background process limits, Doze mode, and network access restrictions. A Greengrass component that works perfectly on Android 11 may be killed by the system on Android 8 after 10 minutes in the background. The solution is to use a Foreground Service with a persistent notification — ugly, but necessary.
4. Clock drift on offline devices. Devices without NTP synchronized for more than 2 hours can have clock drift of 30-60 seconds. For financial telemetry data where the timestamp is part of the audit record, this is unacceptable. The solution is server-side timestamp (MSK broker time) as the authoritative source, with the device timestamp as secondary metadata.
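For the battery case in item 1, the custom health component could look roughly like the sketch below. The sysfs path and the thresholds (below 20% for more than 15 minutes) come from the text; the function names are hypothetical, and the sustained-low check only mirrors locally what the CloudWatch alarm would evaluate. Publishing to MQTT is left out.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional, Tuple

CAPACITY_PATH = Path("/sys/class/power_supply/battery/capacity")
LOW_THRESHOLD_PCT = 20          # alarm threshold from the text
SUSTAINED_SECONDS = 15 * 60     # "for more than 15 minutes"


def read_battery_level(path: Path = CAPACITY_PATH) -> int:
    """Read the battery percentage (0-100) that the kernel exposes as text."""
    return int(path.read_text().strip())


def sustained_low(samples: Iterable[Tuple[float, int]]) -> bool:
    """Return True if the level stayed below the threshold for over 15 minutes.

    `samples` is a time-ordered sequence of (unix_timestamp, battery_pct).
    Any sample at or above the threshold resets the window.
    """
    low_since: Optional[float] = None
    for ts, level in samples:
        if level < LOW_THRESHOLD_PCT:
            if low_since is None:
                low_since = ts
            if ts - low_since > SUSTAINED_SECONDS:
                return True
        else:
            low_since = None
    return False


# Demo against a temporary file standing in for the sysfs node.
with tempfile.TemporaryDirectory() as tmp:
    fake_node = Path(tmp) / "capacity"
    fake_node.write_text("17\n")
    print(read_battery_level(fake_node))
```

Some devices expose the battery under a different directory name in /sys/class/power_supply, so the path is best treated as per-device configuration.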
Real Numbers: What to Expect from a Retired Device Fleet
- ~75% — Embodied carbon reduction vs. new hardware. Amortization of carbon already emitted in manufacturing; valid only if the device would replace equivalent new hardware
- 3-5x — Operational cost increase vs. equivalent EC2 instance. Includes engineering time for provisioning, monitoring, and replacing failed devices in small fleets (<500 nodes)
- <50ms — Local inference latency on modern mobile SoCs (Snapdragon 865+). For INT8 quantized models with NNAPI acceleration; degrades to 80-150ms on 2017-2018 SoCs under thermal load
Fleet Orchestration at Scale: Deployment, OTA, and Kafka Partitioning
For a fleet with hundreds or thousands of retired devices, the Greengrass component deployment strategy needs to be treated with the same rigor as a microservice deployment on EKS. Greengrass v2 deployments target thing groups, including dynamic thing groups defined by queries over device attributes (thing attributes in the IoT Registry). This allows canary deployments: for example, first on devices with Android 11 and battery above 80%, then progressively expanding.
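The canary ordering can be sketched independently of any AWS call. The `Device` record and `plan_waves` function below are hypothetical; the only rule taken from the text is that the first wave is Android 11 devices with battery above 80%. Expanding by descending Android version afterwards is my assumption of one reasonable order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Device:
    thing_name: str
    android_version: int
    battery_pct: int


def plan_waves(fleet: List[Device]) -> List[List[str]]:
    """Split a fleet into rollout waves, canary first."""
    # Wave 0: the canary criteria named in the text.
    canary = [d.thing_name for d in fleet
              if d.android_version == 11 and d.battery_pct > 80]
    canary_set = set(canary)
    remaining = [d for d in fleet if d.thing_name not in canary_set]

    waves = [canary]
    # Later waves: everything else, newest Android version first.
    for version in sorted({d.android_version for d in remaining}, reverse=True):
        waves.append([d.thing_name for d in remaining
                      if d.android_version == version])
    return waves


fleet = [
    Device("a", 11, 90),
    Device("b", 11, 50),
    Device("c", 9, 95),
    Device("d", 8, 40),
]
print(plan_waves(fleet))
```

Each wave would map to one thing group and one deployment, with the next wave released only after the previous one reports healthy.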
The OTA pipeline needs to account for the connectivity window. Devices in locations with intermittent Wi-Fi should receive updates during low-traffic hours, configured via IoT Jobs with schedulingConfig.startTime and timeoutConfig. The component artifact in S3 should use S3 Transfer Acceleration for geographically distributed devices, and the bucket should have a lifecycle policy that expires old versions after 90 days for cost control.
On the streaming side, MSK partitioning is the most impactful architectural decision. Using device_id as the partition key guarantees per-device ordering but creates hot partitions if the distribution of active devices is uneven — which is guaranteed in a heterogeneous fleet where some devices go offline for hours. The solution I use is a composite key {region}#{device_tier} where device_tier is calculated by device capability (SoC generation, RAM), distributing load more evenly. The number of partitions should be sized at max_throughput / 1MB/s per partition, with 30% headroom for fleet reconnection spikes after a network outage.
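The sizing rule and composite key above reduce to a few lines of arithmetic. In this sketch the 1 MB/s per-partition budget and the 30% headroom are the figures from the text; reading "30% headroom" as multiplying peak throughput by 1.3 is my interpretation, and the function names are hypothetical.

```python
import math

PER_PARTITION_MB_S = 1.0   # per-partition throughput budget from the text
HEADROOM = 0.30            # extra capacity for reconnection spikes


def partition_count(max_throughput_mb_s: float) -> int:
    """Partitions needed for the peak throughput plus headroom, rounded up."""
    needed = max_throughput_mb_s * (1 + HEADROOM) / PER_PARTITION_MB_S
    return max(1, math.ceil(needed))


def partition_key(region: str, device_tier: str) -> str:
    """Composite key in the {region}#{device_tier} form described above."""
    return f"{region}#{device_tier}"


print(partition_count(12))
print(partition_key("sa-east-1", "tier2"))
```

Because a device's region and tier are stable, all of its events still map to the same key, so per-device ordering is preserved as long as neither attribute changes.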
For idempotency in the Lambda consumer, the event_id should be generated on the device as SHA256(device_id + timestamp_ms + sequence_number) and checked against a DynamoDB table with a 24-hour TTL before processing — the cost of 2 DynamoDB reads per event is insignificant compared to the cost of processing duplicates in financial systems.
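A minimal sketch of the event ID and the deduplication check follows. The in-memory `DedupStore` is a stand-in I am assuming for the DynamoDB table with a 24-hour TTL, and the `|` delimiter between fields is my addition: plain concatenation would let different field combinations collide (for example timestamp 12 with sequence 3 versus timestamp 1 with sequence 23).

```python
import hashlib
import time
from typing import Dict, Optional


def event_id(device_id: str, timestamp_ms: int, sequence_number: int) -> str:
    """SHA-256 over the three fields, hex encoded (64 characters)."""
    raw = f"{device_id}|{timestamp_ms}|{sequence_number}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


class DedupStore:
    """In-memory stand-in for a dedup table whose items expire after a TTL."""

    def __init__(self, ttl_seconds: int = 24 * 3600) -> None:
        self.ttl_seconds = ttl_seconds
        self._expires_at: Dict[str, float] = {}

    def first_time(self, eid: str, now: Optional[float] = None) -> bool:
        """Return True and record the ID if it has not been seen within the TTL."""
        now = time.time() if now is None else now
        expires = self._expires_at.get(eid)
        if expires is not None and expires > now:
            return False  # duplicate inside the TTL window: skip processing
        self._expires_at[eid] = now + self.ttl_seconds
        return True


store = DedupStore()
eid = event_id("device-42", 1_700_000_000_000, 7)
print(store.first_time(eid, now=0.0))
print(store.first_time(eid, now=10.0))
```

Against a real table, a single conditional write that fails when the key already exists would avoid the race between the check and the insert.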
Security in a Heterogeneous Fleet: Zero Trust Is Not Optional
Retired devices introduce an attack surface that new corporate hardware does not have: unknown usage history, potential pre-installed malicious software, and absence of Secure Enclave on older models. For financial environments, this requires a Zero Trust posture more rigorous than most IoT deployments.
The model I implement has three layers. At the identity layer, each device receives a unique X.509 certificate issued by the private AWS IoT CA, with a 365-day validity and automatic rotation via a Lambda function triggered by EventBridge Scheduler. The associated IoT policy uses the iot:Connection.Thing.ThingName policy variable so that a certificate can only connect as the thing it is attached to, which prevents a compromised certificate from being reused under another device identity.
At the data layer, all telemetry in transit uses mandatory TLS 1.3, enforced through the TLS security policy on the IoT Core domain configuration. Data at rest in S3 uses SSE-KMS with a dedicated customer managed key (CMK) per environment, with annual automatic rotation. DynamoDB uses CMK encryption, and the device state table has a resource-based policy that denies access to any principal outside the specific AWS account.
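The thing-name binding at the identity layer can be expressed as an AWS IoT policy document. The sketch below only builds the JSON; the `devices/` topic prefix, region, and account ID are hypothetical placeholders, and a real Greengrass core device needs additional actions beyond these two statements.

```python
import json


def device_policy(region: str, account_id: str, topic_prefix: str = "devices") -> dict:
    """Build an IoT policy that ties connect and publish rights to the thing name."""
    arn = f"arn:aws:iot:{region}:{account_id}"
    # Policy variable resolved by AWS IoT at connect time to the attached thing's name.
    thing = "${iot:Connection.Thing.ThingName}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The MQTT client ID must equal the thing name.
                "Effect": "Allow",
                "Action": "iot:Connect",
                "Resource": f"{arn}:client/{thing}",
            },
            {
                # Publishing is limited to the device's own topic subtree.
                "Effect": "Allow",
                "Action": "iot:Publish",
                "Resource": f"{arn}:topic/{topic_prefix}/{thing}/*",
            },
        ],
    }


policy = device_policy("us-east-1", "123456789012")
print(json.dumps(policy, indent=2))
```

The variable only resolves when the certificate is attached to a thing in the registry, which fleet provisioning templates can do as part of registration.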
At the detection layer, AWS IoT Device Defender is configured with audit checks for expired certificates, IoT policies with excessive wildcards, and connections from anomalous IPs. Findings are routed via EventBridge to an SQS queue consumed by a Lambda that updates device status in DynamoDB and, if severity is CRITICAL, revokes the certificate via iot:UpdateCertificate with status REVOKED. This automatic response loop is what differentiates a secure fleet from a merely monitored fleet.
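The response loop can be sketched as a handler that records the finding and revokes on CRITICAL. The finding shape (`thing_name`, `certificate_id`, `severity`) is a simplified assumption rather than the actual Device Defender event schema, and a plain dict stands in for the DynamoDB table. The one real API shape used is `update_certificate(certificateId=..., newStatus="REVOKED")`, which matches the boto3 IoT client; here a recording test double takes its place so the example runs without AWS access.

```python
from typing import Any, Dict, List, Tuple


class RecordingIoTClient:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def update_certificate(self, certificateId: str, newStatus: str) -> None:
        self.calls.append((certificateId, newStatus))


def handle_finding(finding: Dict[str, Any], iot_client: Any,
                   device_state: Dict[str, str]) -> str:
    """Record the finding against the device and revoke its certificate if CRITICAL."""
    severity = finding["severity"]
    device_state[finding["thing_name"]] = f"FINDING_{severity}"
    if severity == "CRITICAL":
        iot_client.update_certificate(
            certificateId=finding["certificate_id"],
            newStatus="REVOKED",
        )
        return "revoked"
    return "recorded"


client = RecordingIoTClient()
state: Dict[str, str] = {}
print(handle_finding(
    {"thing_name": "phone-1", "certificate_id": "abc123", "severity": "CRITICAL"},
    client, state))
print(handle_finding(
    {"thing_name": "phone-2", "certificate_id": "def456", "severity": "LOW"},
    client, state))
```

In a deployed Lambda, `iot_client` would be `boto3.client("iot")` and the state update would be a DynamoDB write; injecting both keeps the decision logic testable.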
Anti-Patterns That Destroy Retired Device Fleets in Production
- Shared certificate across devices: using a single certificate for the entire fleet eliminates the ability to individually revoke a compromised device and violates the principle of least privilege in IoT.
- Ignoring Android Doze Mode: Greengrass components without a Foreground Service on Android 6+ are suspended by the system after 30-60 minutes of screen inactivity, causing silent gaps in telemetry collection.
- Using device timestamp as authoritative source: edge device clocks drift; in financial systems, the Kafka broker timestamp (server-side) must be the source of truth for ordering and auditing.
- Sizing Kafka partitions by device count, not throughput: a fleet of 1000 devices sending 1 event/minute each does not justify 1000 partitions — the consumer group coordination overhead exceeds the parallelism benefit.
- OTA deployments without canary and without automatic rollback: on heterogeneous hardware, a component that passes tests on 5 devices may fail on 20% of the fleet due to SoC differences. Without automatic rollback via IoT Jobs abortConfig, you can bring down hundreds of nodes simultaneously.
- Treating sustainability as marketing without real carbon instrumentation: without per-device energy consumption metrics (via /sys/class/power_supply) and emission calculation based on the local grid emission factor, the 'low-carbon' narrative is unauditable and indefensible in ESG reports.
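For the timestamp anti-pattern, one way to keep both clocks without confusing them is to make the roles explicit in the stored record. The field names below are hypothetical, and the computed skew includes network and queueing delay, so it is an upper-bound signal rather than a precise drift measurement.

```python
from typing import Any, Dict


def audit_record(payload: Dict[str, Any], device_ts_ms: int,
                 broker_ts_ms: int) -> Dict[str, Any]:
    """Wrap a payload so the broker time is authoritative and the device time is metadata."""
    return {
        "event_time_ms": broker_ts_ms,     # authoritative: used for ordering and audit
        "device_time_ms": device_ts_ms,    # secondary: kept for diagnostics only
        "clock_skew_ms": device_ts_ms - broker_ts_ms,
        "payload": payload,
    }


record = audit_record({"reading": 3.2}, device_ts_ms=1_000, broker_ts_ms=1_500)
print(record)
```

Tracking the skew per device over time also gives an early warning for devices that have lost NTP synchronization.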
Assessment Against AWS Well-Architected Pillars
- security: Zero Trust mandatory: per-device X.509 certificates via Fleet Provisioning, IoT Role Alias with least-privilege IAM, Device Defender with automatic revocation response, SSE-KMS on all data at rest. Residual risk: devices with outdated OS without security patches — mitigate with minimum Android version policy in the provisioning template.
- reliability: Design for device failure as a normal event, not an exception: health checks via MQTT LWT (Last Will and Testament) with 90-second timeout, exponential reconnect with jitter in the Greengrass client, idempotency in the Kafka consumer via DynamoDB dedup table. Fleet availability SLO should be defined by percentile (e.g., 95% of devices active in any 1-hour window), not by individual availability.
- performance: Local inference only for models that justify the latency: use NNAPI for acceleration on SoCs with DSP/NPU (Snapdragon 845+), TFLite CPU fallback for older devices. Monitor p99 inference latency per model and per SoC via CloudWatch custom metrics — p99 is the number that matters in financial systems, not the average.
- sustainability: Instrument per-device energy consumption and calculate emissions using the local grid emission factor (available via US EPA API or regional equivalents). Publish carbon metrics to CloudWatch and include in FinOps dashboards. Define decommissioning criteria based on energy efficiency: devices with per-operation consumption above threshold should be replaced, even if still functional.
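The sustainability instrumentation comes down to two small calculations: energy times grid emission factor, and energy per operation against a decommissioning threshold. The sketch below assumes energy is measured in watt-hours and the emission factor in grams of CO2e per kWh; the function names and the example values are hypothetical.

```python
def emissions_g_co2e(energy_wh: float, grid_factor_g_per_kwh: float) -> float:
    """Operational emissions in grams CO2e for a measured energy amount."""
    return energy_wh / 1000.0 * grid_factor_g_per_kwh


def should_decommission(energy_wh: float, operations: int,
                        threshold_wh_per_op: float) -> bool:
    """True when the device spends more energy per operation than the threshold."""
    if operations <= 0:
        raise ValueError("operations must be positive to compute per-operation energy")
    return energy_wh / operations > threshold_wh_per_op


# Hypothetical day: 500 Wh consumed on a grid emitting 400 gCO2e per kWh.
print(emissions_g_co2e(500, 400))
print(should_decommission(energy_wh=10.0, operations=1000, threshold_wh_per_op=0.005))
```

Publishing both values as per-device metrics gives the FinOps dashboard and the decommissioning decision the same auditable inputs.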
Retired Devices vs. Edge Compute Alternatives
| Dimension | Retired Smartphones | AWS Wavelength / Outposts | Raspberry Pi / ARM SBC |
|---|---|---|---|
| Hardware cost | ~$0 (already paid) | High (CAPEX or AWS OPEX) | Low ($35-$80) |
| Operational cost | High (heterogeneity) | Low (AWS managed) | Medium (homogeneous) |
| Embodied carbon | Zero marginal (already emitted) | High (new hardware) | Medium (new hardware) |
| Local ML capability | High (NPU/DSP on modern SoCs) | Very high (dedicated GPU) | Low (no accelerator) |
| Financial/regulatory fit | Medium (requires rigorous Zero Trust) | High (AWS SLA, native compliance) | Medium (depends on hardening) |
My Perspective: When I Would Use This in Production
I would deploy this architecture in production in exactly two scenarios: (1) when sub-50ms local inference latency is a non-negotiable business requirement and the cost of AWS Wavelength or Outposts is prohibitive, and (2) when the organization has an auditable ESG mandate requiring embodied carbon reduction and has the devices available internally (for example, a bank refreshing its corporate smartphone fleet). The hardest lesson I have learned in large-scale IoT projects is that heterogeneous fleet operations cost is systematically underestimated by 3-5x in planning phases; automate provisioning, remediation, and decommissioning from day zero, or the operational cost will consume all the carbon and hardware savings within 18 months. And never, ever, use device timestamp as the authoritative source in financial systems.
Verdict: Real Potential, Non-Trivial Operational Complexity
Low-carbon computing with retired devices is an architecturally legitimate idea, not an academic experiment — but it only justifies itself when the total cost analysis includes the operational cost of managing heterogeneous hardware, the security cost of an expanded attack surface, and the engineering cost of instrumenting carbon in an auditable way. For financial organizations with growing ESG mandates and internally available device fleets, the combination of AWS IoT Greengrass v2, Fleet Provisioning, IoT Device Defender, and MSK with careful partitioning offers a solid technical foundation. The practical recommendation: start with a pilot of 50-100 homogeneous devices (same SoC generation, same OS), instrument everything from the start with OpenTelemetry and CloudWatch custom metrics, define fleet SLOs by percentile, and only scale to heterogeneous hardware after the operational playbook has been validated. Real sustainability requires rigorous engineering, not just good intentions.
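A fleet-level SLO of the kind recommended here can be evaluated with a set intersection. The sketch assumes the caller already knows which devices reported in during the window; the function names are hypothetical and the 95% target is the example figure used earlier in the article.

```python
from typing import Set


def fleet_availability(active: Set[str], registered: Set[str]) -> float:
    """Fraction of registered devices that were active in the window."""
    if not registered:
        raise ValueError("registered fleet is empty")
    return len(active & registered) / len(registered)


def slo_met(availability: float, target: float = 0.95) -> bool:
    """True when the fleet-level availability reaches the target."""
    return availability >= target


registered = {f"phone-{i}" for i in range(20)}
active = {f"phone-{i}" for i in range(19)}   # one device silent this hour
availability = fleet_availability(active, registered)
print(availability, slo_met(availability))
```

Measuring the fleet rather than each node means a single dead battery does not page anyone, while a correlated failure across many devices still breaches the target.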
References and Further Reading
- AWS IoT Greengrass v2 Developer Guide
- AWS IoT Fleet Provisioning
- AWS IoT Device Defender
- Amazon MSK Best Practices
- AWS Well-Architected Sustainability Pillar
- Google Research: Low-carbon computing with retired phones
- Greening the Beast: Lifecycle Carbon in Mobile Devices (ACM)
- OpenTelemetry for IoT and Edge
Originally published at fernando.moretes.com. By Fernando F. Azevedo — Senior Solutions Architect.