What I Learned Building Vision Models on Edge Devices
February 15, 2026 · 3 min read

Trade-offs between accuracy, latency, and power while shipping computer vision features on constrained hardware.

Tags: Computer Vision, IoT

When you're running a computer vision model on a Raspberry Pi or an ESP32-CAM, everything you thought you knew about model development changes. There's no GPU. There's limited RAM. And your users expect real-time results.

Here's what I've learned shipping vision models on edge hardware across three different projects.

The Constraint Mindset

On the cloud, you optimize for accuracy first, then worry about cost. On the edge, you optimize for three things simultaneously:

[Figure: Edge device constraints triangle]

  • Latency — Can you process a frame before the next one arrives?
  • Power — Will the battery last a full day?
  • Accuracy — Is the output useful enough to act on?

You can't maximize all three. Every project requires choosing which one to sacrifice slightly.

Model Selection: Smaller Is Smarter

I stopped reaching for ResNet and started with MobileNetV3. On my real-world datasets the accuracy difference was negligible (< 2%), while inference ran about 5x faster.

import tensorflow as tf

# Instead of this
model = tf.keras.applications.ResNet50(weights="imagenet")

# Use this
model = tf.keras.applications.MobileNetV3Small(
    weights="imagenet",
    input_shape=(224, 224, 3)
)

For even more constrained devices, I use TensorFlow Lite with post-training quantization:

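# Convert a SavedModel to TFLite with post-training float16 quantization
converter = tf.lite.TFLiteConverter.from_saved_model("model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
# Store weights as 16-bit floats instead of 32-bit
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()  # serialized model as bytes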

Float16 quantization like this roughly halves the model size with minimal accuracy loss; full 8-bit integer quantization gets closer to a 75% reduction.
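
As a rough sanity check on those savings, weight storage scales with bytes per parameter. The sketch below uses a hypothetical parameter count (not a measured one) and ignores file metadata and any layers left unquantized, so real .tflite files will differ somewhat.

```python
# Bytes used to store one weight at each precision.
BYTES_PER_WEIGHT = {"float32": 4, "float16": 2, "int8": 1}


def weights_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in megabytes (10^6 bytes)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 1_000_000


def size_reduction(dtype: str, baseline: str = "float32") -> float:
    """Fraction of weight storage saved relative to the baseline precision."""
    return 1 - BYTES_PER_WEIGHT[dtype] / BYTES_PER_WEIGHT[baseline]


# Hypothetical parameter count, for illustration only.
NUM_PARAMS = 2_500_000

for dtype in BYTES_PER_WEIGHT:
    print(f"{dtype}: {weights_size_mb(NUM_PARAMS, dtype):.1f} MB, "
          f"{size_reduction(dtype):.0%} smaller than float32")
```

Under these assumptions float16 saves 50% and int8 saves 75%, which is where the usual range for post-training quantization comes from.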

The Preprocessing Bottleneck

Here's something nobody tells you: on edge devices, image preprocessing often takes longer than inference. Resizing, normalizing, and color-converting a 1080p frame can take 40 ms on a Pi, which by itself exceeds the 33 ms frame budget at 30 FPS.
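
One way to see where the time goes is to time each stage separately and compare it to the per-frame budget. This is a minimal sketch using only the standard library; `StageTimer`, `frame_budget_ms`, and the stand-in `preprocess` and `infer` functions are hypothetical names for illustration, not code from my projects.

```python
import time
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named pipeline stage."""

    def __init__(self):
        self.totals_ms = {}
        self.counts = {}

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.totals_ms[name] = self.totals_ms.get(name, 0.0) + elapsed_ms
            self.counts[name] = self.counts.get(name, 0) + 1

    def mean_ms(self, name):
        return self.totals_ms[name] / self.counts[name]


def frame_budget_ms(fps):
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


# Stand-ins for the real stages; replace with your own functions.
def preprocess(frame):
    return [p / 255.0 for p in frame]


def infer(tensor):
    return sum(tensor) / len(tensor)


timer = StageTimer()
frame = list(range(256)) * 100  # fake frame data
for _ in range(20):
    with timer.stage("preprocess"):
        tensor = preprocess(frame)
    with timer.stage("inference"):
        infer(tensor)

budget = frame_budget_ms(30)
for name in ("preprocess", "inference"):
    print(f"{name}: {timer.mean_ms(name):.2f} ms of a {budget:.1f} ms budget")
```

The absolute numbers only mean something when measured on the target device, but the per-stage split shows which part deserves attention first.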

Solutions that worked for me:

  1. Capture at target resolution — Don't capture 1080p and resize to 224x224. Set the camera to capture at 320x240.
  2. Skip frames — Process every 3rd frame. For most applications, 10 FPS is plenty.
  3. Use hardware acceleration — V4L2 on Linux, camera HAL on Android.
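
Frame skipping from the list above can be sketched as a small generator. The name `every_nth` is made up for this example, and integers stand in for real camera frames.

```python
from typing import Iterable, Iterator, TypeVar

Frame = TypeVar("Frame")


def every_nth(frames: Iterable[Frame], n: int) -> Iterator[Frame]:
    """Yield the first frame and every n-th frame after it, dropping the rest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    for index, frame in enumerate(frames):
        if index % n == 0:
            yield frame


# Integers stand in for one second of frames from a 30 FPS camera.
one_second = range(30)
processed = list(every_nth(one_second, 3))
print(len(processed))  # 10 frames processed out of 30
```

Note that the skipped frames are still read from the source here. Whether that costs anything depends on how your camera API delivers frames.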

Real Project: Plant Disease Detection

For Rudraksh (my plant disease detection platform), I needed to run a classification model on a Raspberry Pi 4 in the field — no internet connection, battery-powered.

The architecture:

Camera → Capture (320x240) → Preprocess (numpy) → TFLite Inference → Result → LCD Display
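
To give a feel for the preprocessing stage, here is a minimal numpy sketch. It is not the code from the project: the center crop, the nearest-neighbor resize, and the [0, 1] scaling are all assumptions chosen to keep the example short, and the right scaling depends on what your model expects.

```python
import numpy as np


def preprocess(frame: np.ndarray, size: int = 224) -> np.ndarray:
    """Center-crop to a square, resize by nearest-neighbor sampling, add a batch axis."""
    height, width, _ = frame.shape
    side = min(height, width)
    top = (height - side) // 2
    left = (width - side) // 2
    square = frame[top:top + side, left:left + side]

    # Nearest-neighbor resize: pick `size` evenly spaced source rows and columns.
    idx = np.arange(size) * side // size
    resized = square[idx][:, idx]

    # Scale to [0, 1] float32; adjust if your model expects a different range.
    return (resized.astype(np.float32) / 255.0)[np.newaxis, ...]


# A random 320x240 RGB frame stands in for the camera capture.
frame = np.random.randint(0, 256, size=(240, 320, 3), dtype=np.uint8)
batch = preprocess(frame)
print(batch.shape, batch.dtype)  # (1, 224, 224, 3) float32
```

Index-based resizing like this avoids an extra imaging dependency, at the cost of some image quality compared with interpolated resizing.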

Key numbers:

  • Model size: 2.3 MB (quantized MobileNetV3)
  • Inference time: 85ms per frame
  • Accuracy: 94.2% on 38 disease classes
  • Battery life: 6 hours on a 10,000 mAh bank
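
The battery figure also implies a rough average power draw. The 10,000 mAh and 6 hours are the numbers above; the 3.7 V nominal cell voltage is my assumption about how the bank is rated, and conversion losses are ignored, so treat the result as an upper bound on what the Pi itself draws.

```python
def average_power_w(capacity_mah: float, voltage_v: float, hours: float) -> float:
    """Average power draw implied by draining a battery in a given time."""
    energy_wh = capacity_mah / 1000.0 * voltage_v
    return energy_wh / hours


# 10,000 mAh and 6 hours are from the project; 3.7 V is an assumed nominal cell voltage.
power = average_power_w(capacity_mah=10_000, voltage_v=3.7, hours=6)
print(f"{power:.1f} W average draw")  # about 6.2 W
```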

The biggest win was caching: if the previous 3 frames gave the same prediction with >90% confidence, skip inference and reuse the result. This cut average power consumption by 40%.
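
A minimal sketch of that caching idea follows. The class name `PredictionCache` and the `max_skips` cap are assumptions added for this example: without some cap the cached result would never be re-checked, and the value 5 is arbitrary rather than what the project uses.

```python
from collections import deque


class PredictionCache:
    """Reuses the last label when recent predictions are stable and confident."""

    def __init__(self, window=3, min_confidence=0.90, max_skips=5):
        self.recent = deque(maxlen=window)  # (label, confidence) of real inferences
        self.min_confidence = min_confidence
        self.max_skips = max_skips          # assumed cap; not from the original post
        self.skips = 0

    def _stable(self):
        if len(self.recent) < self.recent.maxlen:
            return False
        labels = {label for label, _ in self.recent}
        return len(labels) == 1 and all(c > self.min_confidence for _, c in self.recent)

    def predict(self, frame, run_inference):
        """Return (label, confidence, reused) for one frame."""
        if self._stable() and self.skips < self.max_skips:
            self.skips += 1
            label, confidence = self.recent[-1]
            return label, confidence, True
        # Either the history is unstable or it is time to re-check the scene.
        self.skips = 0
        label, confidence = run_inference(frame)
        self.recent.append((label, confidence))
        return label, confidence, False


calls = []  # records which frames actually reached the model


def fake_model(frame):
    """Stand-in for TFLite inference that always sees a healthy plant."""
    calls.append(frame)
    return "healthy", 0.97


cache = PredictionCache()
results = [cache.predict(frame, fake_model) for frame in range(12)]
print(len(calls))  # 4 inferences for 12 frames
```

With a steady scene, this settles into one real inference for every six frames. Lowering `max_skips` trades some of the power savings for faster reaction when the scene changes.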

Lessons

  1. Profile before optimizing — I spent a week optimizing the model only to discover preprocessing was the bottleneck.
  2. Test on real hardware early — Emulators lie about performance.
  3. Ship the simplest model that works — You can always upgrade later.
  4. Design for offline — Edge means unreliable connectivity. Cache everything.

Edge ML is one of the most satisfying engineering challenges I've encountered. The constraints force creativity, and the results are tangible — literally running in your hands.