The default architecture for video analytics in 2024 is to stream video to the cloud, run inference on cloud compute, and return results to the operator. This architecture is operationally simple and technically straightforward. It is also wrong for almost every context in which it is deployed.
The argument for cloud inference is cost and flexibility — GPU compute is cheaper in the cloud, models can be updated centrally, and the edge device does not need to be powerful. These arguments are real. They are also secondary to the arguments against sending raw video off-site, which are structural and not solvable by pricing or operational convenience.
What happens when video leaves the building
Raw video of a commercial space contains individually identifiable information. Every person who passes through the frame is potentially identifiable by face, by gait, by clothing, or by their companions. Footage of identifiable people is personal data under every privacy framework currently in force or in development in MENA markets.
When that video leaves the building and travels to a cloud provider, several things happen at once. The operator creates a data transfer that, in most jurisdictions, is a processing activity requiring a documented legal basis. The transfer opens a potential breach surface: the video is now exposed to anyone who gains access to the cloud endpoint or the storage behind it. It raises a data residency question, because cloud providers route traffic and store data across jurisdictions in ways that are often opaque to their customers. And it creates regulatory exposure that compounds over time as MENA frameworks tighten.
None of these problems are solved by encrypting the transfer. Encrypted personal data in transit is still personal data. The downstream obligations — breach notification, data subject rights, regulatory audit — apply regardless of whether the data was encrypted when it moved.
Why edge inference solves the problem structurally
Edge inference runs the computer vision model on a device physically located inside the site — on the same LAN as the cameras, behind the same network boundary. The video frame is captured by the camera, transmitted over the local network to the edge appliance, processed by the inference model, and the result — a count, an event, a detection — is forwarded to the intelligence cloud. The raw frame is discarded.
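A minimal sketch of that flow in Python follows. It assumes hypothetical `Frame` and `Event` types and a `detect` callable standing in for the on-site model; none of these names come from Canopy. The point it illustrates is narrow: the pipeline consumes frames and emits only structured results, so the pixels have no route into what is forwarded.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class Frame:
    """Raw image from a camera (hypothetical type). Exists only on the edge appliance."""
    camera_id: str
    timestamp: float
    pixels: bytes


@dataclass(frozen=True)
class Event:
    """Structured inference result (hypothetical type): the only thing forwarded."""
    camera_id: str
    timestamp: float
    person_count: int


def process_stream(frames: Iterable[Frame], detect: Callable[[bytes], int]) -> Iterator[Event]:
    """Run on-site inference on each frame and yield only the structured result."""
    for frame in frames:
        count = detect(frame.pixels)
        # The Event copies the camera ID, timestamp and count, never the pixels.
        # This function keeps no copy of the frame, so once the source
        # releases it the raw image is gone.
        yield Event(frame.camera_id, frame.timestamp, count)


def fake_detector(pixels: bytes) -> int:
    """Stand-in for a real model: treats each 0xFF byte as one detected person."""
    return pixels.count(0xFF)


# Frames come from a generator, so each one exists only while it is processed.
frames = (Frame("cam-01", float(t), bytes([0xFF] * t)) for t in range(3))
events = list(process_stream(frames, fake_detector))
```

In a real deployment `detect` would wrap the vision model and the frame source would be the camera feed on the LAN; the generator here only mimics frames arriving one at a time.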
What leaves the building is not video. It is a structured JSON payload containing counts and events. That payload contains no pixels, no biometric data, no information from which any individual could be identified. It is not personal data under any current or forthcoming framework.
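To make "counts and events" concrete, here is one possible shape for such a payload, sketched in Python. The `WindowSummary` schema and its field names are hypothetical illustrations, not Canopy's actual format; what matters is that every field is a label, a timestamp, or a number.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class WindowSummary:
    """One camera's results for one aggregation window (hypothetical schema)."""
    site_id: str
    camera_id: str
    window_start: str      # ISO 8601 timestamp of the window start
    window_seconds: int    # e.g. 300 for five-minute aggregation
    person_count: int
    events: list[str] = field(default_factory=list)  # event labels, not images


def to_payload(summaries: list[WindowSummary]) -> bytes:
    """Serialize summaries to compact UTF-8 JSON for the outbound link."""
    body = {"summaries": [asdict(s) for s in summaries]}
    return json.dumps(body, separators=(",", ":")).encode("utf-8")


payload = to_payload([
    WindowSummary("site-07", "cam-01", "2024-05-01T10:00:00Z", 300, 42, ["queue_threshold"]),
])
print(payload.decode("utf-8"))
```

A summary like this is a few hundred bytes at most, which is why the payload rate discussed below stays in the kilobit range.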
Keeping video on-site is not a privacy feature; it is the architecture. There is no configuration setting, no operator override, and no data pipeline path that would cause raw video to be transmitted. The video path and the intelligence path are physically separated, so the guarantee works the way a self-closing fire door does: the protection does not depend on anyone's preference in the moment.
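In software, one way to express the same idea is to give the outbound channel no method that accepts image data at all. The Python sketch below assumes a hypothetical `Uplink` class and `CountEvent` type; it is an illustration of the principle, not Canopy's implementation, and a type check like this is a software guard that sits on top of the network separation rather than replacing it.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CountEvent:
    """Hypothetical structured event: the only type the uplink will carry."""
    camera_id: str
    count: int


class Uplink:
    """Hypothetical outbound channel whose only send method takes CountEvent."""

    def __init__(self) -> None:
        self.sent: list[str] = []  # stands in for the network connection

    def send(self, event: CountEvent) -> None:
        # Refuse anything that is not the structured event type, including raw bytes.
        if not isinstance(event, CountEvent):
            raise TypeError(f"uplink accepts CountEvent only, got {type(event).__name__}")
        self.sent.append(json.dumps(asdict(event)))


uplink = Uplink()
uplink.send(CountEvent("cam-01", 3))
try:
    uplink.send(b"\xff\xd8\xff")  # JPEG-like bytes are rejected
except TypeError as err:
    print(err)  # uplink accepts CountEvent only, got bytes
```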
Performance and bandwidth
The secondary argument against cloud inference is bandwidth. A 1080p camera stream at standard compression consumes approximately 2–4 Mbps, so a site with 100 cameras generates 200–400 Mbps of outbound traffic. At MENA enterprise bandwidth prices, that is a significant recurring cost. It also creates a latency dependency: cloud inference results arrive only as fast as the round-trip to the cloud allows, which means real-time alerting requires either low-latency cloud connectivity or local buffering.
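The figures above can be checked directly and extended to monthly volume, which makes the recurring load easier to picture. This Python sketch assumes every stream runs continuously at a constant bitrate for a 30-day month; real cameras vary with scene activity and codec settings.

```python
SECONDS_PER_DAY = 86_400


def site_uplink_mbps(cameras: int, per_camera_mbps: float) -> float:
    """Aggregate outbound bitrate if every stream is sent off-site."""
    return cameras * per_camera_mbps


def monthly_terabytes(mbps: float, days: int = 30) -> float:
    """Convert a sustained bitrate (megabits/s) into decimal terabytes per period."""
    bits = mbps * 1e6 * SECONDS_PER_DAY * days
    return bits / 8 / 1e12


for per_camera in (2.0, 4.0):
    rate = site_uplink_mbps(100, per_camera)
    print(f"{per_camera} Mbps/camera -> {rate:.0f} Mbps, {monthly_terabytes(rate):.1f} TB/month")
# 2.0 Mbps/camera -> 200 Mbps, 64.8 TB/month
# 4.0 Mbps/camera -> 400 Mbps, 129.6 TB/month
```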
Edge inference eliminates both problems. The edge appliance processes frames locally with sub-100 ms latency. The outbound data rate for the intelligence payload, counts and events at five-minute aggregation, is measured in kilobits per second, not megabits per second. A site with 100 cameras generates less outbound traffic from its Canopy intelligence layer than a single video call.
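A rough estimate of the payload rate, sketched in Python, shows why the figure lands in kilobits per second. The per-camera size of about 1 KB of JSON per five-minute window is an illustrative assumption, not a measured Canopy value.

```python
def payload_kbps(cameras: int, bytes_per_camera_window: int, window_seconds: int = 300) -> float:
    """Average outbound rate, in kilobits per second, for windowed summaries."""
    bits_per_window = cameras * bytes_per_camera_window * 8
    return bits_per_window / window_seconds / 1_000


# Assumption: ~1 KB of JSON per camera per five-minute window (illustrative, not measured).
rate = payload_kbps(cameras=100, bytes_per_camera_window=1_000)
print(f"{rate:.2f} kbps")  # 2.67 kbps
```

Even if each summary were ten times larger, the average rate would stay in the tens of kilobits per second, several orders of magnitude below the 200–400 Mbps of the raw streams.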
The model quality argument
The final argument for cloud inference is model quality: that cloud providers have access to more compute and therefore better models. This argument conflates inference hardware with model architecture. Canopy's models are trained on cloud-scale compute and deployed to edge hardware. The inference hardware on the edge appliance is sized to run those models at full resolution and full frame rate; it is not a compromised deployment. The quality of the inference output is the same whether the model runs on-site or in the cloud. What differs is where the data travels in the process of producing that output, and that is the only difference that matters for privacy, compliance, and security.