Visual computing algorithms recognize, manipulate, and synthesize massive visual data such as images, videos, and 3D graphics. These algorithms enable new use-cases on emerging mobile/embedded devices such as AR/VR headsets, robotics, and energy harvesting systems.

What makes visual computing exciting is that effective solutions not only span different computing layers, but also require us to venture beyond computing into other related fields such as optics and acoustics.

Algorithm-SoC Co-Design for Energy-Efficient Continuous Vision

Delivering real-time continuous vision in an energy-efficient manner is a tall order for mobile system design. To overcome the energy-efficiency barrier, most prevailing efforts have focused on optimizing the hardware (micro-)architecture for the vision kernels of interest, with little regard for the wider system. Fundamentally, these approaches treat frames in a real-time video stream as independent entities and optimize the execution efficiency of each frame in isolation.

In contrast, our work takes a step back and re-examines the execution model of continuous vision at the system level. In particular, we harness a key trait of continuous vision: visual information changes only incrementally across frames in a real-time video stream. Vision results can therefore be computed incrementally from one frame to the next without re-executing the entire vision algorithm on each frame. We propose a new algorithm that encodes frame pixel changes as object motion and leverages the motion data to simplify the vision computation for the majority of real-time frames.
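The incremental execution model above can be illustrated with a minimal sketch. The function names, the keyframe interval, and the bounding-box representation are illustrative assumptions, not the paper's actual interface: the full detector runs only on occasional key frames, while intermediate frames simply shift the previous results by the per-block motion vectors.

```python
import numpy as np

def extrapolate_boxes(boxes, motion_field, block=16):
    """Shift each bounding box by the mean motion vector of the
    blocks it covers, instead of re-running the full detector.

    boxes: list of (x0, y0, x1, y1) pixel boxes from the previous frame.
    motion_field: (H/block, W/block, 2) array of per-block (dx, dy)
    vectors describing pixel motion from the previous frame to this one.
    """
    new_boxes = []
    for x0, y0, x1, y1 in boxes:
        # Block coordinates overlapped by this box (ceil for the far edge).
        bx0, by0 = x0 // block, y0 // block
        bx1 = max(bx0 + 1, -(-x1 // block))
        by1 = max(by0 + 1, -(-y1 // block))
        region = motion_field[by0:by1, bx0:bx1]
        dx, dy = region.reshape(-1, 2).mean(axis=0)
        new_boxes.append((x0 + dx, y0 + dy, x1 + dx, y1 + dy))
    return new_boxes

def continuous_vision(frames, motion_fields, detector, key_interval=8):
    """Run the expensive detector only on key frames; extrapolate between."""
    boxes = []
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            boxes = detector(frame)                         # expensive, occasional
        else:
            boxes = extrapolate_boxes(boxes, motion_fields[i])  # cheap, common case
        yield boxes
```

The cost per non-key frame is a handful of arithmetic operations per tracked object, versus a full network inference, which is where the energy saving comes from.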

Along with the new algorithm comes a co-designed Systems-on-Chip (SoC) architecture. To maximize the efficiency of the incremental vision computation, the new SoC architecture exploits algorithmic synergies between different vision SoC components. Specifically, we observe that pixel motion information is naturally generated by the Image Signal Processor (ISP) for temporal de-noising, which is performed early in the vision pipeline. We propose lightweight SoC augmentations that enable reuse of the motion data between the ISP and the vision algorithm with little compute overhead.
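To make the reuse concrete, here is a toy sketch of the kind of block-matching motion estimation an ISP already performs for temporal de-noising, with the motion field exposed as a shared buffer instead of being discarded. The class and buffer names are hypothetical; the point is that the vision pipeline reads motion vectors the ISP has already paid for.

```python
import numpy as np

def block_motion(prev, cur, block=16, search=4):
    """Toy exhaustive block-matching motion estimation (SAD criterion),
    the kind an ISP performs internally for temporal de-noising.
    Returns a (H/block, W/block, 2) field of per-block (dx, dy)."""
    H, W = cur.shape
    field = np.zeros((H // block, W // block, 2))
    for by in range(H // block):
        for bx in range(W // block):
            y, x = by * block, bx * block
            ref = cur[y:y + block, x:x + block]
            best, best_v = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy and yy + block <= H and 0 <= xx and xx + block <= W:
                        cand = prev[yy:yy + block, xx:xx + block]
                        sad = np.abs(ref - cand).sum()
                        if sad < best:
                            best, best_v = sad, (dx, dy)
            field[by, bx] = best_v
    return field

class ISP:
    """Hypothetical augmentation: the de-noiser's motion field is kept
    in a shared buffer so the vision algorithm can reuse it for free."""
    def __init__(self):
        self.motion_field = None  # read by the vision pipeline
    def temporal_denoise(self, prev, cur):
        self.motion_field = block_motion(prev, cur)
        # ... blend cur with the motion-compensated prev (omitted) ...
        return cur
```

In hardware the estimation is done by a fixed-function block; the augmentation is essentially the extra wiring and buffering to route its output to a second consumer.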

This work embodies two key themes in optimizing mobile continuous vision: exploiting the temporal redundancy inherent in real-time video, and co-designing algorithms with the SoC architecture rather than optimizing hardware components in isolation.

Semantic-Aware Virtual Reality Video Streaming

Virtual reality (VR) technologies have the potential to enable radically new applications, among which spherical panoramic (a.k.a. 360°) video streaming is on the verge of reaching critical mass. Current VR systems treat 360° VR content as plain RGB pixels, just like conventional planar frames, resulting in significant waste in data transfer and client-side processing.

Our position paper makes the case that next-generation VR platforms can take advantage of the semantic information inherent to VR content to improve streaming and processing efficiency. To that end, we present SVR, a semantic-aware VR system that utilizes the object information in VR frames for content indexing and streaming. SVR exploits the key observation that end-users' viewing behaviors tend to be object-oriented. Instead of streaming entire frames, SVR delivers miniature frames that cover only the tracked visual objects in VR videos. We implement an SVR prototype on a real hardware board and demonstrate that it achieves significant device power savings and network bandwidth reduction.
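The bandwidth argument can be sketched in a few lines. The cropping function, margin, and box coordinates below are illustrative assumptions, not SVR's actual interface: streaming only the regions around tracked objects in an equirectangular panorama sends a small fraction of the pixels of the full frame.

```python
import numpy as np

def miniature_frames(frame, object_boxes, margin=16):
    """Crop only the regions around tracked objects from a full
    equirectangular VR frame, in the spirit of SVR's miniature frames.
    Returns (top-left corner, crop) pairs so the client can place them."""
    H, W, _ = frame.shape
    crops = []
    for x0, y0, x1, y1 in object_boxes:
        x0, y0 = max(0, x0 - margin), max(0, y0 - margin)
        x1, y1 = min(W, x1 + margin), min(H, y1 + margin)
        crops.append(((x0, y0), frame[y0:y1, x0:x1].copy()))
    return crops

# Raw-pixel comparison for a 2048x4096 equirectangular frame
# with two tracked 256x256 objects (hypothetical numbers).
frame = np.zeros((2048, 4096, 3), dtype=np.uint8)
boxes = [(100, 200, 356, 456), (3000, 900, 3256, 1156)]
crops = miniature_frames(frame, boxes)
full_bytes = frame.nbytes
mini_bytes = sum(crop.nbytes for _, crop in crops)
```

Even before video compression, the miniature frames here are roughly 2% of the raw panorama, which is the headroom the object-oriented viewing behavior exposes.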

In the long run, 360° video is just one form of the myriad visual contents being generated and consumed. Computer systems researchers should fundamentally rethink how visual data is organized, managed, and processed. Distilling semantic information from visual data is a particularly promising approach. Future developments should examine other forms of visual semantics and extend beyond optimizing VR content streaming to processing, display, and other stages of the visual pipeline. We hope our work serves as a first step in a promising new direction of research.