Ben Ochoa

Chief Technologist, Computer Vision
Integrity Applications Incorporated (IAI)

Lecturer
Department of Computer Science and Engineering
University of California, San Diego

Phone: +1 (858) 876-4164
Email: bochoa at ucsd.edu

I perform research and development in the areas of computer vision, photogrammetry, and robotics. I am particularly interested in multiple view geometry and its applications to video.

Brief Bio
Curriculum Vitae

Teaching

CSE 252A: Computer Vision I, Fall 2014
CSE 252B: Computer Vision II, Winter 2014

Research and Development Projects

Video Camera Geopositioning

[Figure: coordinate frames]

This work addresses the problem of accurately determining the geodetic position (geoposition) and orientation of multiple ground-level video cameras from measurements obtained from a combination of different sensors. The system utilizes global positioning system (GPS) receivers, relative orientation (compass and tilt/roll) sensors, and the video sensors themselves to sequentially estimate the pose of every camera over time. Estimates of the geoposition and orientation of the cameras are continuously provided, even though measurements from the sensors are sampled asynchronously (e.g., GPS, 1 Hz; video, 30 Hz). Existing approaches to sequential pose estimation, regardless of the types of sensors used, only deal with a single camera or independently estimate the pose of each camera. In this work, however, if multiple cameras image the same region of a scene, these independent observations of the features contained in the scene are used to further refine the position and orientation of the cameras. The system determines the set of feature correspondences between the images acquired from multiple cameras and uses these correspondences to simultaneously estimate the pose of all of the cameras, correcting for relative errors between the cameras. This additional correction reduces the uncertainty of the pose estimates, providing more precise and more accurate estimates of both the geoposition and orientation of the cameras.
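
The asynchronous, sequential nature of the estimation can be illustrated with a toy example. The sketch below is my own simplification, not the actual system: a constant-velocity Kalman filter on a one-dimensional position fuses a slow, noisy 1 Hz measurement (GPS-like) with a faster, more precise 30 Hz measurement (vision-like) by propagating the state to each measurement time before updating.

    import numpy as np

    def predict(x, P, dt, q=1e-2):
        # Constant-velocity model: state is [position, velocity].
        F = np.array([[1.0, dt], [0.0, 1.0]])
        Q = q * np.array([[dt**3 / 3, dt**2 / 2], [dt**2 / 2, dt]])
        return F @ x, F @ P @ F.T + Q

    def update(x, P, z, r):
        # Scalar position measurement z with variance r.
        H = np.array([[1.0, 0.0]])
        S = H @ P @ H.T + r
        K = P @ H.T / S
        x = x + (K * (z - H @ x)).ravel()
        P = (np.eye(2) - K @ H) @ P
        return x, P

    # Asynchronous measurements: (time, value, variance), e.g. 1 Hz "GPS" and 30 Hz "vision".
    rng = np.random.default_rng(0)
    truth = lambda t: 2.0 * t
    meas = sorted(
        [(t, truth(t) + rng.normal(0, 3.0), 9.0) for t in np.arange(0, 5, 1.0)] +
        [(t, truth(t) + rng.normal(0, 0.5), 0.25) for t in np.arange(0, 5, 1 / 30)]
    )

    x, P, t_prev = np.zeros(2), np.eye(2) * 100.0, 0.0
    for t, z, r in meas:
        x, P = predict(x, P, t - t_prev)   # propagate the state to the measurement time
        x, P = update(x, P, z, r)          # fuse the measurement
        t_prev = t
    print("final position estimate:", x[0], "truth:", truth(t_prev))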

Video Georegistration

Video georegistration is the process of registering video to collateral data and adjusting the camera parameters (geodetic position, orientation, focal length, etc.) associated with the video such that some cost function is minimized. Of particular interest is data that consists of video and metadata, including measurements of the camera parameters over time. In the case of video acquired from airborne platforms, the collateral data typically includes one or more reference images with known camera parameters and possibly a model of the 3D structure of the scene (e.g., a digital elevation model (DEM)). Central to aerial video georegistration is the establishment of correspondences between the video frames and the 3D scene. Current approaches to aerial video georegistration employ view synthesis (from DEM + reference image) as an initial step in establishing these correspondences. However, these approaches fail when the provided camera parameter measurements are highly uncertain. Work has been done towards an approach to video georegistration that mitigates these issues and does not involve view synthesis. Further, this approach does not require a model of the 3D scene structure, but uses one if it is available, and continues to estimate the camera parameters during metadata dropouts and when correspondences cannot be established between the video and reference image.
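
The adjustment step itself can be sketched with a toy example. The code below is not the approach described above; it only illustrates minimizing a reprojection cost over a few camera parameters (a made-up pinhole model with focal length and translation, rotation omitted) given 2D-3D correspondences, using a Gauss-Newton loop with a numerical Jacobian and synthetic data.

    import numpy as np

    def project(params, X):
        # Toy pinhole camera: params = [f, tx, ty, tz], identity rotation.
        f, t = params[0], params[1:]
        Xc = X + t                           # world -> camera (rotation omitted for brevity)
        return f * Xc[:, :2] / Xc[:, 2:3]

    def residuals(params, X, x_obs):
        return (project(params, X) - x_obs).ravel()

    def gauss_newton(params, X, x_obs, iters=10, eps=1e-6):
        for _ in range(iters):
            r = residuals(params, X, x_obs)
            # Numerical Jacobian of the residuals with respect to the camera parameters.
            J = np.empty((r.size, params.size))
            for j in range(params.size):
                d = np.zeros_like(params); d[j] = eps
                J[:, j] = (residuals(params + d, X, x_obs) - r) / eps
            params = params - np.linalg.lstsq(J, r, rcond=None)[0]
        return params

    # Synthetic scene points (e.g., from a DEM in a local frame) and noisy image observations.
    rng = np.random.default_rng(1)
    X = rng.uniform([-50, -50, 80], [50, 50, 120], size=(40, 3))
    true_params = np.array([1000.0, 2.0, -3.0, 10.0])
    x_obs = project(true_params, X) + rng.normal(0, 0.5, size=(40, 2))

    init = np.array([900.0, 0.0, 0.0, 0.0])   # metadata-like initial guess
    print("refined parameters:", gauss_newton(init, X, x_obs))
    print("true parameters:   ", true_params)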

Guided Matching

[Figure: search regions]

This is a fast, general, analytical method for determining a search region for use in guided matching under projective mappings. The approach propagates covariance through a first-order approximation of the error model to define the boundary of the search region for a specified probability. Central to this is an analytical expression for the Jacobian matrix used in the covariance propagation calculation. The resulting closed-form expression is easy to implement and generalizes to n dimensions.

B. Ochoa and S. Belongie, Covariance Propagation for Guided Matching, SMVP 2006, Graz, Austria. [pdf]
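
As a simplified illustration of the idea (not the closed-form expression from the paper), the sketch below propagates a point's covariance through a 2D homography using the first-order approximation J Σ Jᵀ, where J is the Jacobian of the mapped point with respect to the input point, and bounds the search region with the chi-square quantile for the specified probability. The homography and covariance values are made up.

    import numpy as np
    from scipy.stats import chi2

    def map_point(H, x):
        # Homography applied to an inhomogeneous 2D point.
        u = H @ np.array([x[0], x[1], 1.0])
        return u[:2] / u[2]

    def homography_jacobian(H, x):
        # Jacobian of the mapped point with respect to the input point.
        u = H @ np.array([x[0], x[1], 1.0])
        w = u[2]
        return np.array([
            [H[0, 0] / w - u[0] * H[2, 0] / w**2, H[0, 1] / w - u[0] * H[2, 1] / w**2],
            [H[1, 0] / w - u[1] * H[2, 0] / w**2, H[1, 1] / w - u[1] * H[2, 1] / w**2],
        ])

    def search_ellipse(H, x, cov_x, prob=0.99):
        # First-order covariance propagation: cov' ~= J cov J^T, then a chi-square
        # bound gives the ellipse containing `prob` of the probability mass.
        J = homography_jacobian(H, x)
        cov_mapped = J @ cov_x @ J.T
        k2 = chi2.ppf(prob, df=2)
        vals, vecs = np.linalg.eigh(cov_mapped)
        axes = np.sqrt(k2 * vals)            # semi-axis lengths of the search ellipse
        return map_point(H, x), axes, vecs

    H = np.array([[1.1, 0.02, 5.0],
                  [-0.01, 0.95, -3.0],
                  [1e-4, 2e-4, 1.0]])
    center, axes, axes_dirs = search_ellipse(H, x=np.array([100.0, 80.0]),
                                             cov_x=np.diag([2.0, 2.0]), prob=0.99)
    print("search region center:", center)
    print("semi-axis lengths:", axes)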

Mosaic Construction from Video

This is a well-studied problem. There are two general approaches to mosaic construction from video: direct methods and feature-based methods. Direct methods estimate the motion between successive video frames from the spatial and temporal derivatives of the frames. For linear mappings (e.g., translation, affine), direct methods are straightforward to implement and the motion is usually estimated in a hierarchical, coarse-to-fine manner. Motion estimation using feature-based methods is more involved, as it requires the detection and tracking of features between successive frames. The feature tracks serve as correspondences between the two frames, and a planar transformation is then estimated from the set of correspondences. A pyramidal implementation of Lucas-Kanade is used for feature tracking and RANSAC is applied to reject outliers.
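
A minimal sketch of the feature-based path using OpenCV (my own example, not the project's implementation): features are detected in one frame, tracked into the next with pyramidal Lucas-Kanade, and a homography is estimated with RANSAC to reject outlier tracks. The two synthetic frames stand in for consecutive video frames.

    import cv2
    import numpy as np

    # Synthetic "video": a textured frame and the same frame under a known homography.
    rng = np.random.default_rng(2)
    frame0 = rng.uniform(0, 255, (480, 640)).astype(np.uint8)
    frame0 = cv2.GaussianBlur(frame0, (0, 0), 3)
    H_true = np.array([[1.0, 0.01, 4.0],
                       [-0.01, 1.0, 2.0],
                       [0.0, 0.0, 1.0]])
    frame1 = cv2.warpPerspective(frame0, H_true, (640, 480))

    # Detect features in frame 0 and track them into frame 1 with pyramidal Lucas-Kanade.
    pts0 = cv2.goodFeaturesToTrack(frame0, maxCorners=500, qualityLevel=0.01, minDistance=8)
    pts1, status, _ = cv2.calcOpticalFlowPyrLK(frame0, frame1, pts0, None,
                                               winSize=(21, 21), maxLevel=3)
    good0 = pts0[status.ravel() == 1]
    good1 = pts1[status.ravel() == 1]

    # Estimate the planar transformation from the tracks, rejecting outliers with RANSAC.
    H_est, inliers = cv2.findHomography(good0, good1, cv2.RANSAC, 3.0)
    print("estimated homography:\n", H_est / H_est[2, 2])
    print("inlier tracks:", int(inliers.sum()), "of", len(good0))

    # The frame can then be warped into the mosaic coordinate frame, e.g.:
    # mosaic = cv2.warpPerspective(frame1, np.linalg.inv(H_est), mosaic_size)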

Sample videos illustrating sequential mosaic construction:

UAV
Helicopter
Ground-level (before correction)

Video Stabilization

Not all video cameras contain an optical or mechanical image stabilization mechanism. However, video acquired from such a camera can still be stabilized using digital image stabilization methods. Similar to mosaic construction from video, inter-frame motion is estimated between successive video frames. Next, the motion is decomposed into rotation, translation, scale, and skew parameters over a time window about the current frame. A low-pass filter is applied to each motion parameter relative to the current frame and the frame is stabilized using the filtered parameters.
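
A rough sketch of the smoothing step, with made-up parameter trajectories standing in for the estimated inter-frame motion: each accumulated motion is decomposed into rotation, translation, scale, and skew, each parameter is low-pass filtered with a moving average, and the per-frame correction maps the observed pose to the smoothed pose.

    import numpy as np

    def decompose(A):
        # A is a 2x3 affine: split into rotation, translation, scales, and skew.
        M, t = A[:, :2], A[:, 2]
        theta = np.arctan2(M[1, 0], M[0, 0])
        c, s = np.cos(theta), np.sin(theta)
        K = np.array([[c, s], [-s, c]]) @ M        # upper triangular: scales and skew
        return np.array([theta, t[0], t[1], K[0, 0], K[1, 1], K[0, 1]])

    def compose(p):
        theta, tx, ty, sx, sy, sh = p
        c, s = np.cos(theta), np.sin(theta)
        M = np.array([[c, -s], [s, c]]) @ np.array([[sx, sh], [0.0, sy]])
        return np.hstack([M, [[tx], [ty]]])

    def smooth(params, window=15):
        # Low-pass filter (moving average) of each motion parameter over time.
        kernel = np.ones(window) / window
        pad = window // 2
        padded = np.pad(params, ((pad, pad), (0, 0)), mode="edge")
        return np.stack([np.convolve(padded[:, j], kernel, mode="valid")
                         for j in range(params.shape[1])], axis=1)

    # Synthetic jittery camera path: accumulated frame-to-frame motion parameters.
    rng = np.random.default_rng(3)
    n = 100
    raw = np.zeros((n, 6))
    raw[:, 1:3] = np.cumsum(rng.normal(0, 2.0, (n, 2)), axis=0)   # jittery translation
    raw[:, 0] = np.cumsum(rng.normal(0, 0.01, n))                  # jittery rotation
    raw[:, 3:5] = 1.0                                              # unit scale

    smoothed = smooth(raw)
    # Per-frame stabilizing warp: map the observed pose to the smoothed pose.
    i = 50
    A_raw = np.vstack([compose(raw[i]), [0, 0, 1]])
    A_smooth = np.vstack([compose(smoothed[i]), [0, 0, 1]])
    correction = A_smooth @ np.linalg.inv(A_raw)   # apply to frame i (e.g., cv2.warpAffine)
    print("correction for frame", i, ":\n", correction[:2])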

Videos of sample results:

Hand-held camera, running

MPEG-2 TS KLV Decoder

NGA Motion Imagery Standards Board (MISB) compliant metadata associated with unmanned aerial vehicle (UAV) video is contained in an MPEG-2 transport stream (TS) with Key-Length-Value (KLV) encoded metadata. In order to use this metadata, the Program Specific Information (PSI) tables are parsed to determine the elementary PID of the packets that carry the KLV-encoded metadata. The developed software also parses the Packetized Elementary Stream (PES) packets and decodes the KLV-encoded metadata.
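
The KLV layer itself is simple to decode once the PES payload has been extracted. The sketch below is a generic KLV triplet parser (16-byte key, BER-encoded length, value); it does not handle the TS/PSI/PES parsing or the nested local-set tags used by the MISB metadata, and the demo buffer is made up.

    def parse_ber_length(buf, i):
        # BER length: short form (single byte < 0x80) or long form (0x80 | n, then n bytes).
        b = buf[i]
        if b < 0x80:
            return b, i + 1
        n = b & 0x7F
        return int.from_bytes(buf[i + 1:i + 1 + n], "big"), i + 1 + n

    def parse_klv(buf, key_len=16):
        # Yield (key, value) pairs from a buffer of concatenated KLV triplets.
        i = 0
        while i < len(buf):
            key = buf[i:i + key_len]
            length, i = parse_ber_length(buf, i + key_len)
            yield key, buf[i:i + length]
            i += length

    # Toy example: one 16-byte key followed by a BER short-form length and a value.
    demo = bytes(range(16)) + bytes([5]) + b"hello"
    for key, value in parse_klv(demo):
        print(key.hex(), value)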

Color

Most video standards (e.g., Recommendation ITU-R BT.709, Recommendation ITU-R BT.601, SMPTE standard 240M) specify the set of chromaticities (red, green, blue, and white) of the color space. Transcoding video between standards with different color space chromaticities requires calculation and application of a transformation matrix that maps linear RGB values in the source video to linear RGB values in the destination video. The XYZ color space is used as an intermediate color space during this process. The process is similar for image standards (e.g., sRGB, Adobe RGB, wide gamut RGB, ProPhoto RGB). Further, for historical reasons related to displaying pixel data, most video and image data are encoded as, or easily converted to, nonlinear RGB. Nonlinear RGB values must be converted to linear RGB prior to color space conversion, and the resulting mapped linear RGB values must be converted back to nonlinear RGB prior to encoding.
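
A sketch of this conversion, assuming a simple power-law transfer function in place of the piecewise functions defined by the standards; the chromaticity values are quoted from memory for illustration only and should be verified against the standards.

    import numpy as np

    def rgb_to_xyz_matrix(prims, white):
        # Build the linear RGB -> XYZ matrix from the xy chromaticities of the
        # primaries and the white point (white is normalized so that Y = 1).
        M = np.array([[x / y, 1.0, (1 - x - y) / y] for x, y in prims]).T
        xw, yw = white
        white_xyz = np.array([xw / yw, 1.0, (1 - xw - yw) / yw])
        scale = np.linalg.solve(M, white_xyz)
        return M * scale

    # Chromaticities quoted from memory for illustration; verify against the standards.
    BT709 = dict(prims=[(0.640, 0.330), (0.300, 0.600), (0.150, 0.060)], white=(0.3127, 0.3290))
    SMPTE_C = dict(prims=[(0.630, 0.340), (0.310, 0.595), (0.155, 0.070)], white=(0.3127, 0.3290))

    # Source linear RGB -> XYZ -> destination linear RGB.
    M_src = rgb_to_xyz_matrix(**SMPTE_C)
    M_dst = rgb_to_xyz_matrix(**BT709)
    M_src_to_dst = np.linalg.inv(M_dst) @ M_src

    gamma = 2.4   # placeholder; the standards define piecewise transfer functions
    rgb_nonlinear = np.array([0.8, 0.4, 0.2])
    rgb_linear = rgb_nonlinear ** gamma                 # decode to linear RGB
    rgb_mapped = np.clip(M_src_to_dst @ rgb_linear, 0, 1)
    rgb_out = rgb_mapped ** (1 / gamma)                 # re-encode to nonlinear RGB
    print("mapped nonlinear RGB:", rgb_out)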

Other color applications include mapping the color of a source image to that of a target image (sometimes called color transfer). This is used in visual effects as well as object recognition and image registration.
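
One simple statistics-matching recipe, sketched below with synthetic stand-in images: convert both images to CIELAB, shift and scale each source channel so its mean and standard deviation match the target's, and convert back to RGB.

    import cv2
    import numpy as np

    def transfer_color(source, target):
        # Statistics-matching transfer: match per-channel mean and standard
        # deviation of the source to the target in CIELAB.
        src = cv2.cvtColor(source.astype(np.float32) / 255.0, cv2.COLOR_RGB2LAB)
        tgt = cv2.cvtColor(target.astype(np.float32) / 255.0, cv2.COLOR_RGB2LAB)
        mu_s, sd_s = src.mean(axis=(0, 1)), src.std(axis=(0, 1))
        mu_t, sd_t = tgt.mean(axis=(0, 1)), tgt.std(axis=(0, 1))
        mapped = (src - mu_s) * (sd_t / (sd_s + 1e-6)) + mu_t
        out = cv2.cvtColor(mapped.astype(np.float32), cv2.COLOR_LAB2RGB)
        return np.clip(out * 255.0, 0, 255).astype(np.uint8)

    # Synthetic stand-ins for the source and target images.
    rng = np.random.default_rng(4)
    source = rng.integers(0, 256, (120, 160, 3), dtype=np.uint8)
    target = rng.integers(100, 200, (120, 160, 3), dtype=np.uint8)
    print(transfer_color(source, target).shape)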

Sample color mapping (mean and standard deviation) in CIELAB color space:

Source image
Target image
Source image mapped to target image in CIELAB

TIN Generation

An interesting approach to triangulated irregular network (TIN) generation from radiometrically corrected images was used in this project. First, point correspondences were established over multiple images and the parameters for all of the cameras and the reconstructed 3D points were simultaneously adjusted such that some cost function was minimized (i.e., bundle adjustment). The TIN was initialized from the adjusted 3D points. Each triangle in the TIN can be thought of as a planar facet of the scene. Further, the projection of each planar facet onto a given image plane can be locally modeled as an affine camera, so the mapping between any two images of a planar facet is modeled by a 2D affine transform. Next, an iterative process adjusted the triangle vertices such that the error between the mapped pixel values over pairs of images was minimized for a given planar facet. If the error did not fall below a specified threshold, the triangle was broken into smaller triangles and the vertices of each of these triangles were adjusted in an attempt to reduce the error. This subdivision was performed recursively until the error for every triangle fell below the threshold.
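
The per-facet affine mapping is easy to make concrete. The sketch below (an illustration, not the project's code) recovers the six affine parameters from the three vertex correspondences of a facet and maps an interior point from one image to the other; the photometric error measured over such mappings is what drives the vertex adjustment and subdivision.

    import numpy as np

    def affine_from_triangle(src_tri, dst_tri):
        # The mapping between two images of a planar facet is affine; its six
        # parameters follow from the three vertex correspondences.
        A = np.zeros((6, 6))
        b = np.zeros(6)
        for k, ((x, y), (u, v)) in enumerate(zip(src_tri, dst_tri)):
            A[2 * k]     = [x, y, 1, 0, 0, 0]
            A[2 * k + 1] = [0, 0, 0, x, y, 1]
            b[2 * k], b[2 * k + 1] = u, v
        p = np.linalg.solve(A, b)
        return np.array([[p[0], p[1], p[2]],
                         [p[3], p[4], p[5]]])

    # Projections of one facet's vertices into two images (made-up values).
    tri_img1 = np.array([[10.0, 12.0], [40.0, 15.0], [22.0, 45.0]])
    tri_img2 = np.array([[13.0, 10.0], [44.0, 16.0], [26.0, 42.0]])
    M = affine_from_triangle(tri_img1, tri_img2)

    # Any point inside the facet in image 1 maps to image 2 with the same affine.
    pt = np.array([25.0, 20.0, 1.0])
    print("mapped point:", M @ pt)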

Image Registration

This work includes numerous image registration and georegistration problems, ranging from registration for model-based fusion to near real-time registration for target mensuration. Most of the developed approaches borrow from both computer vision and photogrammetry, leveraging the strengths of each field.

Active Triangulation

The objective of this project was to reconstruct the 3D scene geometry and reflectance of the ocean floor using active triangulation methods. This was accomplished using a high-speed line camera to image 3D points in the scene that were illuminated by a laser. A rotating mirror was used to angularly sweep the laser beam across the plane defined by the camera center and the linear sensor of the camera (i.e., the instantaneous view plane). At each image acquisition, the mirror angle was measured and the image point corresponding to the illuminated scene point was determined. The 3D position of the scene point in the instantaneous view plane was then determined by triangulation. Images were acquired at a maximum rate of 9.1 kHz as the laser was swept across the field of view of the camera. The entire system translated in a direction orthogonal to the instantaneous view plane (a linear pushbroom model) to image different regions of the scene.
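
The triangulation within the instantaneous view plane reduces to intersecting two rays in 2D. The sketch below uses a made-up geometry (baseline, focal length, principal point, and mirror angle) purely to illustrate the computation.

    import numpy as np

    def triangulate_in_view_plane(u, f, u0, laser_origin, mirror_angle):
        # Intersect the camera ray through image coordinate u with the laser ray
        # for the measured mirror angle, all within the instantaneous view plane.
        d_cam = np.array([(u - u0) / f, 1.0])                           # camera ray from the origin
        d_las = np.array([np.sin(mirror_angle), np.cos(mirror_angle)])  # laser ray direction
        # Solve  t * d_cam - s * d_las = laser_origin  for the ray parameters t, s.
        A = np.column_stack([d_cam, -d_las])
        t, s = np.linalg.solve(A, laser_origin)
        return t * d_cam                                                # (x, z) of the scene point

    # Made-up geometry: 1 m baseline between the camera and the laser/mirror.
    laser_origin = np.array([1.0, 0.0])
    point = triangulate_in_view_plane(u=812.0, f=2000.0, u0=1024.0,
                                      laser_origin=laser_origin,
                                      mirror_angle=np.deg2rad(-20.0))
    print("scene point (x, z) in the view plane:", point)
    # Sweeping the mirror recovers a profile; translating the whole system
    # orthogonally to the view plane (pushbroom) builds up the full 3D surface.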

Videos of sample results:

Sandwave (geometry only)
Turtle grass
Inert mine

K.D. Moore, J.S. Jaffe, and B.L. Ochoa, Development of a New Underwater Bathymetric Laser Imaging System: L-Bath, Journal of Atmospheric and Oceanic Technology, 17(8):1106-1117, August 2000.

Flow Imaging and Classification

[Figure: sample detected object]

This project involved developing a real-time imaging and classification system for use with the Continuous Underway Fish Egg Sampler (CUFES). Rather than have a marine biologist count the fish eggs, the developed system performed this task. A high-speed line camera imaged a flow tube with collimated backlighting as sea water containing fish eggs and other similarly sized objects flowed through the tube. The system sequentially estimated the background image and the variance of each pixel in the background image. Each image was segmented using the background model, and features were extracted from the segmented objects. Prior to deployment, a classifier was trained using samples labeled by a marine biologist. The resulting model was used during deployment to classify the objects.

J.R. Powell, S. Krotosky, B. Ochoa, D. Checkley, and P. Cosman, Detection and Identification of Sardine Eggs at Sea Using a Machine Vision System, MTS/IEEE OCEANS 2003.
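
A sketch of the background modeling and segmentation steps (an illustration with synthetic frames, not the deployed system): the background mean and variance are estimated sequentially per pixel with an exponentially weighted update, and pixels that deviate by more than a few standard deviations are segmented as objects from which simple features are extracted.

    import numpy as np

    class BackgroundModel:
        # Sequential per-pixel estimate of the background mean and variance
        # (an exponentially weighted running estimate, used here for illustration).
        def __init__(self, first_frame, alpha=0.05):
            self.mean = first_frame.astype(float)
            self.var = np.full_like(self.mean, 25.0)
            self.alpha = alpha

        def update(self, frame):
            diff = frame - self.mean
            self.mean += self.alpha * diff
            self.var = (1 - self.alpha) * (self.var + self.alpha * diff**2)

        def segment(self, frame, k=4.0):
            # Pixels more than k standard deviations from the background are foreground.
            return np.abs(frame - self.mean) > k * np.sqrt(self.var)

    # Synthetic backlit line-scan frames: bright background, a darker "egg" in one frame.
    rng = np.random.default_rng(5)
    background = 200.0 + rng.normal(0, 3.0, (64, 256))
    model = BackgroundModel(background)
    for _ in range(20):                                  # warm up on background-only frames
        model.update(background + rng.normal(0, 3.0, background.shape))

    frame = background + rng.normal(0, 3.0, background.shape)
    frame[30:38, 100:110] -= 80.0                        # object blocking the backlight
    mask = model.segment(frame)
    area = int(mask.sum())
    centroid = np.argwhere(mask).mean(axis=0)
    print("segmented area:", area, "centroid:", centroid)   # simple features for the classifier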

Software

Over the years I have developed a C/C++ library (documentation) for computer vision and photogrammetry. Contact me directly if you are interested in licensing this library.

Software libraries that I use in my work:

Software libraries that I do not use, but occasionally monitor: