Introduction to Scale-Invariant Feature Transform (SIFT)

Scale-Invariant Feature Transform Summary

SIFT Detection Paper: http://www.cs.ubc.ca/~lowe/papers/iccv99.pdf

Terms used in SIFT paper

LoG = Laplacian of Gaussian

DoG = Difference of Gaussian

BBF = Best-Bin-First

RANSAC = RANdom SAmple Consensus

NN = Nearest Neighbor

IT = Inferior Temporal

What Can Developers Use the SIFT Algorithm For?

• Locate a certain object in an image of many other objects
• Locate an object between frames in a sequence of images (video)
• Stitching together images to create a panoramic image
• Robot localization and mapping
• 3D scene modeling, recognition and tracking
• 3D SIFT-like descriptors for human action recognition
• Analyzing the Human Brain in 3D Magnetic Resonance Images

Resources

http://docs.opencv.org/3.1.0/da/df5/tutorial_py_sift_intro.html

http://www.vlfeat.org/api/sift.html

https://en.wikipedia.org/wiki/Scale-invariant_feature_transform

Types of Problems Machine Learning Can Solve

What types of Problems can Machine Learning solve?

• Problems the human brain does easily, but we aren’t sure how they are accomplished, e.g. 3D object recognition
• Problems without simple and reliable rules, where the answer might be a combination of a large number of weak rules, e.g. detecting credit card fraud
• Moving targets, where programs need to change because methods change in the real world, e.g. credit card fraud again

Since we don’t know the exact methods by which to code these types of examples, the machine learning approach is to give a generic algorithm a large number of correct examples and let the machine figure out how to get there.

Done well, this approach should work for new data fed into the program, as well as for the examples given to it initially while training.

Also, if the data changes over time (as in the credit card fraud detection problem) the program can change by being trained on new data.

This boils down to:

• Recognizing patterns
• Recognizing anomalies
• Making predictions


Review of Tesla’s Short Self-driving Proof of Concept

The views they provide are (left to right, top to bottom):

1. Interior cabin & through windshield
2. The vehicle’s left rearward vehicle camera
3. The vehicle’s medium range (forward) camera
4. The vehicle’s right rearward vehicle camera

Related: Tesla’s HW2 (Hardware 2) sensor suite

Object detection:

• Motion flow
• Lane lines (left of vehicle)
• Lane lines (right of vehicle)
• In-path objects
• Objects

The following are my observations.  These are not necessarily errors, but they are worth mentioning.

My general observations:

• On two lane roads, far left line typically not detected on Medium range camera
• Multiple bounding boxes around the same object
• Rearward objects labeled as “in-path”
• Brake pedal moves, but accelerator pedal does not appear to move
• Slowed down for a crosswalk (0:12)
• Detected pedestrian near road, but did not consider to be in-path (0:41)
• Stopped/slowed down for walkers/joggers near road (0:55)
• Stopped during right turn (1:02)
• Detects the back of road signs as signs (1:23)
• Stopped after right turn (1:33)
• The camera angles not included are: forward narrow, forward wide, left and right side forward facing, and rear facing
• From the information provided here, we cannot determine whether pedestrians are treated as any other object or separately as a “pedestrian type” object


Tesla’s Hardware 2 Sensor Suite

Tesla revealed that their current production vehicles are equipped with the hardware necessary for autonomous driving capabilities.

They mention:

• 8 surround cameras providing 360 degree visibility around the car
• 12 ultrasonic sensors
• 1 on-board computer

Cameras

1. Wide forward camera
2. Main forward camera
3. Narrow forward camera
4. Driver’s side rearward looking side camera
5. Passenger’s side rearward looking side camera
6. Driver’s side forward looking side camera
7. Passenger’s side forward looking side camera
8. Rear View camera

Camera type, location, FOV, distance and use:

1. Wide forward camera
• Mounting location: Interior: triple camera, above rear-view mirror
• Field of view: 120 degree fish-eye
• Max distance: 60m (~197 ft.)
• Function: Captures traffic lights, obstacles cutting into the path of travel and objects at close range. Particularly useful in urban, low speed maneuvering

2. Main forward camera
• Mounting location: Interior: triple camera, above rear-view mirror
• Field of view: Unknown, assumed ~50 degree
• Max distance: 150m (~492 ft.)
• Function: Covers a broad spectrum of use cases

3. Narrow forward camera
• Mounting location: Interior: triple camera, above rear-view mirror
• Field of view: Unknown, assumed ~25 degree
• Max distance: 250m (~820 ft.)
• Function: Provides a focused, long-range view of distant features. Useful in high-speed operation

4. Driver’s side rearward looking side camera
• Mounting location: Exterior: fender badge mounted
• Field of view: Unknown
• Max distance: 100m (~328 ft.)
• Function: Monitors rear blind spots, important for safely changing lanes and merging into traffic

5. Passenger’s side rearward looking side camera
• Mounting location: Exterior: fender badge mounted
• Field of view: Unknown
• Max distance: 100m (~328 ft.)
• Function: Monitors rear blind spots, important for safely changing lanes and merging into traffic

6. Driver’s side forward looking side camera
• Mounting location: Exterior: B pillar mounted
• Field of view: 90 degree
• Max distance: 80m (~262 ft.)
• Function: Detects cars unexpectedly entering your lane and provides additional safety when entering intersections with limited visibility

7. Passenger’s side forward looking side camera
• Mounting location: Exterior: B pillar mounted
• Field of view: 90 degree
• Max distance: 80m (~262 ft.)
• Function: Detects cars unexpectedly entering your lane and provides additional safety when entering intersections with limited visibility

8. Rear View camera
• Mounting location: Exterior: hatch latch area mounted
• Field of view: Unknown
• Max distance: 50m (~164 ft.)
• Function: Backing up safely; a contributing member of the Autopilot hardware suite; useful when performing complex parking maneuvers

Note: locations correspond to Model X locations

The forward looking radar’s range falls between the Narrow and Main cameras’ distances, with a max range of 160m (~525 ft.).

Ultrasound

These are the shortest range sensors, with a max distance of 8m (~26 ft.).

Computer

NVIDIA Drive PX 2, with more than 40 times the computing power of the previous generation unit in Tesla vehicles.

Sources:

https://www.tesla.com/autopilot

Tesla Motors’ Self-Driving Car “Supercomputer” Powered by NVIDIA DRIVE PX 2 Technology

Opening a Google Group to Public Posting

By default, Google Groups are closed and require membership to email the group.  There are cases where this isn’t ideal, such as:

• using the email for social media accounts
• creating a department email that people from other organizations can email
• etc.

If you are using the group to receive public emails, change these settings:

Permissions > Basic permissions

• Post: Public

Settings > Moderation

• Spam messages: Skip the moderation queue and post to the group

The first setting is required and the second is optional, but random legitimate messages tend to get caught in the spam filters.

TensorFlow and deep learning, without a PhD

An excellent introductory video from Google’s Martin Gorner covering the handwritten 0-9 digit recognition problem using the MNIST data set.  The code is Python using the TensorFlow library.

He begins with a single layer network, progresses into a multi-layer network and ends with a convolutional neural network, showing how improvements in techniques correspond to better test accuracy.

Covers softmax, sigmoid, ReLU, matrix transformations, convolutional networks, over-fitting, and test vs. training accuracy.
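As a minimal sketch of one of those pieces (a NumPy illustration, not code from the video): softmax turns a layer’s raw outputs, the logits, into probabilities that sum to 1.

```python
import numpy as np

def softmax(logits):
    # Subtract the max logit before exponentiating for numerical stability;
    # this does not change the result because it cancels in the ratio.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

# The largest logit gets the highest probability.
probs = softmax(np.array([2.0, 1.0, 0.1]))
```

In the video’s single-layer network, softmax is applied to the 10 output neurons so the result can be read as "probability this image is the digit 0, 1, … 9".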

Well worth a watch for beginners and has better explanations than some full courses online.

Note: they figure out the microphone feedback situation around minute 15 or 16.

Flip Image OpenCV Python

OpenCV provides the flip() function which allows for flipping an image or video frame horizontally, vertically, or both.

Reference:

OpenCV documentation:  http://docs.opencv.org/2.4/modules/core/doc/operations_on_arrays.html#flip

Exploring Udacity’s 1st 40GB driving data set

I read about the release of their second data set yesterday and wanted to check it out.  For convenience, I downloaded the original, smaller, data set.

Preface: ROS is only officially supported on Ubuntu & Debian and is experimental on OS X (Homebrew), Gentoo, and OpenEmbedded/Yocto.

Getting the data

The data set, which is linked to from the page above, was served up from Amazon S3 and actually seemed quite slow to download, so I let it run late last night and started exploring today.

After extracting, the result is a 42.3 GB file, dataset.bag.

.bag is a file type associated with the Robot Operating System (ROS)

Data overview

To get an overview of the file, use the rosbag info <filename> command.


There are 28 data topics from on-board sensors, including 3 color cameras.  The camera topics are:

• /center_camera/image_color
• /left_camera/image_color
• /right_camera/image_color

Each camera topic has 15212 messages.   Doing the math on 15212 messages / 760 seconds works out to roughly 20 frames per second.
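That arithmetic, spelled out:

```python
messages = 15212       # messages per camera topic, from rosbag info
duration_s = 760       # bag duration in seconds

fps = messages / duration_s  # just over 20 frames per second
```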

Viewing the video streams

Converting a camera topic to a standalone video is a two-step process:

1. export jpegs from the bag file
2. convert the jpegs to video

Exporting the jpegs

To export the image topic to jpegs, the bag needs to be played back and the frames extracted.  This can be done with a launch script.  The default filename pattern (frame%04d.jpg) allows for 4 numerical digits, so we need to add a line that modifies the default file name pattern into one that allows for 5 digits:
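A sketch of that line, assuming the standard image_view extract_images node and its filename_format parameter:

```xml
<param name="filename_format" value="frame%05d.jpg"/>
```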

The entire script below that launches the player and extractor:

The number of resulting frames should match the number of topic messages seen from info.

If not, as was our case, the default sec-per-frame time should be changed.  It seems counter-intuitive, but slowing down the rate (trying “0.11” and “0.2”) reduced the number of frames extracted.  I settled on “0.02” seconds per frame, which produced the correct number of frames.  Add the line to the launch script.

The working launch script now looks like this:
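A sketch of such a launch script, assuming the rosbag play and image_view extract_images nodes and the center camera topic (the bag path is illustrative):

```xml
<launch>
  <!-- Play the bag back; -d delays playback so the extractor can start first. -->
  <node pkg="rosbag" type="play" name="rosbag" args="-d 2 /path/to/dataset.bag"/>

  <!-- Extract frames from the center camera topic as jpegs. -->
  <node name="extract" pkg="image_view" type="extract_images"
        respawn="false" output="screen" cwd="ROS_HOME">
    <remap from="image" to="/center_camera/image_color"/>
    <param name="filename_format" value="frame%05d.jpg"/>
    <param name="sec_per_frame" value="0.02"/>
  </node>
</launch>
```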

Download working Left, Center, and Right jpeg export launch scripts on GitHub

The result should be the correct number of frames saved (numbering starts at frame00000) and the message “[rosbag-1] process has finished cleanly”

Hit Ctrl + C to exit

First exported frame: frame00000.jpg (640×480)

Convert the jpegs to video


License: The data referenced in this post is available under the MIT license.  This post is available under CC BY 3.0

Udacity open sources 223GB of driving data

Following on the heels of another self-driving car developer, comma.ai, releasing its driving data, Udacity open sourced two data sets from their self-driving Lincoln MKZ.

Udacity’s data is over 70 minutes of driving spread over two days in Mountain View, Calif.   You can read more in the TechCrunch article or their Medium post.

The data is available under the MIT License.

We downloaded the first, smaller data set and started exploring the data.

We also have a page tracking available data sets.

License: this post is available under CC BY 3.0