Self Driving Simulation (Computer Vision)

Introduction

This project is about training machine learning (computer vision) models as part of one of my postgraduate courses, “Intelligent Systems”. I worked on it with two other group members, William Stockley and Ricky Tsang.

This project relies on image data gathered from Udacity’s self-driving car simulator in training mode, where the steering angle and speed for each image are recorded in a CSV file. We also used this article written by Naoki for further reference. The aim of the project is to let the self-driving car learn to drive a lap around a track without crashing or leaving the road. Two main types of model are used: one uses classification and the other uses regression. As this was a group project, I mainly worked on the regression type, with a Simple Regression Network and a Convolutional Neural Network (CNN). I still have most of the code we used in my Google Colab here. It will no longer run, however, because it needs a mounted disk that contained the relevant data files.

Computer vision is an area of machine learning where performance on vision tasks such as image detection is constantly improving. It supports automation, which can arguably help lower the cost of energy and resources because the vehicle can find the best route and apply the right amount of pressure to the brakes. Furthermore, it can help eliminate risks caused by human error, which may come from fatigue, distraction, or absent-mindedness. The use of computer vision for self-driving has the potential to create safer road conditions for everyone.

As with any use of machine learning and artificial intelligence, there are ethical considerations to be made. These include:

whether it is appropriate to replace human judgement with machine vision technology.
the potential for fatal error if the model is not accurate and/or well trained, or if sensors malfunction.
the lack of universal consensus on what moral decision the machine should make on the road in certain contexts or situations.
the question of informed consent of humans whose likenesses are processed by the machine vision.
how sensitive user data should be handled and safeguarded.
the legal and moral responsibility in cases of machine vision failure.
the uncertain impact and unintended consequences of the technology.

Data Set

Classification Model: The data for this model are greyscale images provided by the course, plus additional images we generated with the driving simulator by using its screen recording feature and extracting frames from the recordings. The pictures were grouped into two classes: one shows what the scene looks like when the car should turn right, and the other when the car should turn left.

Regression Model: The data for this model are only the images generated by the driving simulator, taken during one lap in training mode. The images come from the car’s front-facing cameras, and a corresponding CSV file lists every image together with its steering angle and speed.
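To make the link between images and measurements concrete, here is a minimal sketch of reading such a log with the csv module. The column order (centre, left and right image paths, then steering, throttle, brake and speed) is an assumption about the simulator's log layout, and the two rows are made-up placeholders rather than data from the project.

```python
import csv
import io

# Hypothetical two-row excerpt of a driving log. The column order is assumed:
# center, left, right, steering, throttle, brake, speed.
SAMPLE_LOG = """\
IMG/center_001.jpg,IMG/left_001.jpg,IMG/right_001.jpg,0.0,0.9,0.0,12.5
IMG/center_002.jpg,IMG/left_002.jpg,IMG/right_002.jpg,-0.15,0.8,0.0,14.1
"""


def read_driving_log(handle):
    """Return a list of (center_image_path, steering_angle, speed) tuples."""
    samples = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        center_path = row[0].strip()
        steering = float(row[3])
        speed = float(row[6])
        samples.append((center_path, steering, speed))
    return samples


samples = read_driving_log(io.StringIO(SAMPLE_LOG))
print(samples[1])
```

With a real log, the same function could be given an open file handle instead of the in-memory sample.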

Libraries Used

csv – used for reading and writing of CSV files.
cv2 – used for loading, displaying and saving images.
glob – used for matching files with specific patterns.
keras – runs on top of TensorFlow for development of model architectures.
matplotlib – used for creating static, animated, and interactive visualizations.
numpy – used for working with arrays.
os – used for interaction with the operating system.
pandas – offers data structures and operations for manipulating numerical tables and time series.
PIL – used for opening, editing and saving images.
sklearn – for classification, regression, clustering and dimensionality reduction.
tensorflow – for machine learning and artificial intelligence.
re – used for regular expressions.

For Classification Model

For Regression Model

Classification Model: Data Cleaning & Pre-Processing

This part of the code cleans and pre-processes the images before they are fed to the classification model for training. In short, it:

Resizes the images to 224 x 224.
Converts the greyscale images (2D) to three channels (3D).
Converts BGR to RGB.
Normalises the image pixels.

Classification Model: Prepared Data

We prepared 101 “Turn Left” images and 93 “Turn Right” images.

Classification Model: Training

Parameters

Pre-trained Neural Network	VGG16
Additional Layers	Dense, Dropout, Dense, Dropout, Dense
Weights	ImageNet
Trainable	Last 10 layers
Batch Size	16
Epochs	25
Optimiser	RMSprop
Learning Rate	0.00001
Momentum	0.2
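A rough sketch of how the parameters in this table might translate into Keras code is given below. Several details are assumptions and not taken from the project: the Flatten layer between VGG16 and the first Dense layer, the Dense widths, the Dropout rates, and the choice of two sigmoid outputs for the “Left” and “Right” scores. The function name build_classifier is hypothetical.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_classifier(weights="imagenet"):
    """VGG16 base with a small dense head, following the parameter table."""
    base = keras.applications.VGG16(
        weights=weights, include_top=False, input_shape=(224, 224, 3)
    )
    # Freeze everything except the last 10 layers of the pre-trained base.
    for layer in base.layers[:-10]:
        layer.trainable = False

    model = keras.Sequential([
        keras.Input(shape=(224, 224, 3)),
        base,
        layers.Flatten(),                        # assumed bridge to the dense head
        layers.Dense(256, activation="relu"),    # width is an assumption
        layers.Dropout(0.5),                     # rate is an assumption
        layers.Dense(64, activation="relu"),     # width is an assumption
        layers.Dropout(0.5),                     # rate is an assumption
        layers.Dense(2, activation="sigmoid"),   # assumed "Left"/"Right" scores
    ])
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=0.00001, momentum=0.2),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then be a call such as model.fit(X, y, batch_size=16, epochs=25), where X and y are the prepared images and their labels.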

Once trained, the model outputs scores between 0 and 1 that indicate whether the car should turn left or right.

Classification Model: Accuracy & Loss

Loss (binary cross-entropy) = 0.36%
Accuracy = 100%

This part of the code is for plotting the accuracy and the loss of the model for visualisation purposes.

Classification Model: Driving Logic

• “Left” > 0.5: steering angle = -0.5 (-12.5 degrees)
• “Right” > 0.5: steering angle = 0.5 (12.5 degrees)
• “Left” <= 0.5 and “Right” <= 0.5: steering angle = difference between “Left” and “Right”
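The rules above can be written as a small function. This is a sketch under stated assumptions: the function name is hypothetical, “Left” is checked before “Right”, and the difference is taken as right minus left so that a stronger “Right” score steers in the positive direction. The factor of 25 degrees per steering unit is inferred from the pairing of 0.5 with 12.5 degrees.

```python
# Assumed scale: 0.5 steering units correspond to 12.5 degrees in the text.
DEGREES_PER_UNIT = 25.0


def steering_from_scores(left, right):
    """Map the classifier's "Left" and "Right" scores to a steering angle."""
    if left > 0.5:
        return -0.5          # confident left turn
    if right > 0.5:
        return 0.5           # confident right turn
    return right - left      # neither is confident: steer by the difference


for left, right in [(0.9, 0.1), (0.2, 0.8), (0.25, 0.5)]:
    angle = steering_from_scores(left, right)
    print(left, right, angle, angle * DEGREES_PER_UNIT)
```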

Classification Model: Results

Regression Model: Data Cleaning & Pre-Processing

This part of the code cleans and pre-processes the images before they are fed to the regression model for training. In short, it:

Identifies each image path and reads the image.
Converts the image from BGR to RGB.
Obtains the images with their corresponding measurements (steering angle) from the normal array.
Adds more data by flipping each image and adding its corresponding negated measurement.
Assigns the images and measurements to numpy arrays, X_train and y_train, for keras to read.
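The flipping step can be illustrated with numpy alone. In this sketch the function name is hypothetical and random arrays stand in for the simulator frames; np.fliplr mirrors each image left to right, and the steering angle is negated to match.

```python
import numpy as np


def augment_with_flips(images, measurements):
    """Return the originals plus mirrored copies with negated steering angles."""
    aug_images, aug_measurements = [], []
    for image, angle in zip(images, measurements):
        aug_images.append(image)
        aug_measurements.append(angle)
        aug_images.append(np.fliplr(image))   # mirror the frame left to right
        aug_measurements.append(-angle)       # a left turn becomes a right turn
    return np.array(aug_images), np.array(aug_measurements)


# Random stand-ins for three 160x320 RGB frames and their steering angles
images = [np.random.randint(0, 256, (160, 320, 3), dtype=np.uint8) for _ in range(3)]
measurements = [0.1, -0.25, 0.0]

X_train, y_train = augment_with_flips(images, measurements)
print(X_train.shape, y_train.shape)
```

Flipping doubles the amount of data and balances left and right turns, which helps when the lap was driven in one direction only.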

Regression Model (Simple Regression Network): Training

This uses a Sequential model. A Flatten layer is added whose input shape matches the dimensions of the training images (160×320 pixels with 3 RGB channels).

To compile the model, “mean squared error” is used for the loss and “adam” for the optimizer. “Accuracy” is tracked as a metric.

Finally, for training, the result is stored in the variable history so that the graphs can be plotted later. The model is trained with X_train and y_train as the input data, which are the images and measurements, with a validation_split of 0.2, meaning 20% of the data is used for validation and the other 80% for training. The data is shuffled and 10 epochs are used.

Dense	1 output neuron
Loss	mean squared error
Optimizer	adam
Validation Data	20%
Epochs	10
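Put together, the description and table above correspond to a model along these lines. This is a minimal sketch, not the original notebook code, and random arrays stand in for the real images and steering angles.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Flatten the 160x320 RGB image and feed it to a single output neuron.
model = keras.Sequential([
    keras.Input(shape=(160, 320, 3)),
    layers.Flatten(),
    layers.Dense(1),
])
model.compile(loss="mse", optimizer="adam", metrics=["accuracy"])

# Random stand-in data in place of the real images and measurements
X_train = np.random.rand(10, 160, 320, 3).astype("float32")
y_train = np.random.uniform(-1, 1, size=(10,)).astype("float32")

# 20% of the data is held out for validation; the rest is shuffled each epoch.
history = model.fit(
    X_train, y_train, validation_split=0.2, shuffle=True, epochs=10, verbose=0
)
print(sorted(history.history.keys()))
```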

Regression Model (Simple Regression Network): Accuracy & Loss

The following code is for plotting the accuracy and loss of this model, and the graphs show the accuracy and loss of the train and test datasets over epochs.
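A plotting helper of this kind might look like the sketch below. The function name is hypothetical, and the numbers in the example dictionary are arbitrary placeholders used only to exercise the function; they are not results from the project. In practice the dictionary would be history.history as returned by model.fit.

```python
import matplotlib.pyplot as plt


def plot_history(history_dict):
    """Plot train and test accuracy and loss over epochs, side by side."""
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, key in zip(axes, ["accuracy", "loss"]):
        ax.plot(history_dict[key], label="train")
        ax.plot(history_dict["val_" + key], label="test")
        ax.set_title("Model " + key)
        ax.set_xlabel("epoch")
        ax.set_ylabel(key)
        ax.legend()
    fig.tight_layout()
    return fig


# Placeholder values only, in the same layout as history.history
example = {
    "accuracy": [0.2, 0.4, 0.6],
    "val_accuracy": [0.1, 0.3, 0.5],
    "loss": [0.9, 0.5, 0.3],
    "val_loss": [1.0, 0.6, 0.4],
}
fig = plot_history(example)
```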

The accuracy of this model on the train and test data sets appears inconsistent and shifts between 0 and 1. As for the loss, it started at 0 on the train data set and high on the test data set, and the two curves then appear to converge.

This model turned out to be insufficient: when it was used with the simulator to drive the car autonomously, the car went off the track almost all of the time. A network with a single node is therefore not enough to predict the steering angle properly.

The group then went on to implement a CNN (Convolutional Neural Network). This is a type of deep learning algorithm that can take an input image, assign importance (learnable weights and biases) to various aspects or objects in the image, and differentiate one from another. It transforms an input image in order to extract features from it (Saha, 2018).

Regression Model (CNN): Training

To implement the CNN, a Sequential model is used again, and this time the images are cropped. The cropping removes 70 pixels from the top and 25 pixels from the bottom so that only the road or track is left. A custom layer is then added to normalize the pixels.
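As a quick check of the arithmetic, cropping 70 rows from the top and 25 from the bottom of a 160-row image leaves 160 - 70 - 25 = 65 rows. The numpy sketch below shows the same operations outside of Keras. The normalisation formula (divide by 255, then subtract 0.5) is an assumption, since the text does not give the exact formula used in the custom layer.

```python
import numpy as np

# Random stand-in for one 160x320 RGB frame from the simulator
frame = np.random.randint(0, 256, size=(160, 320, 3), dtype=np.uint8)

# Drop 70 rows from the top and 25 rows from the bottom; keep all columns.
cropped = frame[70:-25, :, :]

# Assumed normalisation: scale to [0, 1], then centre around zero.
normalised = cropped.astype(np.float32) / 255.0 - 0.5

print(cropped.shape)
```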

The convolution happens after that. For example, on the 5th line of the code, where there is “model.add(Conv2D)”, the model learns 24 filters using a 5×5 kernel, a rectified linear unit (ReLU) for activation, and same padding. Max pooling is then used to select the maximum value from each region of a feature map, and this process is repeated with some changes in parameter values.
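The sketch below shows one way such a model could be assembled in Keras. Only the cropping values and the first Conv2D layer (24 filters, 5×5 kernel, ReLU, same padding) come from the description. The normalisation formula, the Dropout rate and position, and the filter counts and kernel sizes of the later convolution blocks are assumptions made for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(160, 320, 3)),
    # Remove 70 rows from the top and 25 from the bottom of each image.
    layers.Cropping2D(cropping=((70, 25), (0, 0))),
    # Custom normalisation layer (formula is an assumption).
    layers.Lambda(lambda x: x / 255.0 - 0.5),
    layers.Dropout(0.2),                                           # rate is an assumption
    # First convolution block, as described in the text.
    layers.Conv2D(24, (5, 5), activation="relu", padding="same"),
    layers.MaxPooling2D(),
    # Repeated blocks with changed parameters (values are assumptions).
    layers.Conv2D(36, (5, 5), activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(48, (3, 3), activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1),                                               # predicted steering angle
])
model.compile(loss="mse", optimizer="adam")
model.summary()
```

The call to model.summary() prints the layer order and the output shape of each layer, which makes it easy to confirm that the cropped input is 65 rows high.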

Regression Model (CNN): Visualisation of Model

This shows the layer order and the output shape of each layer.

Cropping
Normalization
Dropout
Conv2D
Pooling
Flatten
Dense

Regression Model (CNN): Accuracy & Loss

The following graphs show the accuracy and loss of the regression model with CNN. Over the training epochs, the accuracy on the train and test data sets became high (100%) and the loss became low (0), which are the desired results.

Regression Model (CNN): Results

Conclusion

The training data collected, combined with the functions in the Python code, produced two machine learning models (classification and regression) capable of driving a lap autonomously.
The regression model outperformed the classification model.
This provides compelling insights into the capabilities of computer vision for autonomous driving.