Session 11: Getting the Car to Drive by Itself! 🚗💨

In this session, we are going to use everything we’ve learned so far to make the car drive by itself! We’ll use the trained brain (model) to see the road and steer the car in the right direction.

Let’s break it down and make it super easy to understand. 🌟

—

What’s Happening in the Code? 🧑‍💻

In this code, we’re going to:

Use a Camera to See the Road: The car will have a camera that sends pictures to the computer.
Make Decisions with the Computer Brain: The brain we built earlier will look at these pictures and decide how to steer.
Drive the Car: Based on what the brain sees, the car will turn the wheel and drive!
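These three steps repeat for every new picture the camera sends. The toy sketch below shows that loop in plain Python with NumPy. Everything in it is a stand-in. `fake_camera` makes random pictures. `fake_brain` follows a made-up rule (steer toward the brighter half of the picture) instead of using a trained model. A dictionary replaces the ROS 2 `Twist` message. We also assume that a positive steering value means "turn left", which is the usual ROS convention.

```python
import numpy as np

def fake_camera(frame_number):
    """Hypothetical camera: returns a random 66x200 color picture (height, width, 3)."""
    rng = np.random.default_rng(frame_number)
    return rng.integers(0, 256, size=(66, 200, 3), dtype=np.uint8)

def fake_brain(picture):
    """Hypothetical brain with a made-up rule: steer toward the brighter half.
    Positive means left, negative means right (assumed convention)."""
    left = picture[:, :100].mean()
    right = picture[:, 100:].mean()
    return float(np.clip((left - right) / 255.0, -1.0, 1.0))

def drive_step(frame_number):
    picture = fake_camera(frame_number)                 # 1. see the road
    steer_angle = fake_brain(picture)                   # 2. decide how to steer
    return {"linear_x": 0.5, "angular_z": steer_angle}  # 3. build the drive command

for frame in range(3):
    print(drive_step(frame))
```

The real program follows the same order, but the picture arrives from a ROS 2 topic and the decision comes from the trained model.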

Let’s take a closer look at the important parts of the code. 👇

—

1. Getting the Picture from the Camera 📸

The car has a camera that takes pictures of the road. These pictures are sent to the computer so the brain can see them.

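Here’s how we get the picture ready:
- We change the picture's colors from BGR (the order OpenCV uses) to YUV, where Y is brightness and U and V are color, because that is the format the brain expects (using cv2.cvtColor).
- We blur the picture a little (a small 3×3 Gaussian blur) so that the brain doesn’t get confused by tiny details.
- We resize the picture to 200 pixels wide and 66 pixels tall, which is smaller and easier for the brain to handle.
- We divide every pixel value by 255 so that all the numbers are between 0 and 1.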
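The blur step is easy to picture with plain NumPy. As far as we know, when `cv2.GaussianBlur` is given a 3×3 kernel and a sigma of 0, OpenCV uses (approximately) the fixed weights 1/4, 1/2, 1/4 along each direction, so every pixel becomes a weighted average of itself and its eight neighbors. The sketch below uses our own `blur3x3` helper, not OpenCV's code, to apply those weights to one color channel. It mirrors the edges the way OpenCV's default border mode does.

```python
import numpy as np

def blur3x3(channel):
    """Blur one 2-D channel with the separable 3x3 Gaussian weights [1/4, 1/2, 1/4]."""
    weights = np.array([0.25, 0.5, 0.25])
    # Mirror the edges (like OpenCV's default BORDER_REFLECT_101) so the output keeps its size.
    padded = np.pad(channel.astype(np.float64), 1, mode="reflect")
    # Blur along each row first...
    rows = weights[0] * padded[:, :-2] + weights[1] * padded[:, 1:-1] + weights[2] * padded[:, 2:]
    # ...then along each column.
    return weights[0] * rows[:-2, :] + weights[1] * rows[1:-1, :] + weights[2] * rows[2:, :]

# A single bright dot gets spread out into a soft 3x3 blob.
dot = np.zeros((5, 5))
dot[2, 2] = 16
print(blur3x3(dot))
```

The center of the dot keeps 4 of its 16, each direct neighbor gets 2, and each diagonal neighbor gets 1, so the total brightness stays the same while sharp details get softer.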

—

2. Making the Brain Work

Once we have the picture, we send it to the brain to make a decision:

The brain looks at the picture and figures out how much to turn the steering wheel.
It gives us a steering angle, a single number that tells us which way the car should turn (left or right) and how sharply.
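Before the brain can look at a picture, the program has to rearrange the numbers into the order PyTorch's `nn.Conv2d` layers expect: batch, color channels, height, width (often written N, C, H, W). OpenCV keeps a picture as height × width × channels, so the program wraps the single picture in a batch of one and moves the channel axis to the front. This NumPy-only sketch repeats that `np.transpose` step from the full program on a random stand-in picture:

```python
import numpy as np

# Stand-in for a preprocessed picture: 66 rows, 200 columns, 3 color channels (Y, U, V).
rng = np.random.default_rng(0)
picture = rng.integers(0, 256, size=(66, 200, 3)).astype(np.float32) / 255  # values between 0 and 1

# Wrap the single picture in a batch of one, then reorder (N, H, W, C) -> (N, C, H, W).
batch = np.transpose(np.asarray([picture]), (0, 3, 1, 2))

print(picture.shape)  # (66, 200, 3)
print(batch.shape)    # (1, 3, 66, 200)
# The same pixel value is still there, just stored at a new position.
print(picture[10, 20, 1] == batch[0, 1, 10, 20])  # True
```

In the real program, this batch is then turned into a `torch.float32` tensor and moved to the GPU (or CPU) before the brain sees it.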

—

3. Driving the Car! 🚗

After the brain decides how much to steer, we send a command to the car:

Move forward with a speed of 0.5 (not too fast, so the car doesn’t crash!).
Turn the steering wheel based on the brain’s decision (the steering angle).
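In the program, this command is a ROS 2 `geometry_msgs/Twist` message: `linear.x` carries the forward speed and `angular.z` carries the steering. The brain's last layer (`fc4`) has no activation function, so nothing forces its output into a fixed range. If you want a safety net, one option is to clamp the angle before sending it. The sketch below uses a hypothetical `DriveCommand` dataclass with the same two fields instead of the real ROS message, so it runs without ROS. The limit `MAX_STEER = 1.0` is an assumed value and is not part of the original program; pick one that matches how your car was trained.

```python
from dataclasses import dataclass

@dataclass
class DriveCommand:
    """Hypothetical stand-in for the two Twist fields the program sets."""
    linear_x: float   # forward speed
    angular_z: float  # steering

MAX_STEER = 1.0  # assumed limit, not from the original program

def make_command(steer_angle, speed=0.5):
    # Clamp the steering so one bad prediction cannot ask for a huge turn.
    steer = max(-MAX_STEER, min(MAX_STEER, float(steer_angle)))
    return DriveCommand(linear_x=speed, angular_z=steer)

print(make_command(0.3))   # DriveCommand(linear_x=0.5, angular_z=0.3)
print(make_command(-4.2))  # DriveCommand(linear_x=0.5, angular_z=-1.0)
```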

—

How Does the Car Learn to Steer? 🧠

We train the brain using pictures of roads, each paired with the steering angle that was used when the picture was taken.
The more it trains, the better it gets at predicting the right steering angle for a new picture.
Finally, the brain can drive the car all by itself, using its new knowledge!
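Training means showing the brain many pictures together with their recorded steering angles, measuring how far its guesses are from those angles, and nudging its numbers to shrink the error. The real brain has many thousands of numbers and is trained with PyTorch. The toy sketch below trains a "brain" with just one number, `w`, using plain gradient descent on made-up data. We assume mean squared error as the measure of "how wrong", because it is a common choice for predicting a steering angle; this session does not say which loss was used.

```python
import numpy as np

# Made-up training data: one "feature" per picture (imagine: how much the road curves)
# and the steering angle recorded for that picture (here: exactly 0.8 * feature).
rng = np.random.default_rng(42)
features = rng.uniform(-1.0, 1.0, size=200)
recorded_angles = 0.8 * features

w = 0.0              # the toy brain starts knowing nothing
learning_rate = 0.5

for step in range(100):
    guesses = w * features
    errors = guesses - recorded_angles
    loss = np.mean(errors ** 2)                # mean squared error
    gradient = np.mean(2 * errors * features)  # how the loss changes when w changes
    w -= learning_rate * gradient              # nudge w to reduce the error
    if step % 25 == 0:
        print(f"step {step:3d}  loss {loss:.5f}  w {w:.3f}")

print(f"learned w = {w:.3f}")  # should end up very close to 0.8
```

The loss shrinks at every step, which is what "getting better at making decisions" means in numbers.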

—

Putting Everything Together

Here’s the code that makes it all happen:

import torch
import torch.nn as nn
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from geometry_msgs.msg import Twist
import cv2
import numpy as np


class NvidiaModel(nn.Module):
    def __init__(self):
        super(NvidiaModel, self).__init__()
        # The brain (model) has special eyes to look at the road
        self.conv1 = nn.Conv2d(3, 24, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(24, 36, kernel_size=5, stride=2)
        self.conv3 = nn.Conv2d(36, 48, kernel_size=5, stride=2)
        self.conv4 = nn.Conv2d(48, 64, kernel_size=3)
        self.conv5 = nn.Conv2d(64, 64, kernel_size=3)
        self.fc1 = nn.Linear(64 * 1 * 18, 100)
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 10)
        self.fc4 = nn.Linear(10, 1)

    def forward(self, x):
        # The brain processes the image
        x = nn.functional.elu(self.conv1(x))
        x = nn.functional.elu(self.conv2(x))
        x = nn.functional.elu(self.conv3(x))
        x = nn.functional.elu(self.conv4(x))
        x = nn.functional.elu(self.conv5(x))
        x = x.reshape(-1, 64 * 1 * 18)  # Flatten the 64 feature maps (1x18 each) into one long list per picture
        x = nn.functional.elu(self.fc1(x))
        x = nn.functional.elu(self.fc2(x))
        x = nn.functional.elu(self.fc3(x))
        x = self.fc4(x)  # The brain's decision (steering angle)
        return x


def img_preprocess(img):
    # Process the image so the brain can understand it
    img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (200, 66))  # Resize the image
    img = img / 255  # Make the image values smaller (between 0 and 1)
    return img


def camera_cb(msg):
    global twist_pub

    # Get the picture from the camera
    image = CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8").copy()
    image = img_preprocess(image)  # Process the picture

    # Convert the picture into a format the brain can understand
    inputs = np.transpose(np.asarray([image]), (0, 3, 1, 2))
    inputs_tensor = torch.tensor(inputs, dtype=torch.float32)
    inputs_tensor = inputs_tensor.to(device)

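    # Get the steering decision from the brain.
    # torch.no_grad() skips the gradient bookkeeping that is only needed for training.
    with torch.no_grad():
        outputs = prediction_model(inputs_tensor)
    steer_angle = outputs[0][0].item()  # A plain Python number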

    # Send the command to drive the car
    twist_msg = Twist()
    twist_msg.linear.x = 0.5  # Move forward
    twist_msg.angular.z = float(steer_angle)  # Turn the wheel based on brain's decision
    twist_pub.publish(twist_msg)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create the brain and load the trained knowledge
prediction_model = NvidiaModel().to(device)
checkpoint_path = 'model.pth'
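# map_location lets a checkpoint saved on a GPU load on a computer without one
checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=True)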
prediction_model.load_state_dict(checkpoint)
prediction_model.eval()  # Set the brain to test mode

# Initialize ROS2 and subscribe to the camera
rclpy.init()
node = Node("data_collector")
node.create_subscription(Image, "/stk_image", camera_cb, 1)
twist_pub = node.create_publisher(Twist, "/cmd_vel", 1)

# Start the car's journey!
rclpy.spin(node)
rclpy.shutdown()
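Where does the number `64 * 1 * 18` in `fc1` come from? Each convolution shrinks the picture: with no padding, a layer with kernel size k and stride s turns a side of length n into ⌊(n − k) / s⌋ + 1. Walking the 66 × 200 picture through the five layers leaves 1 × 18, and conv5 has 64 output channels, so the flattened size is 64 × 1 × 18 = 1152. This small helper (our own, not part of PyTorch) does the arithmetic:

```python
def conv_out(size, kernel, stride=1):
    """Output length of a convolution with no padding and no dilation."""
    return (size - kernel) // stride + 1

# (kernel, stride) for conv1..conv5 in NvidiaModel
layers = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]

height, width = 66, 200  # picture size after img_preprocess
for kernel, stride in layers:
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    print(height, width)

print("fc1 input size:", 64 * height * width)  # fc1 input size: 1152
```

If you ever change the picture size in `img_preprocess`, rerun this arithmetic and update both `fc1` and the `reshape` in `forward` to match.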

—

What Did We Do in This Code? 🤔

Load the brain: We loaded the trained model (brain) to make decisions.
Get pictures: The camera gives pictures of the road.
Process the picture: The picture is changed into a form the brain can understand.
Make a decision: The brain tells the car how much to steer.
Drive the car: The car turns the wheel and moves based on the brain’s decision.

—

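The code above uses PyTorch for inference, which can be very slow when it is not running on a GPU. To run the car on a CPU, you can use the following code instead. It uses the ncnn library, a lightweight inference library that runs well on CPUs, and it expects the trained model converted into ncnn's format (a .param file and a .bin file, for example produced with ncnn's pnnx converter):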

import ncnn
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from geometry_msgs.msg import Twist
import cv2


class NvidiaModelNCNN:
    def __init__(self, param_path, bin_path):
        self.net = ncnn.Net()
        self.net.load_param(param_path)
        self.net.load_model(bin_path)
        self.mean_vals = [0.0, 0.0, 0.0]
        self.norm_vals = [1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0]

    def predict(self, img):
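        # Same preparation as img_preprocess: YUV colors, then a small 3x3 blur
        img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
        img = cv2.GaussianBlur(img, (3, 3), 0)
        # Resize to 200x66; PIXEL_RGB only tells ncnn to keep the three (Y, U, V) channels in order
        mat_in = ncnn.Mat.from_pixels_resize(
            img, ncnn.Mat.PixelType.PIXEL_RGB, img.shape[1], img.shape[0], 200, 66
        )
        # Scale pixel values to 0-1, like img / 255 in the PyTorch version
        mat_in.substract_mean_normalize(self.mean_vals, self.norm_vals)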

        # Inference
        ex = self.net.create_extractor()
        ex.input("in0", mat_in)
        ret, out = ex.extract("out0")  # Adjust "out0" as needed

        if ret != 0:
            print("Error in inference:", ret)
            return 0.0  # Default steering angle on error

        return float(out[0])  # Steering angle


def camera_cb(msg):
    global twist_pub

    # Get the image from the camera
    image = CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8").copy()

    # Get the steering angle from the ncnn model
    steer_angle = ncnn_model.predict(image)

    # Send the control command
    twist_msg = Twist()
    twist_msg.linear.x = 0.5  # Move forward
    twist_msg.angular.z = float(steer_angle)  # Apply steering angle
    twist_pub.publish(twist_msg)


# Load the ncnn model
param_path = "ncnn_model.ncnn.param"
bin_path = "ncnn_model.ncnn.bin"
ncnn_model = NvidiaModelNCNN(param_path, bin_path)

# Initialize ROS2 and subscribe to the camera
rclpy.init()
node = Node("ncnn_data_collector")
node.create_subscription(Image, "/stk_image", camera_cb, 1)
twist_pub = node.create_publisher(Twist, "/cmd_vel", 1)

# Start the ROS2 loop
rclpy.spin(node)
rclpy.shutdown()
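The ncnn version handles the divide-by-255 step differently. As we understand ncnn's `substract_mean_normalize` (ncnn really spells it "substract"), it computes (pixel − mean) × norm for each channel. With a mean of 0 and a norm of 1/255, that gives the same 0 to 1 values as `img / 255` in the PyTorch version. A NumPy sketch of that formula, using our own hypothetical helper name, shows the match:

```python
import numpy as np

def subtract_mean_normalize(pixels, mean_vals, norm_vals):
    """Hypothetical NumPy version of ncnn's per-channel formula: (pixel - mean) * norm."""
    mean = np.array(mean_vals, dtype=np.float32)
    norm = np.array(norm_vals, dtype=np.float32)
    return (pixels.astype(np.float32) - mean) * norm  # broadcasts over the last (channel) axis

# One pixel with three channel values, stored (height, width, channels) like an OpenCV picture.
pixels = np.array([[[0, 128, 255]]], dtype=np.uint8)

ncnn_style = subtract_mean_normalize(pixels, [0.0, 0.0, 0.0], [1.0 / 255.0] * 3)
pytorch_style = pixels / 255  # what img_preprocess does

print(np.allclose(ncnn_style, pytorch_style))  # True
```

Keeping the two versions' preparation identical matters: the brain only works well on pictures prepared the same way as the ones it was trained on.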

—

Summary 🏁

In this session, we:
- Used the trained brain to steer the car.
- Made the car drive by itself by looking at the camera pictures and steering on its own.
- Ran the brain on a CPU with ncnn for computers without a GPU.
- Saw how training made the brain smart enough to drive the car all by itself!

Have fun racing your improved self-driving car in SuperTuxKart! 🚗💨