Session 11: Getting the Car to Drive by Itself! 🚗💨
In this session, we are going to use everything we've learned so far to make the car drive by itself! We'll use the trained brain (model) to see the road and steer the car in the right direction.
Let's break it down and make it super easy to understand.
What's Happening in the Code? 🧑‍💻
In this code, we're going to:
Use a Camera to See the Road: The car will have a camera that sends pictures to the computer.
Make Decisions with the Computer Brain: The brain we built earlier will look at these pictures and decide how to steer.
Drive the Car: Based on what the brain sees, the car will turn the wheel and drive!
Let's take a closer look at the important parts of the code.
1. Getting the Picture from the Camera 📸
The car has a camera that takes pictures of the road. These pictures are sent to the computer so the brain can see them.
Here's how we get the picture ready:
- We change the picture into a special color format (YUV) that the brain understands (using cv2.cvtColor).
- We blur the picture a little bit so that the brain doesn't get confused by tiny details.
- We resize the picture to 200×66 pixels to make it smaller and easier for the brain to handle.
- We divide the pixel values by 255 so that they all lie between 0 and 1.
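To get a feel for what the color change does to a single pixel, here is a small sketch in plain Python. It assumes the common BT.601-style weights for turning blue, green, and red into YUV. cv2.cvtColor with COLOR_BGR2YUV should give similar numbers, but it also rounds and clips them, so treat this as an illustration only. The function names are made up for this example.

```python
def bgr_to_yuv_pixel(b, g, r):
    """Convert one 8-bit BGR pixel to YUV (assumed BT.601-style weights).

    This sketch skips the rounding and clipping that an image library applies.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b  # Y: how bright the pixel is
    u = 0.492 * (b - y) + 128              # U: how blue it is, centered on 128
    v = 0.877 * (r - y) + 128              # V: how red it is, centered on 128
    return y, u, v


def scale_to_unit(value):
    """Scale an 8-bit value (0..255) down to the range 0..1."""
    return value / 255


# A mid-gray pixel has no color, so U and V sit at the neutral value 128.
gray_yuv = bgr_to_yuv_pixel(128, 128, 128)
gray_scaled = [scale_to_unit(channel) for channel in gray_yuv]
print(gray_yuv)
print(gray_scaled)
```

Splitting brightness (Y) from color (U and V) lets the brain pay attention to the shape of the road without being distracted as much by color.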
2. Making the Brain Work
Once we have the picture, we send it to the brain to make a decision:
The brain looks at the picture and figures out how much to turn the steering wheel.
It gives us a steering angle that tells us if the car should turn left or right.
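On its way through the brain, the picture shrinks step by step until only a few numbers are left. The sketch below follows the height and width of a 66×200 picture through the five convolution layers, using the kernel sizes and strides from the NvidiaModel class in this session's code. The helper name conv_out is made up for this example, and it assumes no padding, which is the PyTorch default for nn.Conv2d.

```python
def conv_out(size, kernel, stride=1):
    """Length of one side after a convolution with no padding."""
    return (size - kernel) // stride + 1


# (kernel_size, stride) for conv1 to conv5, copied from NvidiaModel
LAYERS = [(5, 2), (5, 2), (5, 2), (3, 1), (3, 1)]

height, width = 66, 200  # picture size after preprocessing
for kernel, stride in LAYERS:
    height = conv_out(height, kernel, stride)
    width = conv_out(width, kernel, stride)
    print(height, width)  # prints 31 98, 14 47, 5 22, 3 20, 1 18

# conv5 has 64 output channels, so this many numbers reach the first Linear layer
flat_features = 64 * height * width
print(flat_features)  # prints 1152
```

This is why the first fully connected layer is written as nn.Linear(64 * 1 * 18, 100): a picture of a different size would need a different number there.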
3. Driving the Car!
After the brain decides how much to steer, we send a command to the car:
Move forward with a speed of 0.5 (not too fast, so the car doesn't crash!).
Turn the steering wheel based on the brain's decision (the steering angle).
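The two steps above can be sketched as one small function. The session's code sends the brain's answer to the car as it is; the version below also keeps the steering inside a safe range and falls back to driving straight if the brain returns something that is not a number. The function name and the max_steer limit of 1.0 are assumptions made for this example.

```python
import math


def make_drive_command(steer_angle, speed=0.5, max_steer=1.0):
    """Return (forward_speed, steering) for the car.

    max_steer is a hypothetical limit; pick one that suits your car.
    """
    steer = float(steer_angle)
    if not math.isfinite(steer):
        steer = 0.0  # drive straight if the brain's answer is NaN or infinite
    steer = max(-max_steer, min(max_steer, steer))  # clamp to the safe range
    return speed, steer


print(make_drive_command(0.2))   # prints (0.5, 0.2)
print(make_drive_command(3.0))   # prints (0.5, 1.0)
print(make_drive_command(-3.0))  # prints (0.5, -1.0)
```

In the ROS 2 code, the first value would go into twist_msg.linear.x and the second into twist_msg.angular.z.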
How Does the Car Learn to Steer? 🧠
We train the brain using pictures of the road together with the steering that goes with each picture.
As the brain trains, it gets better and better at making decisions.
Finally, the brain can drive the car all by itself, using its new knowledge!
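One way to picture "getting better" is to measure how far the brain's guesses are from the steering that was recorded with each picture. The sketch below uses mean squared error for that. This session does not say which error measure the training used, so that choice is an assumption, and all the numbers are made up for illustration.

```python
def mean_squared_error(predicted, recorded):
    """Average of the squared differences between guesses and recorded steering."""
    return sum((p - r) ** 2 for p, r in zip(predicted, recorded)) / len(recorded)


# Made-up steering angles for three pictures
recorded = [0.0, 0.5, -0.5]      # what the driver actually did
early_guess = [0.5, 0.0, 0.0]    # an untrained brain guesses badly
later_guess = [0.0, 0.25, -0.5]  # after some training the guesses are closer

early_error = mean_squared_error(early_guess, recorded)
later_error = mean_squared_error(later_guess, recorded)
print(early_error)
print(later_error)
```

Training nudges the brain's numbers so that this error keeps shrinking; a smaller error means the guesses are closer to the recorded steering.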
Putting Everything Together
Here's the code that makes it all happen:
import torch
import torch.nn as nn
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from geometry_msgs.msg import Twist
import cv2
import numpy as np


class NvidiaModel(nn.Module):
    def __init__(self):
        super().__init__()
        # The brain (model) has special eyes (convolution layers) to look at the road
        self.conv1 = nn.Conv2d(3, 24, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(24, 36, kernel_size=5, stride=2)
        self.conv3 = nn.Conv2d(36, 48, kernel_size=5, stride=2)
        self.conv4 = nn.Conv2d(48, 64, kernel_size=3)
        self.conv5 = nn.Conv2d(64, 64, kernel_size=3)
        # A 66x200 picture leaves conv5 as 64 channels of size 1x18
        self.fc1 = nn.Linear(64 * 1 * 18, 100)
        self.fc2 = nn.Linear(100, 50)
        self.fc3 = nn.Linear(50, 10)
        self.fc4 = nn.Linear(10, 1)

    def forward(self, x):
        # The brain processes the image
        x = nn.functional.elu(self.conv1(x))
        x = nn.functional.elu(self.conv2(x))
        x = nn.functional.elu(self.conv3(x))
        x = nn.functional.elu(self.conv4(x))
        x = nn.functional.elu(self.conv5(x))
        x = x.reshape(-1, 64 * 1 * 18)  # Flatten the image data for decision-making
        x = nn.functional.elu(self.fc1(x))
        x = nn.functional.elu(self.fc2(x))
        x = nn.functional.elu(self.fc3(x))
        x = self.fc4(x)  # The brain's decision (steering angle)
        return x


def img_preprocess(img):
    # Process the image so the brain can understand it
    img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
    img = cv2.GaussianBlur(img, (3, 3), 0)
    img = cv2.resize(img, (200, 66))  # Resize to width 200, height 66
    img = img / 255  # Make the image values smaller (between 0 and 1)
    return img


def camera_cb(msg):
    global twist_pub
    # Get the picture from the camera
    image = CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8").copy()
    image = img_preprocess(image)  # Process the picture
    # Convert the picture into a format the brain can understand:
    # (height, width, channels) -> (batch, channels, height, width)
    inputs = np.transpose(np.asarray([image]), (0, 3, 1, 2))
    inputs_tensor = torch.tensor(inputs, dtype=torch.float32)
    inputs_tensor = inputs_tensor.to(device)
    # Get the steering decision from the brain (no gradients needed when driving)
    with torch.no_grad():
        outputs = prediction_model(inputs_tensor)
    steer_angle = outputs.cpu().numpy()[0][0]
    # Send the command to drive the car
    twist_msg = Twist()
    twist_msg.linear.x = 0.5  # Move forward
    twist_msg.angular.z = float(steer_angle)  # Turn the wheel based on brain's decision
    twist_pub.publish(twist_msg)


device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create the brain and load the trained knowledge
prediction_model = NvidiaModel().to(device)
checkpoint_path = 'model.pth'
# map_location lets a model that was saved on a GPU load on a CPU-only computer
checkpoint = torch.load(checkpoint_path, map_location=device, weights_only=True)
prediction_model.load_state_dict(checkpoint)
prediction_model.eval()  # Set the brain to evaluation (test) mode

# Initialize ROS2 and subscribe to the camera
rclpy.init()
node = Node("data_collector")
node.create_subscription(Image, "/stk_image", camera_cb, 1)
twist_pub = node.create_publisher(Twist, "/cmd_vel", 1)

# Start the car's journey!
rclpy.spin(node)
rclpy.shutdown()
What Did We Do in This Code?
Load the brain: We loaded the trained model (brain) to make decisions.
Get pictures: The camera gives pictures of the road.
Process the picture: The picture is changed into a form the brain can understand.
Make a decision: The brain tells the car how much to steer.
Drive the car: The car turns the wheel and moves based on the brain's decision.
The code above uses PyTorch for inference, which can be very slow without a GPU. To run on a CPU, you can use the following code instead, which uses the ncnn library for inference. It expects the trained model to be available as ncnn .param and .bin files:
import ncnn
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from geometry_msgs.msg import Twist
import cv2


class NvidiaModelNCNN:
    def __init__(self, param_path, bin_path):
        self.net = ncnn.Net()
        self.net.load_param(param_path)
        self.net.load_model(bin_path)
        # With a mean of 0 and a scale of 1/255, pixel values end up between 0 and 1
        self.mean_vals = [0.0, 0.0, 0.0]
        self.norm_vals = [1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0]

    def predict(self, img):
        # Same preprocessing steps as img_preprocess in the PyTorch version
        img = cv2.cvtColor(img, cv2.COLOR_BGR2YUV)
        img = cv2.GaussianBlur(img, (3, 3), 0)  # Added to match the PyTorch version
        # PIXEL_RGB only means "3 channels, keep the order"; the data is really YUV
        mat_in = ncnn.Mat.from_pixels_resize(
            img, ncnn.Mat.PixelType.PIXEL_RGB, img.shape[1], img.shape[0], 200, 66
        )
        # Note: ncnn really does spell this method "substract"
        mat_in.substract_mean_normalize(self.mean_vals, self.norm_vals)
        # Inference
        ex = self.net.create_extractor()
        ex.input("in0", mat_in)
        ret, out = ex.extract("out0")  # Adjust "out0" as needed
        if ret != 0:
            print("Error in inference:", ret)
            return 0.0  # Default steering angle on error
        return float(out[0])  # Steering angle


def camera_cb(msg):
    global twist_pub
    # Get the image from the camera
    image = CvBridge().imgmsg_to_cv2(msg, desired_encoding="bgr8").copy()
    # Get the steering angle from the ncnn model
    steer_angle = ncnn_model.predict(image)
    # Send the control command
    twist_msg = Twist()
    twist_msg.linear.x = 0.5  # Move forward
    twist_msg.angular.z = float(steer_angle)  # Apply steering angle
    twist_pub.publish(twist_msg)


# Load the ncnn model
param_path = "ncnn_model.ncnn.param"
bin_path = "ncnn_model.ncnn.bin"
ncnn_model = NvidiaModelNCNN(param_path, bin_path)

# Initialize ROS2 and subscribe to the camera
rclpy.init()
node = Node("ncnn_data_collector")
node.create_subscription(Image, "/stk_image", camera_cb, 1)
twist_pub = node.create_publisher(Twist, "/cmd_vel", 1)

# Start the ROS2 loop
rclpy.spin(node)
rclpy.shutdown()
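The mean_vals and norm_vals in the ncnn code do the same job as the "divide by 255" step in the PyTorch version. ncnn subtracts the mean from every pixel value and then multiplies by the norm value. The plain-Python stand-in below shows that arithmetic for one pixel; the function here is made up for this example and is not part of ncnn.

```python
def subtract_mean_normalize(pixel, mean_vals, norm_vals):
    """Apply (value - mean) * norm to each channel of one pixel."""
    return [
        (value - mean) * norm
        for value, mean, norm in zip(pixel, mean_vals, norm_vals)
    ]


mean_vals = [0.0, 0.0, 0.0]
norm_vals = [1.0 / 255.0, 1.0 / 255.0, 1.0 / 255.0]

pixel = [0, 128, 255]  # one made-up pixel with three channels
scaled = subtract_mean_normalize(pixel, mean_vals, norm_vals)
print(scaled)

# The same values, computed the way the PyTorch version does it
divided = [value / 255 for value in pixel]
print(divided)
```

Both lists agree up to tiny rounding differences, so the ncnn brain sees the same kind of numbers as the PyTorch brain.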
Summary
In this session, we:
- Used the trained brain to steer the car.
- Made the car drive by looking at the camera and steering itself.
- Saw how the brain got smarter through training, so that it can now drive the car all by itself!
Have fun racing your improved self-driving car in SuperTuxKart! 🚗💨