Object detection is the task of identifying objects in an image or video stream together with their precise locations. It is one of the most popular techniques in computer vision and helps us solve many real-world problems. Many state-of-the-art architectures are used for object detection, such as YOLO, R-CNN, Faster R-CNN, and DETR. The latest YOLO version, YOLOv11, was released in September 2024 and is reported to be faster and more precise than previous versions. Let’s dive into how we can train it on our own custom dataset.

What is YOLO:

YOLO (You Only Look Once) is one of the best-known object detection architectures, valued for its real-time speed, accuracy, and simple implementation. It is a single-shot detector: it processes the image in a single pass and predicts the location and class of each object in it.

Implementation with our custom dataset:

You can access and run the code directly in Google Colab by clicking the link below: https://colab.research.google.com/drive/1NGiYpiPCspERy4WcXor0SrXuClrWtlXD?usp=sharing

Alternatively, you can clone the repository from GitHub: https://github.com/dibyaadhikari/YOLOv11-implementation

Dataset:

For this demonstration, we are using the RTVTR (Nepali Number plate) dataset for detecting the characters on a number plate.

Link to the dataset is added below:

https://www.kaggle.com/datasets/inspiring-lab/nepali-vehicles-number-plate-dataset

The above dataset is already labeled and is ready to be used.

Split the dataset

We need to split the dataset into training and validation sets. For a smaller dataset, an 80%-20% split is a common choice. If the dataset is larger, a 90%-10% split works as well.
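The split described above can be sketched in a few lines of Python. This is a minimal sketch, not the script from the linked repository: the function name `split_dataset`, the folder layout (`images/train`, `images/val`, `labels/train`, `labels/val`), and the assumption that every image has a YOLO-format `.txt` label with the same file stem are all illustrative choices.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_dataset(image_dir, label_dir, out_dir, val_fraction=0.2, seed=42):
    """Copy image/label pairs into train and val folders; return (n_train, n_val)."""
    image_dir, label_dir, out_dir = Path(image_dir), Path(label_dir), Path(out_dir)

    # Keep only images that have a matching label file (same stem, .txt extension).
    images = sorted(
        p for p in image_dir.iterdir()
        if p.suffix.lower() in IMAGE_EXTS and (label_dir / f"{p.stem}.txt").exists()
    )

    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(images)
    n_val = int(len(images) * val_fraction)
    splits = {"val": images[:n_val], "train": images[n_val:]}

    for split, files in splits.items():
        img_out = out_dir / "images" / split
        lbl_out = out_dir / "labels" / split
        img_out.mkdir(parents=True, exist_ok=True)
        lbl_out.mkdir(parents=True, exist_ok=True)
        for img in files:
            shutil.copy2(img, img_out / img.name)
            shutil.copy2(label_dir / f"{img.stem}.txt", lbl_out / f"{img.stem}.txt")

    return len(splits["train"]), len(splits["val"])
```

Calling `split_dataset("raw/images", "raw/labels", "dataset", val_fraction=0.2)` (hypothetical paths) would give the 80%-20% split; passing `val_fraction=0.1` gives 90%-10%.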

Create a custom .yaml file

After splitting the dataset, let’s create a custom.yaml file that holds the dataset paths and the classes to be detected. For our task, we are detecting a single class, “character”, on the vehicle number plate. Add the lines below to the custom.yaml file.

train: path/to/train

val: path/to/val

nc: 1

names: [character]

Here,

train takes the path to the training dataset

val takes the path to the validation dataset

nc is the number of classes that we are going to detect

names is the list of class names
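If you prefer to generate the file from code, the four keys above can be written out as follows. This is a small sketch using only the standard library; the helper name `write_dataset_yaml` is hypothetical, and it assumes simple paths without characters that need YAML quoting.

```python
import json
from pathlib import Path


def write_dataset_yaml(yaml_path, train_dir, val_dir, class_names):
    """Write a minimal dataset config with train/val paths, nc, and names."""
    class_names = list(class_names)
    lines = [
        f"train: {train_dir}",
        f"val: {val_dir}",
        # nc is derived from names so the two values cannot disagree.
        f"nc: {len(class_names)}",
        # A JSON list of strings is also a valid YAML flow sequence.
        f"names: {json.dumps(class_names)}",
    ]
    Path(yaml_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines
```

For this tutorial the call would look like `write_dataset_yaml("custom.yaml", "path/to/train", "path/to/val", ["character"])`.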

Installing the dependencies

  • pip install ultralytics
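To confirm that the installation worked before starting a long training run, a quick check like the one below can help. It is only a sketch: the helper name `installed_version` is made up for this example, and it relies on the standard-library `importlib.metadata` module rather than on anything specific to Ultralytics.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # ultralytics pulls in torch as a dependency, so both should be reported.
    for name in ("ultralytics", "torch"):
        print(name, installed_version(name) or "not installed")
```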

Training the model

Let’s create a Python script and add the following code:

from ultralytics import YOLO

# Create a new YOLO model from scratch
model = YOLO("yolo11n.yaml")

# Train the model on our custom dataset for 50 epochs
results = model.train(data="path/to/custom.yaml", epochs=50, batch=64, imgsz=256)

Here,

data: the path to the custom.yaml file that we created earlier.

epochs: the number of complete passes over the whole custom dataset during training.

batch: the number of images fed to the model at one time, before the model parameters are updated.

imgsz: the image size that the model is trained on.

This starts the training process. The training time depends on the GPU we are using, the batch size, and the number of epochs. A larger batch size generally trains faster but uses more GPU memory. Different batch sizes can also lead to different results, so it is best to experiment with several values to find the optimal one.
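To get a feel for how batch size and epochs interact, it helps to count how many batches the model sees. The sketch below is plain arithmetic; the dataset size of 1000 images is a made-up figure for illustration, not the size of the number plate dataset.

```python
import math


def batches_per_epoch(num_images, batch_size):
    """Number of batches needed to pass over the whole dataset once."""
    return math.ceil(num_images / batch_size)


if __name__ == "__main__":
    num_images = 1000  # hypothetical dataset size
    epochs = 50
    for batch in (16, 32, 64):
        per_epoch = batches_per_epoch(num_images, batch)
        print(f"batch={batch}: {per_epoch} batches per epoch, {per_epoch * epochs} in total")
```

A larger batch means fewer, heavier steps per epoch, which is why it usually runs faster on a GPU with enough memory.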

Metrics and Results From Training

[Figure: Metrics]

[Figure: Confusion Matrix]

[Figure: PR Curve]
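The confusion matrix and the PR curve are both built from counts of correct detections (true positives), wrong detections (false positives), and missed objects (false negatives). As a rough illustration of how precision and recall follow from those counts, here is a small sketch; the function name and the example counts are invented for this example and are not results from this training run.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from detection counts."""
    # Precision: of all predicted boxes, how many were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of all ground-truth objects, how many were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts: 90 correct boxes, 10 wrong boxes, 30 missed characters.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Each point on a PR curve is one such precision and recall pair, computed at a different confidence threshold.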

Inference with the model

Let’s create a script to run inference with our trained model and see how it performs:

from ultralytics import YOLO

# Load the custom model (training saves the weights under runs/detect/train/weights)
model = YOLO("path/to/custom/model")

# Predict
results = model.predict("path/to/image")
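Because the model detects individual characters, a natural next step is to put the detected boxes in reading order. The sketch below assumes a single-row plate and works on plain tuples of (x1, y1, x2, y2, confidence). With Ultralytics, such values should be available from `results[0].boxes.xyxy` and `results[0].boxes.conf`, but the helper `order_characters`, the 0.25 threshold, and the sample boxes are illustrative assumptions.

```python
def order_characters(boxes, min_conf=0.25):
    """Drop low-confidence boxes and sort the rest left to right by x1."""
    kept = [b for b in boxes if b[4] >= min_conf]
    return sorted(kept, key=lambda b: b[0])


# Hypothetical detections: (x1, y1, x2, y2, confidence)
detections = [
    (120.0, 10.0, 150.0, 60.0, 0.91),
    (20.0, 12.0, 50.0, 62.0, 0.88),
    (70.0, 11.0, 100.0, 61.0, 0.12),  # below the threshold, so it is dropped
]

if __name__ == "__main__":
    for box in order_characters(detections):
        print(box)
```

Plates with two rows of characters would need an extra step that groups boxes by their vertical position before sorting each row.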

Here is the final result:

Applications of YOLO:

YOLO is known for its speed, which makes it one of the strongest choices for real-time object detection. Some applications of YOLO are:

1) Autonomous vehicles

2) Traffic surveillance systems

3) Surveillance and security

4) Medical imaging

5) Agriculture and environmental monitoring

and many more

Conclusion

Here we have discussed what YOLO is and how easily a detection model can be implemented and trained to tackle a real-world problem: detecting characters on vehicle number plates. As the computer vision field is progressing rapidly, I am excited to explore what YOLO has to offer in its newer versions.

References:

1) https://docs.ultralytics.com/models/yolo11/

2) https://www.kaggle.com/datasets/inspiring-lab/nepali-vehicles-number-plate-dataset