Object detection is a building block for many deep learning applications, and implementing it opens up endless use cases. We find object detectors in webcams, surveillance cameras, dashboard cameras, mobile phones, and the list goes on. The main applications of object detection are object classification and tracking.
One of the most fascinating uses of object detection is in football (soccer), where the positions of the players and the ball are monitored. VAR technology uses such detections to help make decisive calls on penalties, handballs, goal-line clearances, or offside positions.
In object detection, objects are bound by rectangular frames that define both the location and the class of the object. The overlap between the predicted bounding box and the ground-truth bounding box is commonly measured as Intersection over Union (IoU), which depends on various parameters like the feature extraction method, sliding window size, video quality, etc. In general, an IoU value over 0.5 is considered a good detection, though this threshold changes depending on the severity of the situation.
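To make the IoU metric concrete, here is a minimal sketch of how it can be computed for two axis-aligned boxes (the `(x1, y1, x2, y2)` corner convention is my assumption; detectors may use other box formats):

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes yield no intersection area
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A perfect detection gives an IoU of 1.0, disjoint boxes give 0.0, and the 0.5 threshold mentioned above sits in between.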
YOLO — You Only Look Once
Let’s Get Started
Now I will show you how to set up your own video call interface on Agora.io and how to create an object detection model over it.
First, you will need a developer account on Agora.io. Once you're done signing up, you will be redirected to the dashboard. Navigate to the Project Management tab and create a new project. Give your project a suitable name, and you will receive an App ID that links Agora's video call service to your project.
│   agora.py
│   agora_copy.py
│   chromedriver.exe
│   image.jpg
│   imagenew.jpg
│   new.py
│   ObjectDetection.py
│   requirements.txt
│   resnet50_coco_best_v2.0.1.h5
│   test.png
│   test_output.png
│   __init__.py
│
├───frontend
│   │   AgoraRTCSDK-2.6.1.js
│   │   index.html
│   │
│   └───.vscode
│           launch.json
│
├───imagenew.jpg-objects
│       bus-15.jpg
│       car-12.jpg
│       car-13.jpg
│       car-14.jpg
│       car-16.jpg
│       car-17.jpg
│       car-18.jpg
│       car-19.jpg
│       car-20.jpg
│       car-21.jpg
│       car-22.jpg
│       car-23.jpg
│       car-24.jpg
│       car-25.jpg
│       car-26.jpg
│       car-28.jpg
│       car-29.jpg
│       car-30.jpg
│       car-31.jpg
│       car-33.jpg
│       car-34.jpg
│       car-35.jpg
│       car-36.jpg
│       car-37.jpg
│       car-38.jpg
│       motorcycle-32.jpg
│       motorcycle-4.jpg
│       person-1.jpg
│       person-10.jpg
│       person-11.jpg
│       person-2.jpg
│       person-3.jpg
│       person-5.jpg
│       person-6.jpg
│       person-7.jpg
│       person-8.jpg
│       person-9.jpg
│       truck-27.jpg
│
└───test_output.png-objects
        person-1.jpg
The above code uses the Agora community SDK to integrate the video call and to grab frames from it. In the second line, replace the app ID field with the App ID you generated. Similarly, add the path to the Chromium driver and give your channel a suitable name.
binary_image.save("filename") extracts the current frame from the call and saves it to your local directory (here, I have used test.png).
The two major functions of this code are displayed below.
The first function sets up a video call with the relevant App ID and channel name. It also uses the Chromium driver, which lets the script open and navigate web pages automatically on execution.
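As a rough sketch of what that first function might look like: the use of Selenium to drive chromedriver.exe, the query-parameter format, and the helper names below are my assumptions, not the exact code from this project.

```python
from pathlib import Path

APP_ID = "YOUR_APP_ID"    # replace with the App ID from the Agora dashboard
CHANNEL = "my-channel"    # any suitable channel name

def build_call_url(app_id: str, channel: str,
                   frontend: Path = Path("frontend/index.html")) -> str:
    """Build a local URL for the Agora web client, passing the app ID
    and channel name as query parameters (hypothetical parameter names)."""
    return f"file://{frontend.resolve()}?appid={app_id}&channel={channel}"

def start_call_and_capture(out_file: str = "test.png") -> None:
    """Open the video call in a Chromium browser and save one frame locally."""
    from selenium import webdriver  # assumes chromedriver.exe is on PATH

    driver = webdriver.Chrome()
    driver.get(build_call_url(APP_ID, CHANNEL))
    driver.save_screenshot(out_file)  # a single frame of the call
    driver.quit()

if __name__ == "__main__":
    start_call_and_capture()
```

The frame saved here is what the detection step below consumes.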
The second major function processes the frames from the video call for object detection. For this I will be using a model trained on the COCO dataset, which has over 1.5 million object instances; training on that many instances improves detection accuracy, typically yielding IoU values between 0.5 and 1.
The above object detection model extracts all the detected objects from the image and saves them in a local directory named test_output.png-objects.
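The detection step might look roughly like the following. I am assuming the ImageAI library here, since the resnet50_coco_best_v2.0.1.h5 file in the project directory is ImageAI's COCO-trained RetinaNet checkpoint; the confidence-filtering helper is my own addition, not part of the original code.

```python
def filter_detections(detections, min_prob=50.0):
    """Keep only detections whose percentage probability meets the threshold."""
    return [d for d in detections if d["percentage_probability"] >= min_prob]

def detect_objects(input_path="test.png", output_path="test_output.png"):
    """Run a COCO-trained RetinaNet over a saved frame and extract each object."""
    from imageai.Detection import ObjectDetection

    detector = ObjectDetection()
    detector.setModelTypeAsRetinaNet()
    detector.setModelPath("resnet50_coco_best_v2.0.1.h5")
    detector.loadModel()

    # extract_detected_objects=True writes each object crop into a folder
    # named <output_path>-objects (e.g. test_output.png-objects)
    detections, object_paths = detector.detectObjectsFromImage(
        input_image=input_path,
        output_image_path=output_path,
        extract_detected_objects=True,
    )
    return filter_detections(detections), object_paths

if __name__ == "__main__":
    found, crops = detect_objects()
    for d in found:
        print(d["name"], d["percentage_probability"])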
Here is a reference example:
After processing this image with YOLO (trained on the COCO dataset), we get the extracted objects along with their classifications.
And as I mentioned, all the objects will be extracted as separate images in a new folder.
There it is! We have successfully built an object detection model over Agora's video call. You can refer to the code for the above tutorial on my GitHub.