Nigahein (Electronics Club), 2019
- Guining Pertin
- Apr 3, 2019
- 7 min read
Updated: Oct 7
Introduction
Ever gotten ready with a big pack of popcorn and a cold drink, settled comfortably on your sofa for a Netflix marathon, or crawled into your comfy bed after a hard day’s work, only to realize you forgot to turn off the lights? What if you could control those annoying lights, or turn on your heater, while staying comfy in bed, with just your gaze? Thanks to the rise of IoT and advancements in computer vision, these ideas are no longer just fantasies.
This project is built in Python using the Dlib and OpenCV libraries.
This project was done as a part of TechEvince 2019 – Annual Technical Exhibition, IIT Guwahati.
TL;DR
Detect facial features using a HOG-SVM detector.
Identify the eye landmarks and use them to find the pupil center.
Track the pupil motion using image processing algorithms.
Map this pupil position to the user's region of focus.
Identify items within the region of focus and use an Arduino to control switches based on time-based thresholding.
The Idea
We first detect the user's face landmarks (lips, eyes, nose edges, etc.) with a webcam facing the user and use them to determine the pupil location. Another webcam facing forward captures the user's field of view. By mapping the pupil location onto this view, we get the user's current region of focus. We also determine the IoT device's location, and when the device location and the region of focus align for more than a second, we take it as a signal to flip the switch.
NB: Most of the detailed explanations are given in the comments in the code
So, the whole project can be divided into the following steps:
Pupil tracking and region of focus
Device detection and recognition
Control signal
Preliminaries
The project would need the following libraries to be installed-
Imutils – provides functions for easy image processing operations
OpenCV and OpenCV-contrib module – probably the best computer vision library available for Python
Numpy – package for scientific computation in Python
Dlib – toolkit for machine learning and data analysis applications
PySerial – library to communicate with the serial ports (here, with the Arduino)
Installation can be done easily using pip, since all of them are available on PyPI:
# imutils
$ pip install imutils
# opencv
$ pip install opencv-python
# opencv-contrib
$ pip install opencv-contrib-python
# numpy
$ pip install numpy
# dlib
$ pip install dlib
# pyserial
$ pip install pyserial

Step 1: Pupil tracking and region of focus
Pupil tracking
We have used Python's Dlib library to track the pupil location using a webcam (Cam1) facing the user.
We use the HOG (Histogram of Oriented Gradients) and SVM (Support Vector Machine) based face detector to detect the face (the inner workings of the detector are beyond the scope of this blog post).
We then use a face landmark predictor to determine the facial landmarks on the detected face.
Select the landmarks corresponding to the eye (points 36 to 41) and crop out the eye.
Since the pupil is the darkest part of the eye, its center can be located by finding the column (and row) of minimum total intensity in the cropped eye image.
# Face detection is performed using the following detector
detector = dlib.get_frontal_face_detector()
detected_faces = detector(gray, 1) # gray is the grayscale frame from Cam1
# Landmark detection
predictor = dlib.shape_predictor('shape_predictor_68_face_landmarks.dat') # path to the file
landmarks = predictor(gray, rect) # rect is one of the rectangles in detected_faces, e.g. detected_faces[0]
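# --- Illustrative addition (not from the original post): cropping out the eye ---
# Landmark points 36 to 41 are the eye points mentioned above; the exact cropping
# here is our own assumption of how the cropped eye image 'img' used below is obtained.
eye_points = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in range(36, 42)], dtype=np.int32)
ex, ey, ew, eh = cv2.boundingRect(eye_points)
img = gray[ey:ey + eh, ex:ex + ew] # cropped eye used in the intensity search below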
# Get the minimum intensity point along x
# img is the cropped eye
# Set the maximum possible column sum as the initial guess
min_sum = 255 * len(img) # len(img) returns the height (y)
# Initial guess for the minimum intensity column
min_index = -1
# Loop through each point on the x axis
for x in range(len(img[0])): # len(img[0]) returns the width (x)
    # Temporary sum of intensities for this column
    temp = 0
    # Loop through each point on the y axis
    for y in range(len(img)):
        # Take the total sum along the column
        temp += img[y][x]
    # Keep the column with the smallest sum seen so far
    if temp < min_sum:
        min_sum = temp
        min_index = x
# The same can be done for the minimum along y
Region of focus
We get the user's field of view using a front-facing webcam (Cam2) attached to the system.
Divide the input image into smaller sections (here, sectors in polar coordinates).
Map the pupil location, relative to the eye size, to coordinates in the image, with the origin at the center.
The section that the mapped coordinates fall in is the current region of focus.
# Function to map values from the eye to the user's view
def mapper(value, leftMin, leftMax, rightMin, rightMax):
    leftSpan = leftMax - leftMin
    rightSpan = rightMax - rightMin
    # mapped_value = rightMin + (value - leftMin) * rightSpan/leftSpan
    scaling = float(value - leftMin) / leftSpan
    new_value = rightMin + rightSpan * scaling
    return new_value

# Function to convert from x-y coordinates to polar coordinates
def conv2polar(coor):
    r = int(np.sqrt(coor[0]*coor[0] + coor[1]*coor[1]))
    theta = int(np.arctan2(coor[1], coor[0]) * 180 / np.pi)
    polar = (r, theta)
    return polar

# Function to select sectors in the image
# We first convert the x-y coordinates to polar coordinates and start the first sector at -22.5 degrees since it provided better accuracy
def sector(theta):
    sec = 0
    if (theta > -22.5) and (theta < 22.5): sec = 1       # sector 1
    elif (theta > 22.5) and (theta < 67.5): sec = 2      # sector 2
    elif (theta > 67.5) and (theta < 112.5): sec = 3     # sector 3
    elif (theta > 112.5) and (theta < 157.5): sec = 4    # sector 4
    elif (theta > 157.5) and (theta <= 180): sec = 5     # sector 5
    elif (theta >= -180) and (theta < -157.5): sec = 5   # sector 5
    elif (theta > -157.5) and (theta < -112.5): sec = 6  # sector 6
    elif (theta > -112.5) and (theta < -67.5): sec = 7   # sector 7
    elif (theta > -67.5) and (theta < -22.5): sec = 8    # sector 8
    return sec
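Putting the helpers together, the pupil position can be mapped to a sector roughly as follows. This is our own illustrative sketch, not verbatim project code: pupil_x and pupil_y stand for the minimum-intensity column and row found earlier inside the cropped eye img, and control_shape is the shape of the Cam2 frame.
# Map the pupil position inside the cropped eye to the forward camera's frame
eye_h, eye_w = img.shape[:2]
view_x = mapper(pupil_x, 0, eye_w, 0, control_shape[1])
view_y = mapper(pupil_y, 0, eye_h, 0, control_shape[0])
# Shift the origin to the center of the view before converting to polar coordinates
eye_coor = (int(view_x) - control_shape[1] // 2, int(view_y) - control_shape[0] // 2)
eye_polar = conv2polar(eye_coor)
eye_sector = sector(eye_polar[1]) # compared later against the marker's sector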
Step 2: Device detection and recognition
We have used ArUco markers (to be attached to the device), provided in the OpenCV-contrib module, which can be detected and recognized easily using built-in functions.
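For reference, a marker to print and attach to a device can be generated with the same (older) aruco API used in this project. This is a minimal sketch; the marker ID and pixel size here are arbitrary choices.
import cv2
from cv2 import aruco

# Generate marker ID 0 from the 6x6 dictionary as a 200x200 px image and save it for printing
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
marker_img = aruco.drawMarker(aruco_dict, 0, 200)
cv2.imwrite('marker_0.png', marker_img)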
Device detection
ArUco has several dictionaries of markers, so we selected the 6×6 markers for our use.
On detection, it returns the corner points, the marker IDs, and the rejected marker candidates.
We find the center of the detected marker, convert it to polar coordinates and use it to determine the section it is in.
The IDs returned can be used to distinguish different devices by attaching different markers to different devices.
# ArUco marker detection
# Set up the dictionary for the ArUco markers
aruco_dict = aruco.Dictionary_get(aruco.DICT_6X6_250)
# Create the detector parameters for the dictionary
params = aruco.DetectorParameters_create()
# Get the corners and IDs
corners, ids, rejected = aruco.detectMarkers(gray, aruco_dict, parameters=params) # takes the grayscaled user_cam image as input
# Draw the markers for visualization
detected = aruco.drawDetectedMarkers(frame, corners) # frame is the RGB input from user_cam
# If at least one marker is detected
if ids is not None:
    # Find the center of the marker and draw a circle for visualization
    marker_x = int((corners[0][0][0][0] + corners[0][0][2][0]) / 2)
    marker_y = int((corners[0][0][0][1] + corners[0][0][2][1]) / 2)
    cv2.circle(detected, (marker_x, marker_y), 5, (255, 0, 0), -1)
    # Shift the origin to the center, convert to polar and find the sector
    marker_coor = (marker_x - int(control_shape[1]/2), marker_y - int(control_shape[0]/2)) # control_shape is the overall shape of the user_cam input
    marker_polar = conv2polar(marker_coor)
    marker_sector = sector(marker_polar[1])
Step 3: Control signal
When the region of focus and the device location fall in the same sector, we assume that the user is focusing on the device. We start a timer, and if the user keeps focusing for more than a second, it is taken as a signal to flip the switch. A binary output is transmitted to the Arduino over serial and used to turn the switch on or off.
Determining and sending control signal
import serial
import struct
import time

# One-time setup (before the main loop):
# serial port to the Arduino, switch state and focus-start time
arduino = serial.Serial('/dev/ttyACM0', 9600)
trigger = 0
delay = 0

# Inside the main loop:
if eye_sector == marker_sector:
    # Start the timer the first time the sectors align
    if delay == 0: delay = time.time()
    # Determine the time elapsed
    check = time.time() - delay
    print('Watching')
    # If focused for more than a second
    if check > 1:
        # Flip the switch state
        if trigger == 0: # off before
            trigger = 1
            print('Turning On')
        elif trigger == 1: # on before
            trigger = 0
            print('Turning Off')
        arduino.write(struct.pack('>B', trigger))
        # Reset the time elapsed
        delay = 0
else:
    # Reset the time elapsed when the sectors no longer match
    delay = 0

Arduino code for control
int input = 0;

void setup() {
  pinMode(13, OUTPUT);
  Serial.begin(9600); // must match the baud rate used on the Python side
  digitalWrite(13, LOW);
}

void loop() {
  if (Serial.available() > 0) {
    input = Serial.read();
    Serial.print("Received: ");
    Serial.println(input);
    if (input == 1) digitalWrite(13, HIGH);
    else if (input == 0) digitalWrite(13, LOW);
  }
}
Notes
Full code is provided here: https://github.com/otoshuki/Nigahein/
Inspired by the Eye of Horus project: hackster-eye_of_horus
We tried several other methods for the project, but many of them failed or did not fit our use case, so we finally settled on the method described above:
Using a Haar cascade for eye detection, then HSV or grayscale thresholding to extract the pupil region, and finally contour detection to determine the pupil location. The method turned out to be quite inaccurate (a rough sketch of this pipeline is shown after this list).
The same problem occurred when using the Hough transform instead of contour detection.
Directly using the PyGaze library. Unfortunately, we could not get the library working and faced some unresolved issues.
We tried to train a Haar cascade to detect the lamp, but it did not work well due to insufficient data, so we used ArUco markers for easy detection instead.
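For the curious, here is a rough reconstruction of that first approach. This is our own illustration under assumptions, not the project's original code; the threshold value of 40 and the pretrained haarcascade_eye.xml cascade are illustrative choices.
import cv2

# Grab a frame from the user-facing webcam
cap = cv2.VideoCapture(0)
ret, frame = cap.read()
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Detect the eye with OpenCV's pretrained Haar cascade
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
eyes = eye_cascade.detectMultiScale(gray, 1.3, 5)
for (x, y, w, h) in eyes:
    eye = gray[y:y + h, x:x + w]
    # Threshold so that only the dark pupil region remains
    _, mask = cv2.threshold(eye, 40, 255, cv2.THRESH_BINARY_INV)
    # [-2] keeps this compatible with both the OpenCV 3 and OpenCV 4 return conventions
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if contours:
        # Take the largest contour as the pupil and use its centroid as the pupil center
        pupil = max(contours, key=cv2.contourArea)
        M = cv2.moments(pupil)
        if M['m00'] > 0:
            cx, cy = int(M['m10'] / M['m00']), int(M['m01'] / M['m00'])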
Improvements:
The current version is quite clunky; a lot of parts could be 3D printed and miniaturized.
Face detection could be performed using CNNs, which can detect faces even at different angles, although it may run slowly on older systems (a sketch follows this list).
Haar cascades could be trained to detect the devices.
IR LEDs could be used to illuminate the eyes.
IR LEDs could be attached to the devices, blinking at particular rates, to identify each device even in the absence of visible light.
Eye tracking alone could be employed, without the need for face detection, which would further help with miniaturization.
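As an example of the CNN-based alternative mentioned above, Dlib ships an MMOD CNN face detector. A minimal sketch, assuming the model file mmod_human_face_detector.dat has been downloaded separately from the Dlib model repository:
import cv2
import dlib

# CNN (MMOD) face detector: more robust to pose than the HOG detector, but slower on CPU-only systems
cnn_detector = dlib.cnn_face_detection_model_v1('mmod_human_face_detector.dat')

frame = cv2.VideoCapture(0).read()[1]
rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
detections = cnn_detector(rgb, 1)
for det in detections:
    # Each detection carries a rectangle (usable with the same landmark predictor) and a confidence score
    print(det.rect, det.confidence)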
Future Prospects
With the rise of IoT devices and improvements in computer vision and computational efficiency, more and more devices will be connected to the cloud. And with many people going for human augmentation, such devices will be commonplace in the future. The applications of such a system could be limitless, ranging from unlocking doors and cars to controlling cars using pupil tracking. Mixed reality systems could also make use of such techniques to improve immersion.

Members
Aadi Gupta
Aadarsh Khandelwal
Shridam Mahajan
Guining Pertin (Mentor)


