Visually impaired individuals face numerous challenges in their daily lives, particularly in sensing and securing objects within their surroundings. This research paper proposes the development of a wearable haptic device, Helios, aimed at assisting visually impaired individuals in sensing, recognizing, and securing different objects in their environment. By leveraging natural language processing and computer vision techniques, Helios aims to create a closed-circuit system situated on the user’s wrist that provides haptic feedback to direct the user towards their desired object. The device utilizes a microphone attached to a Raspberry Pi to capture user input, which is processed through AI algorithms to identify the desired object within a database of stored objects. The Raspberry Pi then translates this information into haptic signals delivered through buzzers on the user’s wrist, providing localization and directional guidance. The objective of this project is to enhance the living standards and daily activities of visually impaired individuals by assisting them in navigating and interacting with their surroundings.
Introduction
Visually impaired individuals encounter significant challenges in their daily lives, as their limited or absent vision hampers their ability to perceive and interact with the world around them. Simple tasks, such as locating and securing objects in their environment, can become arduous and time-consuming. Traditional aids like canes and guide dogs offer assistance in mobility, but they do not address the broader range of challenges related to object recognition and localization. Therefore, there is a pressing need for innovative solutions that can empower visually impaired individuals and enhance their independence and quality of life.
The proposed research holds significant importance in the field of assistive technologies for visually impaired individuals. By combining object recognition, localization, and haptic feedback, Helios aims to provide an innovative solution that addresses the challenges visually impaired individuals face in sensing and securing objects. This research has the potential to greatly enhance the living standards and daily activities of visually impaired individuals, fostering independence and improving their overall quality of life. Furthermore, the integration of natural language processing and computer vision techniques within a wearable haptic device presents an exciting avenue for the advancement of assistive technologies, with potential applications beyond object sensing and localization.
Related Work
1.1 Assistive Technologies for Visually Impaired Individuals
Using technology for the benefit of the visually impaired is a well-established field, and our goal is to improve upon previous work to create the best possible user experience for visually impaired individuals. One earlier project is CAMIO, a camera system that makes physical objects accessible to blind users: a camera mounted above a desk scans the objects below, and, paired with text-to-speech and real-time audio feedback, allows the user to locate them (Shen et al., 2013). Yang et al. (2005) developed a haptic device worn on the human arm that attaches at the arm’s joints to assist with arm control. It consists of three sequentially connected modules, namely a 3-DOF wrist module, a 1-DOF elbow module, and a 3-DOF shoulder module, which are designed to adapt to the motions of the human arm’s skeletal joints at the wrist, elbow, and shoulder respectively (Yang et al., 2005). By engaging all three joints, it provides finer control over arm movements.
1.2 Advantages of Helios in Comparison to Previous Works
Helios is a new prototype that improves on these earlier systems in several respects. Compared to CAMIO, Helios is more user friendly: it is an adjustable, wrist-worn haptic device that moves wherever the individual’s arm moves, so it is not restricted to a single area or workspace the way CAMIO is. Compared to the device of Yang et al. (2005), Helios also offers better mobility. It does not need to be strapped onto three separate joints, which could irritate the user, and a device spanning all the way up to the elbow would be too restrictive to move around with. Helios extends only to the user’s forearm, and it guides the user toward the object rather than forcing their arm in its direction.
System Design and Architecture
2.1 Overview of Helios
Helios is a closed-circuit solution that takes input from the user and analyzes the environment to produce a location output. The user speaks into the microphone, which triggers the camera to search for the requested object; Helios then guides the user toward it through the buzzers. The buzzers give the user a more interactive feel while also serving as the visually impaired user’s sense of direction. This is accomplished with three buzzers attached to haptic motor controllers and positioned on the user’s wrist: one buzzer indicates left, one indicates forward, and one indicates right. Helios essentially makes it possible for users to find anything they need without having to see it.
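As a rough illustration of this closed loop, the following is a minimal sketch of the listen, detect, buzz cycle. The helper functions are simple placeholders standing in for the NLP, vision, and haptic components described in later sections, not the actual Helios implementation.

import random

def listen_for_object_name():
    # Placeholder for the microphone + natural language step (Section 4).
    return "apple"

def detect_object(target):
    # Placeholder for the camera + object detection step (Section 3): returns
    # the target's horizontal position as a fraction of the frame width, or
    # None if the object is not in view.
    return random.choice([None, 0.2, 0.5, 0.8])

def fire_buzzer(direction):
    # Placeholder for the haptic step: one buzzer each for left, forward, right.
    print(f"buzzing the '{direction}' motor")

def helios_step(target):
    position = detect_object(target)
    if position is None:
        return
    if position < 0.4:
        fire_buzzer("left")
    elif position > 0.6:
        fire_buzzer("right")
    else:
        fire_buzzer("forward")

helios_step(listen_for_object_name())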
2.2 Hardware Components
Helios utilizes several different hardware components to accomplish its goal. The base of the device is a wrist strap, similar to a wrist brace for forearm injuries. It is adjustable via Velcro straps on the bottom, which allows the user to select their desired size. The wrist brace measures 4.72 x 2.36 x 0.5 inches and weighs 3.21 ounces. A soft Velcro pad is hot-glued on top of the wrist strap, securing the rest of the components attached above it. First, a Raspberry Pi 4 is attached and serves as the controller for the device. A breadboard is attached to the Raspberry Pi through a GPIO connector, and on the breadboard are three haptic motor controllers. As shown in the image below, the wiring runs from the Pi 3V3 to the sensor VIN, the Pi GND to the sensor GND, the Pi SCL to the sensor SCL, and the Pi SDA to the sensor SDA.
SCL and SDA are the two main signal lines used for I2C on the Raspberry Pi, with SCL being the clock line and SDA the data line. Each of the three buzzers is soldered to a haptic motor controller with two leads, one positive and one ground. The haptic motor controllers provide control over the buzzers, such as buzzing frequency and buzzing length. A USB microphone is also attached to the Raspberry Pi; it takes input from the user and sends the audio to the Pi for processing. The camera used in Helios is a Pi Camera connected to the camera port on the Raspberry Pi, positioned and taped so that it faces directly forward from the front of the device. Lastly, a 5V battery powers the device.
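For concreteness, the 3V3/GND/SCL/SDA wiring above matches Adafruit-style DRV2605 haptic motor controller breakouts. The following is a minimal sketch of driving one such controller from the Pi using the CircuitPython library mentioned in Section 2.3; the specific breakout, effect number, and timing are assumptions for illustration, not the exact Helios configuration.

# Minimal sketch: drive one haptic motor controller over I2C from the Pi.
# Assumes an Adafruit DRV2605-compatible breakout and the adafruit_drv2605
# CircuitPython library; the effect ID and duration are illustrative only.
import time
import board
import busio
import adafruit_drv2605

i2c = busio.I2C(board.SCL, board.SDA)  # SCL = clock line, SDA = data line
drv = adafruit_drv2605.DRV2605(i2c)

def buzz(effect_id=47, duration=0.3):
    # Play one of the chip's built-in waveforms for roughly `duration` seconds.
    drv.sequence[0] = adafruit_drv2605.Effect(effect_id)
    drv.play()
    time.sleep(duration)
    drv.stop()

buzz()  # e.g. a short cue on one of the wrist buzzers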
2.3 Software Components
All software is version-controlled on GitHub and deployed to the Raspberry Pi, and a 16 GB SD card stores the data on the Raspberry Pi 4. Helios makes use of the publicly available YOLOv5 localization model, a pretrained model that builds on and improves upon previous YOLO versions for better accuracy and precision. Alongside YOLOv5, Helios uses CircuitPython to control the haptic motors’ various buzzing options, such as buzzing length, frequency, and force, all of which depend on the position of the object relative to the user’s hand.
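For reference, YOLOv5 can be loaded directly through PyTorch Hub. The snippet below is a minimal sketch of that interface; the small 'yolov5s' variant and the image path are assumptions for illustration rather than the exact weights and inputs Helios uses.

# Minimal sketch of running YOLOv5 inference via PyTorch Hub.
# The 'yolov5s' variant and file path are illustrative assumptions.
import torch

model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
results = model('path/to/frame.jpg')  # accepts a file path, URL, or image array

# results.xyxy[0] is an (N, 6) tensor: x_min, y_min, x_max, y_max, confidence, class
for *bbox, conf, cls in results.xyxy[0].tolist():
    if conf > 0.5:
        print(model.names[int(cls)], round(conf, 2), bbox)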
Object Recognition and Localization
3.1 Data Collection and Preparation
Object recognition and localization form the core of the Helios Wearable Haptic Device’s functionality. This section presents the comprehensive data collection and preparation process undertaken to build an accurate and robust object recognition system.
To enable object detection, a diverse dataset of real-world objects was collected. The dataset comprises images of various objects captured from different angles, distances, and lighting conditions, mimicking the user’s everyday environment. A considerable amount of effort was invested in annotating the dataset with bounding boxes to mark the precise location of each object within the images.
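To make the annotation format concrete, the small helper below converts a pixel-space bounding box into the normalized (class, x_center, y_center, width, height) form used by YOLO-style label files. This assumes YOLO-format labels rather than the project’s exact annotation tooling, and the class ID and coordinates are illustrative.

# Convert a pixel-space bounding box into a normalized YOLO-style label line.
# The class ID and pixel coordinates below are illustrative examples.
def to_yolo_label(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    x_c = (x_min + x_max) / 2.0 / img_w
    y_c = (y_min + y_max) / 2.0 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# e.g. an apple occupying pixels (120, 80)-(260, 210) in a 640x480 image:
print(to_yolo_label(0, 120, 80, 260, 210, 640, 480))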
3.2 Object Detection and Classification
The next step involved developing an object detection and classification model using state-of-the-art computer vision techniques. The Convolutional Neural Network (CNN) architecture was selected for its exceptional ability to learn and extract relevant features from images.
Python’s deep learning libraries, such as TensorFlow and Keras, were instrumental in implementing the CNN. The dataset was split into training and validation sets, with appropriate data augmentation techniques applied to enhance model generalization. The model was then trained using the training data and fine-tuned iteratively to optimize performance.
To further improve the model’s accuracy, transfer learning was employed. A pre-trained CNN model, such as VGG16 or ResNet, served as the base architecture, and its learned weights were fine-tuned during the training process on our specific object recognition task.
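The following is a minimal sketch of this transfer-learning setup in Keras, assuming an ImageNet-pretrained VGG16 base; the class count, head layers, and hyperparameters are illustrative rather than the exact configuration used.

# Transfer-learning sketch: frozen VGG16 base with a small classification head.
# N_CLASSES and the dataset objects are placeholders, not the project's values.
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

N_CLASSES = 10  # placeholder for the number of object categories in the dataset

base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the pretrained features; unfreeze later to fine-tune

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(N_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # train_ds/val_ds assumed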
3.3 Localization Techniques
Accurate localization of objects within the user’s surroundings is crucial for providing effective haptic feedback. To achieve this, we explored several localization techniques, with emphasis on real-time performance and precision.
One of the approaches employed was Single Shot Multibox Detector (SSD), which combines object detection and localization into a single end-to-end network. SSD proved to be efficient, delivering real-time object detection with excellent localization accuracy.
Furthermore, we experimented with Region-based Convolutional Neural Networks (R-CNN) and its variants, such as Fast R-CNN and Faster R-CNN, which demonstrated superior performance at the cost of increased computational complexity during inference.
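As a point of comparison, torchvision exposes Faster R-CNN behind the same inference interface as the SSD snippet shown later in this section, so swapping detectors is largely a one-line change. The sketch below assumes torchvision’s pretrained COCO weights and uses a random tensor as a stand-in image.

# Minimal sketch: load a pretrained Faster R-CNN; its output format matches SSD's.
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

dummy_image = torch.rand(3, 480, 640)  # placeholder image tensor with values in [0, 1]
with torch.no_grad():
    output = model([dummy_image])[0]   # dict with 'boxes', 'labels', 'scores'
print(output["boxes"].shape, output["labels"].shape, output["scores"].shape)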
3.4 Integration with Haptic Feedback
The successful recognition and localization of objects demanded seamless integration with the haptic feedback system. When an object is detected and classified, Helios triggers haptic feedback in the form of subtle vibrations on the user’s wrist, directing their hand towards the object’s location.
The Python programming language, coupled with libraries like PyTorch and NumPy, facilitated the integration process. The object’s coordinates, obtained during the localization step, were translated into haptic feedback signals. The precise timing and intensity of the vibrations were carefully calibrated to ensure a natural and intuitive user experience.
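As an illustration of this translation step (and of the haptic-integration TODO left in the snippet below), the following sketch maps a detected bounding box to one of the three wrist buzzers and scales vibration intensity with the box’s apparent size as a rough proxy for distance; the thresholds and scaling are illustrative assumptions, not the calibrated values.

# Translate a bounding box into a buzzer choice and a vibration intensity.
# Thresholds and the size-to-intensity scaling are illustrative assumptions.
def bbox_to_haptics(bbox, frame_width, frame_height):
    # bbox = (x_min, y_min, x_max, y_max) in pixels.
    x_min, y_min, x_max, y_max = bbox
    x_center = (x_min + x_max) / 2.0

    # Choose the buzzer from the object's horizontal position in the frame.
    ratio = x_center / frame_width
    if ratio < 0.4:
        buzzer = "left"
    elif ratio > 0.6:
        buzzer = "right"
    else:
        buzzer = "forward"

    # Larger box -> closer object -> stronger vibration, clamped to [0.2, 1.0].
    area_fraction = ((x_max - x_min) * (y_max - y_min)) / (frame_width * frame_height)
    intensity = min(1.0, max(0.2, 5.0 * area_fraction))
    return buzzer, intensity

print(bbox_to_haptics((120, 80, 260, 210), 640, 480))  # e.g. ('left', ~0.3)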
Code and Syntax Breakdown:
Below is a simplified code snippet demonstrating the object detection and localization process using a pre-trained SSD300-VGG16 model from torchvision:
# Import necessary libraries
import cv2
import numpy as np
import torch
import torchvision

# Load the pre-trained SSD300-VGG16 model with COCO weights from torchvision
weights = torchvision.models.detection.SSD300_VGG16_Weights.DEFAULT
model = torchvision.models.detection.ssd300_vgg16(weights=weights)

# Set the model to evaluation mode
model.eval()

# Load the image for object detection
image = cv2.imread('path/to/image.jpg')
height, width = image.shape[:2]  # frame dimensions, needed later for the haptic mapping

# Preprocess the image: BGR -> RGB, scale to [0, 1], HWC -> CHW
input_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
input_image = np.transpose(input_image, (2, 0, 1))

# Convert the image to a PyTorch tensor (the model resizes it to 300x300 internally)
input_tensor = torch.from_numpy(input_image)

# Forward pass through the model (it expects a list of image tensors)
with torch.no_grad():
    detections = model([input_tensor])[0]

# Process the detection results (boxes are in pixel coordinates of the input image)
for box, label, score in zip(detections['boxes'], detections['labels'], detections['scores']):
    if score > 0.5:  # Confidence threshold for object detection
        class_id = int(label)
        class_name = weights.meta['categories'][class_id]
        confidence = float(score)
        bbox = box.tolist()  # [x_min, y_min, x_max, y_max]
        # TODO: Perform haptic feedback integration using bbox coordinates

User Interaction and Control
4.1 Voice Input Processing
A USB microphone connected to the Raspberry Pi listens continuously and passes the captured speech to the NLP portion of the code.
4.2 Natural Language Understanding
Natural Language Processing (NLP) is about teaching computers to understand, interpret, and generate human language. It draws on techniques from linguistics, computer science, and machine learning, and it enables computers to process, analyze, and extract valuable information from text data. The key techniques are listed below, followed by a short illustrative sketch.
a. Tokenization: This process involves breaking down a piece of text into smaller units called tokens. Tokens can be words, phrases, or even individual characters.
b. Stop Words: Stop words are common words (e.g., “and,” “the,” and “is”) that don’t carry significant meaning and are often removed during text processing to reduce noise.
c. Part-of-Speech (POS) Tagging: POS tagging is the process of assigning grammatical tags to each word in a sentence, such as nouns, verbs, adjectives, etc.
d. Named Entity Recognition (NER): NER involves identifying entities like names of people, places, organizations, etc., in a text.
e. Sentiment Analysis: This technique determines the sentiment expressed in a piece of text, whether it’s positive, negative, or neutral.
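The short sketch below illustrates techniques (a) through (c) using NLTK. NLTK is assumed here purely for illustration; Helios’s own voice pipeline (Section 4.3) relies on the speech_recognition library.

# Illustration of tokenization, stop-word removal, and POS tagging with NLTK.
# NLTK is assumed for illustration only; the example command is made up.
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

command = "find the red apple on the table"
tokens = word_tokenize(command)                                                 # (a) tokenization
filtered = [t for t in tokens if t.lower() not in stopwords.words("english")]   # (b) stop words
tagged = nltk.pos_tag(filtered)                                                 # (c) POS tagging
print(tagged)  # e.g. [('find', 'VB'), ('red', 'JJ'), ('apple', 'NN'), ('table', 'NN')]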
4.3 Command Translation and Haptic Feedback Generation
To convert speech to text, the program uses the Python library speech_recognition, which reads the microphone input and applies NLP-based speech recognition to return the recognized text (Wijetunga, 2021).
# Import libraries
import speech_recognition as sr
import os

"""
Intended trigger logic (implemented elsewhere in the program):
if BUTTON_PRESSED == True:
    text = speech_to_text()
    if command_exist(text) == True:
        ...
"""

def speech_to_text():
    # Listen on the default microphone and transcribe the audio with Google's
    # speech recognition service; return None if recognition fails.
    r = sr.Recognizer()
    with sr.Microphone() as source:
        audio_text = r.listen(source)
    try:
        return r.recognize_google(audio_text)
    except (sr.UnknownValueError, sr.RequestError):
        return None

def command_exist(text):
    # Check whether a directory named after the recognized text exists
    # under the Test/ folder.
    path = 'Test/' + text
    return os.path.isdir(path)

The script begins by importing two Python libraries:
speech_recognition (aliased as sr): This library is used for performing speech recognition tasks.
os: This library provides various operating system-related functions.
The script defines two functions for speech recognition:
a. speech_to_text(): This function performs speech recognition using the speech_recognition library. It listens to the user’s speech through the microphone and attempts to recognize it using Google’s speech recognition service. If successful, it returns the recognized text; otherwise, it returns None.
b. command_exist(text): This function checks if a directory exists corresponding to the provided text. It constructs a path based on the Test directory and the text parameter. Then, it uses the os library to determine if the directory exists. If the directory is found, it returns True; otherwise, it returns False.
The script contains a conditional block that seems to represent a hypothetical situation:
if BUTTON_PRESSED == True:
    text = speech_to_text()
    if command_exist(text) == True:
        # Perform some action for the recognized command.

This conditional block suggests that the code is designed to respond to some external trigger (possibly a button press) represented by the condition BUTTON_PRESSED == True. When the trigger is activated, the code performs the following steps:
Calls the speech_to_text() function to convert the user’s speech into text and store it in the variable text.
Calls the command_exist(text) function to check if a directory exists with the name represented by the text variable.
If the directory exists, some action related to the recognized command is executed. The specific action is not shown in the provided code snippet.
It’s important to note that the code provided does not define or explicitly use the BUTTON_PRESSED variable, nor does it execute any action based on the recognized command. These parts would need to be implemented elsewhere in the program for the code to function correctly.
System Evaluation and Performance
6.1 Experimental Setup and Evaluation Metrics
The following experiments are designed to test the accuracy, precision, and speed of Helios in detecting, classifying, and securing objects. The objects tested in the experiments are apples, oranges, and bananas; Helios had already been trained on fruits, so these specific fruits were chosen to increase accuracy.
To set up the experiments, all users first adjusted the wrist strap to their preferred size. Users were then seated and told that the object(s) would be somewhere in front of them, but not exactly where; placing objects to the side of or behind the user would make them too difficult to find. All objects were placed on a single horizontal plane, since Helios does not have an up/down buzzer. Once set up, each user was put through four different tests.
Test 1: Helios detecting one object
The purpose of this test is for Helios to use its computer vision to detect a single object and guide the user towards it. It serves as a baseline to confirm that the basics are working.
Test 2: Helios detecting one object next to two different objects
The purpose of this second test is to evaluate Helios’s recognition skills. Compared to the baseline test, Helios now has to take in three objects from the computer vision and match the correct one to the input from the microphone. For example, this could be detecting an apple placed next to an orange and a blueberry.
Test 3: Helios detecting one object compared to two similar objects
The purpose of this third test is to further evaluate Helios’s recognition skills. Compared to the second test, the objects are all very similar to each other. For example, this could be Helios detecting a green apple next to a brown apple and a red apple, placing an even greater demand on its ability to distinguish between objects.
Test 4: Helios detecting one object compared to 10+ objects
The purpose of this test is to evaluate whether Helios can handle many distractions. For example, one apple is placed among many different fruits, including some similar ones. This ensures Helios can detect any object in our database, no matter how or where it is positioned.
6.2 User Studies and Feedback
6.3 Performance Analysis and Comparison
Discussion and Future Work
7.1 Limitations and Challenges
After a thorough analysis of our results, we must acknowledge some of Helios’s limitations. Helios currently only operates with objects on the same plane, since there is not enough space for an up/down buzzer. Another limitation is that Helios relies heavily on user intuition. For example, when Helios has found an object, it buzzes the middle buzzer to tell the user to move forward, and the user is advised to approach the object slowly; if the user rushes toward it, Helios cannot update them quickly enough if they miss the object. Also, when the user reaches the object, it is up to them to turn off the device, because Helios does not know whether the user wants to keep searching for multiple objects in close proximity. If the device is not turned off, it will keep sensing for the object the user asked for, which could cause confusion.
7.2 Potential Improvements
Currently Helios operates at (fps) and takes (time) on average to locate and secure objects for the user when multiple objects are in the frame. Investing in a co-processor could improve response time, making it faster for the user to secure their desired objects; the device would not need to be enlarged, since there is enough space available on the half-breadboard. Next, Helios can only operate when objects are on the same horizontal plane. Adding extra buzzers to indicate up and down would help the user find more objects, but this would require additional haptic motor controllers and therefore extra room on the breadboard, meaning the device would have to be expanded while keeping the user’s comfort in mind. Lastly, a camera with higher resolution and a wider field of view could capture more of the scene, reducing the burden on the user to aim the device precisely and increasing the chance of detecting the object.
7.3 Expansion to Other Use Cases
AI paired with haptics has the potential to change many aspects of the world. Using the YOLOv5 localization model, Helios could also be applied to sports, for example by assisting visually impaired individuals in following live games. Helios could track the position of the ball in sports like soccer and basketball and notify the user through haptic feedback, with the feedback adjusted to the use case, such as audio feedback for slow-paced sports or additional buzzers for fast-paced ones. Localization models could likewise help visually impaired individuals with gaming, with audio and buzzer feedback assisting users in games ranging from first-person shooters to sports titles. Overall, haptic devices can be used to assist visually impaired individuals in all facets of life.
7.4 Ethical Considerations
When we were creating Helios, we needed to be considerate of the opinions of visually impaired individuals. That is one of the reasons we volunteered at our local school for the blind: to see firsthand what a visually impaired individual’s daily life is like. During our volunteering, we noticed that visually impaired individuals are very independent despite their circumstances; they ask for assistance only when needed and prefer to do most tasks themselves. We realized that Helios does more than guide individuals to objects in their vicinity, it increases their sense of independence.
7.5 Significance of Research
Helios has the potential to impact millions of individuals across the world. There are currently 282 million people living with vision loss worldwide, 36 million of whom are completely blind. It is therefore crucial that Helios continue to be improved so it can change millions of lives for the better. With the support of nonprofit organizations and donations from the public, Helios can be used to accomplish this very goal.
Conclusion
The proposed research aims to develop a wearable haptic device, Helios, to assist visually impaired individuals in sensing and localizing objects in their surroundings. By leveraging AI techniques such as natural language processing and computer vision, the device provides haptic feedback to guide users towards their desired objects. The integration of object recognition, localization, and haptic feedback generation enables the device to enhance the daily living experiences and independence of visually impaired individuals. Through evaluation and user feedback, this research contributes to the field of assistive technologies and opens avenues for future improvements and applications.
Codebase: https://github.com/RyanRana/Helios