How Computational Linguistics can be used to help the visually impaired to enjoy sports entertainment

Jul 31, 2023

In 2019, Mike Kearney, a blind sports fan, attended a Liverpool game with his cousin Stephen Garcia. Because Mike is blind, Stephen described the action to him throughout the match, and a video of the moment went viral on the internet: https://www.youtube.com/watch?v=6HysSJr9-54&ab_channel=LiverpoolFC

Blindness affects an estimated 36 million people worldwide, and many of them may be sports fans just like Mike.

Using computational linguistics, we have designed a project that summarizes basketball games by taking real-time game footage and turning it into AI-generated commentary, so that blind fans can still enjoy the exciting world of sports entertainment.

Project Layout

Video Processing: Use a video processing library (e.g., OpenCV) to extract frames from the basketball game video.
Object Detection: Utilize an object detection model (e.g., YOLO, SSD, Faster R-CNN) to identify and track players, the ball, and other relevant objects in each frame. This will help you identify key events in the game.
Event Detection: Analyze the detected objects’ trajectories and interactions to identify key events such as shots, passes, dunks, steals, etc.
Action Recognition: Apply action recognition models (e.g., LSTM, CNN + LSTM) to short sequences of frames to understand the actions performed by players, such as dribbling, shooting, rebounding, etc.
Text Generation: Use natural language processing techniques, such as recurrent neural networks (RNNs) or transformers, to convert the detected events and actions into text summaries.
Post-Processing: Combine and format the generated text summaries to create coherent and readable basketball game summaries.
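
The "identify and track" part of the object detection step usually needs some way to match boxes between consecutive frames. Below is a minimal sketch of one common approach, matching by intersection over union (IoU). It assumes the detector returns boxes as (x1, y1, x2, y2) tuples; the function names iou and match_boxes are hypothetical and not part of any detector's API.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def match_boxes(prev_boxes, curr_boxes, threshold=0.3):
    """Greedily pair each previous box with the best unmatched current box.

    Returns a dict mapping index in prev_boxes -> index in curr_boxes.
    Boxes whose best overlap is at or below `threshold` stay unmatched.
    """
    matches = {}
    used = set()
    for i, prev in enumerate(prev_boxes):
        best_j, best_score = None, threshold
        for j, curr in enumerate(curr_boxes):
            if j in used:
                continue
            score = iou(prev, curr)
            if score > best_score:
                best_j, best_score = j, score
        if best_j is not None:
            matches[i] = best_j
            used.add(best_j)
    return matches


# Two players move slightly between frames; the detector returns them in a different order.
prev_frame = [(0, 0, 10, 10), (50, 50, 60, 60)]
curr_frame = [(52, 51, 62, 61), (1, 1, 11, 11)]
print(match_boxes(prev_frame, curr_frame))  # {0: 1, 1: 0}
```

Greedy matching is the simplest choice here; a real tracker would likely need something more robust when players overlap or leave the frame.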

Project Code

Video Processing using OpenCV

Live footage can be fed into the program, which reads it frame by frame so that each frame can be passed to the AI model.

import cv2
def extract_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        frames.append(frame)
    cap.release()
    return frames
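
For a long or live feed, keeping every frame in a list may use a lot of memory. One possible alternative, sketched below, is a generator that yields only every few frames. The names iter_frames and FakeCapture are hypothetical; FakeCapture only imitates the read() method of cv2.VideoCapture so the sketch runs without a video file.

```python
def iter_frames(cap, step=5):
    """Yield (index, frame) for every `step`-th frame read from `cap`.

    `cap` is anything with a read() method that returns (ok, frame),
    which is the interface cv2.VideoCapture provides.
    """
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of stream or read failure
        if index % step == 0:
            yield index, frame
        index += 1


class FakeCapture:
    """Stand-in for cv2.VideoCapture (assumption: used here for illustration only)."""

    def __init__(self, frames):
        self._frames = list(frames)

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None


# Twelve dummy "frames" (the integers 0..11), keeping every fifth one.
sampled = list(iter_frames(FakeCapture(range(12)), step=5))
print(sampled)  # [(0, 0), (5, 5), (10, 10)]
```

With real footage, the same generator could be given cv2.VideoCapture(video_path), followed by cap.release() once iteration finishes.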

Event Detection

Event detection in basketball games requires complex algorithms that analyze the trajectories and interactions between players and the ball. Common events to detect include shots, passes, dunks, steals, etc. This step is challenging and goes beyond a simple code snippet. It involves computer vision, motion analysis, and rule-based or machine-learning algorithms.
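
To give a flavor of the rule-based end of that spectrum, here is a deliberately simplified sketch. It assumes that tracking already provides court coordinates for the ball and each player in every frame, and that each player's team is known. All names (nearest_player, detect_possession_events, the max_dist threshold) are hypothetical, and the rules are far cruder than a real system would need.

```python
import math


def nearest_player(ball, players, max_dist=1.5):
    """Return the id of the closest player within max_dist of the ball, else None."""
    best_id, best_dist = None, max_dist
    for player_id, (x, y) in players.items():
        dist = math.hypot(x - ball[0], y - ball[1])
        if dist <= best_dist:
            best_id, best_dist = player_id, dist
    return best_id


def detect_possession_events(frames, teams):
    """Label possession changes as 'pass' (same team) or 'steal' (other team).

    frames: list of (ball_xy, {player_id: xy}) tuples, one per frame.
    teams: {player_id: team_name}.
    """
    events = []
    holder = None
    for index, (ball, players) in enumerate(frames):
        current = nearest_player(ball, players)
        if current is None or current == holder:
            continue  # ball in flight, or possession unchanged
        if holder is not None:
            kind = "pass" if teams[holder] == teams[current] else "steal"
            events.append((index, kind, holder, current))
        holder = current
    return events


teams = {"A1": "home", "A2": "home", "B1": "away"}
players = {"A1": (0, 0), "A2": (10, 0), "B1": (5, 5)}
frames = [
    ((0.5, 0), players),  # A1 has the ball
    ((5, 0), players),    # ball in flight, nobody within range
    ((9.5, 0), players),  # A2 receives it
    ((5, 4.5), players),  # B1 ends up with it
]
print(detect_possession_events(frames, teams))
# [(2, 'pass', 'A1', 'A2'), (3, 'steal', 'A2', 'B1')]
```

Shots, dunks and rebounds would need extra signals such as the hoop position and ball height, which is where learned models tend to replace hand-written rules.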

Action Recognition using a CNN + LSTM model:

import torch
import torch.nn as nn
import torch.optim as optim

class CNNLSTMModel(nn.Module):
    def __init__(self, num_classes, in_channels=3, input_size=256, hidden_size=128, num_layers=2):
        super().__init__()
        # CNN extracts spatial features from each frame independently
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Add more conv layers as needed
        )
        # LSTM models how the per-frame features change over time;
        # input_size must equal the flattened CNN output per frame
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)
    def forward(self, x):
        # x has shape (batch, timesteps, channels, height, width)
        batch_size, timesteps, C, H, W = x.size()
        c_in = x.view(batch_size * timesteps, C, H, W)
        c_out = self.cnn(c_in)
        # Flatten each frame's feature map into one vector per timestep
        r_in = c_out.view(batch_size, timesteps, -1)
        r_out, (h_n, c_n) = self.lstm(r_in)
        # Classify from the LSTM output at the last timestep
        output = self.fc(r_out[:, -1, :])
        return output
# Define hyperparameters
in_channels = 3  # Assumes RGB frames
input_size = 256  # Flattened CNN output per frame (16 * 4 * 4 for 8x8 frames); replace to match your frame size
hidden_size = 128
num_layers = 2
num_classes = 10  # Replace with the number of basketball actions/classes
# Initialize the model
model = CNNLSTMModel(num_classes, in_channels, input_size, hidden_size, num_layers)
# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
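
The value of input_size has to match the number of features the CNN produces per frame, which depends on the frame resolution. As a rough aid, the sketch below works that number out for the single convolution and pooling layer used above, using the standard output-size formula for those layers. The helper names are hypothetical, and the calculation would need extending if more layers are added.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one dimension for a convolution or pooling layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def flattened_cnn_features(height, width, out_channels=16):
    """Features per frame after Conv2d(k=3, s=1, p=1) then MaxPool2d(k=2, s=2)."""
    h = conv_output_size(height, kernel_size=3, stride=1, padding=1)
    w = conv_output_size(width, kernel_size=3, stride=1, padding=1)
    h = conv_output_size(h, kernel_size=2, stride=2)
    w = conv_output_size(w, kernel_size=2, stride=2)
    return out_channels * h * w


print(flattened_cnn_features(8, 8))    # 256
print(flattened_cnn_features(64, 64))  # 16384
```

So an input_size of 256 corresponds to very small 8x8 frames; realistic resolutions give much larger values, which is one reason to add more convolution and pooling layers before the LSTM.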

Text Generation using a pre-trained language model

Generating text summaries using models like GPT-3 requires API access to the specific model. Since GPT-3 is not an open-source model, I can’t provide an implementation here. However, you can check the documentation of the GPT-3 API provided by OpenAI or other similar language models like GPT-2 for generating text summaries based on the detected events and recognized actions.

Remember, building and training each of these models for a real-world application requires extensive data collection, preprocessing, and fine-tuning to achieve accurate and meaningful results. Additionally, integrating all these components and creating a complete basketball game summarization system requires substantial engineering effort.

from transformers import GPT2LMHeadModel, GPT2Tokenizer

def generate_text(prompt, model_name='gpt2', max_length=100, temperature=1.0):
    # Load the pre-trained GPT-2 model and tokenizer
    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)

    # Tokenize the prompt text
    inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")

    # Generate text based on the provided prompt
    outputs = model.generate(
        inputs,
        max_length=max_length,
        do_sample=True,  # Sampling must be enabled for temperature to have an effect
        temperature=temperature,
        pad_token_id=tokenizer.eos_token_id,
        num_return_sequences=1
    )

    # Decode the generated text and return it
    generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return generated_text

# Example usage:
prompt_text = "The basketball game was intense. The home team made a comeback"
generated_text = generate_text(prompt_text)
print(generated_text)

In this example, the generate_text function takes a prompt as input and uses the GPT-2 model to generate text based on that prompt. The model_name argument can be adjusted to use different versions of GPT-2 with varying sizes and capabilities.
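
In the full pipeline, the prompt would come from the detected events rather than a hand-written sentence. The sketch below shows one possible way to build such a prompt. The event format (clock, team, player, action) and the function name events_to_prompt are assumptions made for illustration; the earlier stages would need to produce data in this shape.

```python
def events_to_prompt(events, home="Home", away="Away"):
    """Turn (clock, team, player, action) tuples into a play-by-play prompt."""
    lines = [f"Play-by-play for {home} vs {away}:"]
    for clock, team, player, action in events:
        lines.append(f"[{clock}] {team} - {player} {action}.")
    lines.append("Commentary:")
    return "\n".join(lines)


# Hypothetical events, as the event detection stage might emit them.
events = [
    ("10:42", "Home", "Player 7", "passes to Player 23"),
    ("10:39", "Home", "Player 23", "scores a three-pointer"),
]
prompt = events_to_prompt(events)
print(prompt)
```

The resulting string could then be passed to generate_text(prompt) so that the model continues from the "Commentary:" line.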

Keep in mind that text generation with language models like GPT-2 can sometimes produce output that does not make sense or is off-topic. Adjusting the max_length and temperature parameters helps control the length and randomness of the generated text; note that in the transformers library, temperature only takes effect when sampling is enabled (do_sample=True in generate). Lower temperature values (e.g., 0.7) make the output more focused, while higher values (e.g., 1.5) make it more creative but potentially less coherent.
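
To see why temperature has this effect, it helps to look at what it does to the model's next-token probabilities: the raw scores (logits) are divided by the temperature before the softmax. The sketch below uses three made-up logits and a hypothetical helper, softmax_with_temperature; it illustrates the idea and is not how the transformers library is implemented internally.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lower temperature the top-scoring token takes a larger share of the probability, and at the higher temperature the distribution flattens, so less likely tokens get sampled more often.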

Additionally, if you want more advanced text generation, you can explore other language models like GPT-3, which provides even more capabilities for generating human-like text. However, using GPT-3 requires access to the OpenAI GPT-3 API.

Conclusion

This is a simple project showing a unique and practical application of NLP and Computer Vision. Subscribe for more similar content!