4 posts tagged with "PyTorch"

Project: General Hand Gesture Recognition

April 15, 2025 · 3 min read

Overview

This project aims to create a unified, semi-supervised contrastive-learning framework for hand gesture recognition. The framework is designed to adapt efficiently to various downstream tasks, such as human-computer interaction and sign language recognition, with minimal retraining or fine-tuning.

Scope and Applications

[!NOTE] This section is a summary generated from the report by Grok. The contents have been double-checked by the author.

This section summarizes the main content of the report. The remaining sections cover project setup and the purpose of specific scripts in the repository.

Key Areas Explored

Static-Pose Representation Learning

Objective: Map hand landmark inputs (shape $21 \times 3$) into feature embeddings (size $128$).
Approach: Compared three encoder architectures:
- Multi-layer Perceptron (MLP)
- Graph Convolutional Network (GCN)
- Graph Attention Network (GAT)
Hypotheses Tested:
1. Graph-based models (GCN and GAT), which leverage edge information, outperform MLP in accuracy and convergence speed. This was evaluated using supervised contrastive loss on the Lexset dataset.
2. Incorporating a large unlabelled dataset (synthetic MANO data) with curriculum-based augmentations enhances model generalization.
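As a rough illustration of the supervised contrastive loss mentioned in the first hypothesis, the sketch below computes it for a small batch of embeddings. It is written in plain Python so it runs without dependencies; the function names, the temperature value of `0.1`, and the toy 2-D embeddings are assumptions for illustration and are not taken from the report or the repository.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    For each anchor, samples sharing its label are positives and every other
    sample in the batch is part of the softmax denominator.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchor_count = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Numerically stable log-sum-exp for the denominator.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchor_count += 1
    return total / anchor_count if anchor_count else 0.0


# Toy batch: two tight clusters of 2-D embeddings (hypothetical values).
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

The loss is small when samples with the same label sit close together and large when labels cut across the clusters, which is the signal an encoder would be trained on.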

Extension to Dynamic Gesture Recognition

Objective: Extend the contrastive learning approach to recognize dynamic gestures.
Approach: Utilize sequential architectures like Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) units to model temporal dependencies in gesture sequences.
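To make the idea of modelling temporal dependencies concrete, here is a minimal sketch of a plain (Elman-style) recurrent cell that folds a sequence of per-frame feature vectors into one hidden state. It is plain Python rather than a trained model; the class name `SimpleRNNCell`, the weight initialization, and the tiny dimensions are assumptions for illustration only, and an LSTM would add gating on top of the same recurrence.

```python
import math
import random


class SimpleRNNCell:
    """h_t = tanh(W_x x_t + W_h h_{t-1} + b), with small random weights."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_x = init(hidden_size, input_size)
        self.w_h = init(hidden_size, hidden_size)
        self.bias = [0.0] * hidden_size
        self.hidden_size = hidden_size

    def step(self, x, h):
        """Compute the next hidden state from one frame and the previous state."""
        return [
            math.tanh(
                sum(w * xi for w, xi in zip(self.w_x[k], x))
                + sum(w * hi for w, hi in zip(self.w_h[k], h))
                + self.bias[k]
            )
            for k in range(self.hidden_size)
        ]


def encode_sequence(cell, frames):
    """Run the cell over all frames and return the final hidden state."""
    h = [0.0] * cell.hidden_size
    for frame in frames:
        h = cell.step(frame, h)
    return h


# Two hypothetical 2-D frame features; real inputs would be pose embeddings.
cell = SimpleRNNCell(input_size=2, hidden_size=3)
frames = [[1.0, 0.0], [0.0, 1.0]]
forward_state = encode_sequence(cell, frames)
reversed_state = encode_sequence(cell, frames[::-1])
```

Because the hidden state is carried from frame to frame, the same frames in a different order produce a different final state, which is what lets a sequential model distinguish gestures that share poses but differ in motion.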

Project: Vision Transformer Analysis

April 1, 2024 · 4 min read

This is the final project for the course AIST4010, completed in April 2024. More details can be found in the report.

Report:

Overview

Project Goals

The project investigates how well Vision Transformers (ViTs) generalize compared to Convolutional Neural Networks (CNNs) on small-scale computer vision tasks. While ViTs excel on large datasets, they struggle with smaller ones. This work evaluates and compares models such as ResNet, ViT, DeiT, and T2T-ViT on classification tasks using small subsets of the CIFAR-10 and STL-10 datasets.

Key Contributions

Scalability Analysis: Demonstrated performance degradation of ViTs with reduced dataset sizes, showing CNNs are more effective for small datasets.
Computational Efficiency: Analyzed training iterations and time-to-convergence, showing that ViTs converge faster but remain less efficient overall because they reach lower accuracy on small datasets.
Comparison of Architectures: Implemented and trained models with similar parameter counts for fair performance evaluations.
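One way to build the reduced datasets used in a scalability study like this is to draw the same number of samples from every class. The sketch below does that over a list of labels. Whether the report used class-balanced subsets is an assumption, and `stratified_subset` and its parameters are hypothetical names, not code from the project.

```python
import random
from collections import Counter, defaultdict


def stratified_subset(labels, per_class, seed=0):
    """Return sorted sample indices with exactly `per_class` items per label."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    subset = []
    for label in sorted(indices_by_class):
        candidates = indices_by_class[label]
        if len(candidates) < per_class:
            raise ValueError(f"class {label!r} has only {len(candidates)} samples")
        subset.extend(rng.sample(candidates, per_class))
    return sorted(subset)


# Hypothetical 10-class label list standing in for a real dataset's targets.
toy_labels = [i % 10 for i in range(1000)]
subset_indices = stratified_subset(toy_labels, per_class=5)
class_counts = Counter(toy_labels[i] for i in subset_indices)
```

The returned indices could then be passed to whatever dataset wrapper selects samples by index, and fixing the seed keeps the subset identical across the architectures being compared.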

Project: U-Net Segmentation

August 4, 2023 · 4 min read

Backup of Old Project (December 2023)

This is a backup of an old project focused on training a U-Net model from scratch for semantic segmentation on the Cityscapes and Carvana datasets. The images are downscaled to speed up training for learning purposes.
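When images are downscaled for segmentation, the label masks have to be shrunk too, and class IDs must be picked rather than averaged, since a blend of two IDs is not a valid class. The sketch below shows a strided, nearest-neighbour style downscale on a mask stored as nested lists. How the project actually resized its data is not stated, so this function and the toy mask are illustrative assumptions.

```python
def downscale_mask(mask, factor):
    """Keep every `factor`-th pixel in both directions, preserving class IDs."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    return [row[::factor] for row in mask[::factor]]


# Hypothetical 4x4 mask with four class regions.
toy_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
small_mask = downscale_mask(toy_mask, factor=2)
```

Every value in the smaller mask is one of the original class IDs, which would not be guaranteed with bilinear or area interpolation.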

Project: GAN Generation

August 1, 2022 · 2 min read

Backup of GAN Learning Project (August 2022)

[!NOTE] The project explores various GAN architectures and improvements through iterative versions.

[!IMPORTANT] This project is a personal learning exercise in understanding and implementing different GAN techniques.

This project re-implements GAN, WGAN, and conditional GAN, and explores typical problems of GAN-based architectures, such as mode collapse and sensitivity to hyperparameters.
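For reference, the difference between the standard GAN and WGAN objectives can be sketched in a few lines of plain Python operating on lists of discriminator outputs. The function names are hypothetical, and the weight-clipping threshold of `0.01` follows the original WGAN formulation; whether this project used weight clipping or a gradient penalty is not stated, so treat that part as an assumption.

```python
import math


def gan_discriminator_loss(real_probs, fake_probs, eps=1e-12):
    """Binary cross-entropy: real samples should score 1, fakes should score 0."""
    real_term = -sum(math.log(p + eps) for p in real_probs) / len(real_probs)
    fake_term = -sum(math.log(1.0 - p + eps) for p in fake_probs) / len(fake_probs)
    return real_term + fake_term


def wgan_critic_loss(real_scores, fake_scores):
    """Critic minimizes mean(fake) - mean(real); scores are unbounded reals."""
    return sum(fake_scores) / len(fake_scores) - sum(real_scores) / len(real_scores)


def wgan_generator_loss(fake_scores):
    """Generator tries to raise the critic's score on generated samples."""
    return -sum(fake_scores) / len(fake_scores)


def clip_weights(weights, limit=0.01):
    """Weight clipping, the Lipschitz constraint from the original WGAN."""
    return [max(-limit, min(limit, w)) for w in weights]


# Hypothetical critic outputs for a batch of two real and two fake samples.
critic_loss = wgan_critic_loss(real_scores=[2.0, 4.0], fake_scores=[1.0, 1.0])
generator_loss = wgan_generator_loss(fake_scores=[1.0, 3.0])
clipped = clip_weights([0.5, -0.5, 0.005])
```

The WGAN critic outputs a score rather than a probability, so there is no logarithm or sigmoid in its loss, which is one reason it tends to give more usable gradients when the discriminator becomes very confident.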
