10 posts tagged with "Project"

Project: General Hand Gesture Recognition

· 3 min read

View Project Report

PyTorch Computer Vision Contrastive Learning Duration

Overview

This project aims to create a unified, semi-supervised contrastive-learning framework for hand gesture recognition. The framework is designed to adapt efficiently to various downstream tasks, such as human-computer interaction and sign language recognition, with minimal retraining or fine-tuning.

Scope and Applications

[!NOTE] This section is a summary generated from the report by Grok. The contents have been double-checked by the author.

Only this section covers the main content of the report. The remaining sections describe how to set up the project and the purpose of specific scripts in the repository.

Key Areas Explored

Static-Pose Representation Learning

  • Objective: Map hand landmark inputs (shape 21 × 3) into feature embeddings (size 128).
  • Approach: Compared three encoder architectures:
    • Multi-layer Perceptron (MLP)
    • Graph Convolutional Network (GCN)
    • Graph Attention Network (GAT)
  • Hypotheses Tested:
    1. Graph-based models (GCN and GAT), which leverage edge information, outperform MLP in accuracy and convergence speed. This was evaluated using supervised contrastive loss on the Lexset dataset.
    2. Incorporating a large unlabelled dataset (synthetic MANO data) with curriculum-based augmentations enhances model generalization.
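To make the training objective concrete, here is a minimal sketch of a supervised contrastive loss in plain Python. It is not the report's implementation: the function names, the temperature of 0.1, and the toy 2-dimensional embeddings (standing in for the 128-dimensional ones) are all assumptions for illustration. Embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    For each anchor, samples sharing its label are positives, and every
    other sample in the batch appears in the softmax denominator.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values())
        )
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

The loss is small when samples with the same label sit close together in embedding space and large when they are scattered, which is the property the encoders (MLP, GCN, or GAT) are trained to achieve.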

Extension to Dynamic Gesture Recognition

  • Objective: Extend the contrastive learning approach to recognize dynamic gestures.
  • Approach: Utilize sequential architectures like Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) units to model temporal dependencies in gesture sequences.

Project: Named Entity Recognition

· 4 min read

View Project Final Report Project Description

Overview

This project aims to explore methodologies related to Named Entity Recognition.

Methodology

The project evaluated three distinct deep learning architectures for Named Entity Recognition (NER):

  • SpaCy: (Responsible: Square) Explored the key approaches used in the SpaCy library, including hash embeddings and transition-based models.
  • BERT (Bidirectional Encoder Representations from Transformers): (Responsible: Sam/Ash) Fine-tuned the bert-base-cased model on the NER task, with the following modifications:
    • Aggregation Strategies: Improved the aggregation method so that each input word is not split into parts in the output.
    • Masking: Randomly masked 15% of named entities in the training data, and adjusted the learning rate and number of epochs for fine-tuning, to test whether encouraging the use of nearby context improves generalization to unseen named entities.
  • Gemma 3 (Decoder LLM): (Responsible: Square) Tested decoder LLMs on NER tasks, guided by prompt engineering, with JSON output enforced. Evaluated in both zero-shot (instructions only) and few-shot (instructions plus examples) settings.
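The entity-masking idea can be sketched as follows. This is an illustration under stated assumptions, not the report's code: it assumes word-level tokens with BIO tags, masks whole entity spans rather than individual tokens, and uses a hypothetical helper name `mask_entities`. `[MASK]` is the mask token of BERT's vocabulary.

```python
import random


def mask_entities(tokens, tags, mask_prob=0.15, mask_token="[MASK]", rng=None):
    """Replace each whole entity span with mask tokens with probability mask_prob.

    tokens: list of words; tags: BIO tags aligned with tokens.
    The tags themselves are left untouched, so the model still has to
    predict the entity type from the surrounding context.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            # Find the end of the span: B- followed by any number of I- tags.
            j = i + 1
            while j < len(tokens) and tags[j].startswith("I-"):
                j += 1
            if rng.random() < mask_prob:
                for k in range(i, j):
                    masked[k] = mask_token
            i = j
        else:
            i += 1
    return masked
```

Passing a seeded `random.Random` as `rng` makes the masking reproducible across runs.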

For more experimental data, please see the report.

Project: P2P Communication App

· 5 min read

View Project

A fork of a project from Term 2 of the 2023-24 academic year: a peer-to-peer communication app supporting audio recording, waveform display and editing, and screen sharing.

Key Features

  • Peer-to-peer communication within local area network
  • Functions:
    • Create, join and leave chat rooms
    • Audio recording
    • Screen sharing
  • Synchronization and handling of audio and video streams from multiple users, so that users do not hear their own voices
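One common way to keep users from hearing themselves is a "mix-minus": each listener receives the sum of every stream except their own. The sketch below illustrates that idea only. The app's actual implementation may differ, and the function name, the dictionary layout, and the normalized float samples in [-1, 1] are assumptions.

```python
def mix_minus(frames):
    """Build one output frame per user that excludes the user's own audio.

    frames: dict mapping user id -> list of float samples in [-1.0, 1.0],
    all of the same length and already time-aligned.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    # Sum all streams once, then subtract each user's own contribution.
    total = [0.0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    mixed = {}
    for user, samples in frames.items():
        mixed[user] = [
            max(-1.0, min(1.0, t - s))  # clamp to avoid clipping overflow
            for t, s in zip(total, samples)
        ]
    return mixed
```

Summing once and subtracting keeps the cost linear in the number of users, rather than re-mixing the other streams separately for each listener.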

Project: Vision Transformer Analysis

· 4 min read

View Project Report

Python PyTorch UNet ResNet ViT DeiT T2T Dataset | CIFAR10 Dataset | STL10 Dataset | Cityscapes Last Update | April 2024

This is the final project for the course AIST4010, completed in April 2024. More details can be found in the report.

Report: Report


Overview

Project Goals

The project investigates the generalizability of Vision Transformers (ViTs) compared to Convolutional Neural Networks (CNNs) for small-scale computer vision tasks. While ViTs excel on large datasets, they struggle on smaller ones. This work evaluates and compares the performance of ResNet, ViT, DeiT, and T2T-ViT on classification tasks using small subsets of the CIFAR-10 and STL-10 datasets.

Key Contributions

  1. Scalability Analysis: Demonstrated performance degradation of ViTs with reduced dataset sizes, showing CNNs are more effective for small datasets.
  2. Computational Efficiency: Analyzed training iterations and time-to-convergence, highlighting that ViTs, while converging faster, still lack efficiency due to lower accuracy on small datasets.
  3. Comparison of Architectures: Implemented and trained models with similar parameter counts for fair performance evaluations.
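Reduced-size training sets of the kind used in the scalability analysis can be drawn as below. This is a sketch only: the report's sampling procedure is not described here, so the class-balanced strategy, the function name `balanced_subset`, and the fixed seed are assumptions.

```python
import random
from collections import defaultdict


def balanced_subset(labels, per_class, seed=0):
    """Return sorted dataset indices with exactly per_class samples per label."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} samples")
        # Sample without replacement inside each class.
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)
```

The returned indices can be passed to a subset wrapper around the full dataset, so that every architecture is trained on the same reduced split.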

Project: Deep Q-Learning Agent

· 2 min read

View Project Report

Python Gymnasium Reinforcement Learning Group Project Last Updated

Overview

  • Created a Gym environment of a simple third-person shooter game in Python
  • Implemented a simple Deep Q-Network with PyTorch to train agents to master the game (left image)
  • Fine-tuned the hyperparameters of the agent, achieving an average kill streak of 7 (right image, top) and lengthening the survival duration by 4 times (right image, bottom), which is significantly better than the random baseline of 0.22 kills on average.
  • Explored how deep Q-learning models handle a variable quantity of moving objects, i.e. the bullets and enemies, and the adjustments this requires to the reward functions and the representation of the observation space.
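A Q-network needs a fixed-size input, so a variable number of bullets and enemies has to be encoded somehow. One possible encoding, shown here purely as an illustration and not as the representation used in the report, keeps the nearest `max_objects` objects and pads the rest with zeros plus a presence flag. The function name and the (dx, dy, present) layout are assumptions.

```python
def encode_objects(objects, player_pos, max_objects):
    """Encode a variable-length list of (x, y) positions as a fixed-size vector.

    Each slot holds (dx, dy, present): the offset from the player and a flag
    that is 1.0 for a real object and 0.0 for padding.
    """
    px, py = player_pos
    # Keep only the closest objects when there are more than max_objects.
    nearest = sorted(
        objects,
        key=lambda pos: (pos[0] - px) ** 2 + (pos[1] - py) ** 2,
    )[:max_objects]
    observation = []
    for x, y in nearest:
        observation.extend([float(x - px), float(y - py), 1.0])
    # Pad unused slots so the length is always 3 * max_objects.
    observation.extend([0.0, 0.0, 0.0] * (max_objects - len(nearest)))
    return observation
```

Sorting by distance gives the slots a consistent meaning across steps, and the presence flag lets the network distinguish padding from an object that happens to sit exactly on the player.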

Report: Report

Project: GAN Generation

· 2 min read

View Project

Python PyTorch Generative Adversarial Networks MNIST Dataset Last Updated: August 2022

Backup of GAN Learning Project (August 2022)

[!NOTE] The project explores various GAN architectures and improvements through iterative versions.

[!IMPORTANT] This project is a personal learning exercise in understanding and implementing different GAN techniques.

This project re-implemented GAN, WGAN, and conditional GAN, and explored typical problems of GAN-based architectures, such as mode collapse and sensitivity to hyperparameters.
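The difference between the GAN and WGAN objectives can be shown on plain lists of discriminator outputs. The sketch below is framework-free and is not taken from the project's code. The function names are assumptions, and weight clipping with threshold 0.01 follows the original WGAN formulation, which may or may not be the variant used here.

```python
import math


def mean(values):
    return sum(values) / len(values)


def gan_discriminator_loss(real_probs, fake_probs, eps=1e-12):
    """Standard GAN loss: the discriminator outputs probabilities in (0, 1)."""
    real_term = mean([math.log(p + eps) for p in real_probs])
    fake_term = mean([math.log(1.0 - p + eps) for p in fake_probs])
    return -(real_term + fake_term)


def wgan_critic_loss(real_scores, fake_scores):
    """WGAN loss: the critic outputs unbounded scores, with no log or sigmoid."""
    return mean(fake_scores) - mean(real_scores)


def clip_weights(weights, limit=0.01):
    """Clip critic weights to [-limit, limit] after each update."""
    return [max(-limit, min(limit, w)) for w in weights]
```

An undecided discriminator that outputs 0.5 everywhere gives a GAN loss of 2 ln 2, whereas the WGAN critic loss is simply the gap between mean fake and mean real scores, which tends to track sample quality more smoothly during training.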