10 posts tagged with "Project"

Project: General Hand Gesture Recognition

April 15, 2025 · 3 min read

Overview

This project aims to create a unified, semi-supervised contrastive-learning framework for hand gesture recognition. The framework is designed to adapt efficiently to various downstream tasks, such as human-computer interaction and sign language recognition, with minimal retraining or fine-tuning.

Scope and Applications

[!NOTE] This section is a summary generated from the report by Grok. The contents have been double-checked by the author.

Only this section covers the main content of the report; the remaining sections describe how to set up the project and what specific scripts in the repository are for.

Key Areas Explored

Static-Pose Representation Learning

Objective: Map hand landmark inputs (shape $21 \times 3$) into feature embeddings (size $128$).
Approach: Compared three encoder architectures:
- Multi-layer Perceptron (MLP)
- Graph Convolutional Network (GCN)
- Graph Attention Network (GAT)
Hypotheses Tested:
1. Graph-based models (GCN and GAT), which leverage edge information, outperform MLP in accuracy and convergence speed. This was evaluated using supervised contrastive loss on the Lexset dataset.
2. Incorporating a large unlabelled dataset (synthetic MANO data) with curriculum-based augmentations enhances model generalization.
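To make the supervised contrastive objective mentioned in the first hypothesis more concrete, here is a minimal pure-Python sketch of such a loss. It is an illustration under assumptions, not the project's code: the function names, the temperature of 0.1, and the toy 2-dimensional embeddings are hypothetical, whereas the project uses 128-dimensional embeddings produced by an MLP, GCN, or GAT encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    For each anchor, samples with the same label are positives and every
    other sample in the batch appears in the softmax denominator.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {
            j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
            for j in range(n)
            if j != i
        }
        # Log-sum-exp with the maximum subtracted for numerical stability.
        top = max(sims.values())
        log_denom = top + math.log(sum(math.exp(s - top) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

When samples of the same class sit close together in embedding space the loss approaches zero; when the labels disagree with the geometry it grows, which is the signal that pulls same-gesture poses together during training.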

Extension to Dynamic Gesture Recognition

Objective: Extend the contrastive learning approach to recognize dynamic gestures.
Approach: Utilize sequential architectures like Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) units to model temporal dependencies in gesture sequences.

Project: Named Entity Recognition

April 14, 2025 · 4 min read

Overview

This project aims to explore methodologies related to Named Entity Recognition.

Methodology

The project evaluated three distinct deep learning architectures for Named Entity Recognition (NER):

SpaCy: (Responsible: Square) Explored the key approaches used in the SpaCy library, including hash embeddings and transition-based models.
BERT (Bidirectional Encoder Representations from Transformers): (Responsible: Sam/Ash) Fine-tuned the bert-base-cased model on the NER task, with the following modifications:
- Aggregation Strategies: Improved the aggregation method so that no input word is split into parts in the output.
- Masking: Randomly masked 15% of the named entities in the training data and adjusted the learning rate and number of epochs for fine-tuning, to test whether encouraging the model to use nearby context improves generalization to unseen named entities.
Gemma 3 (Decoder LLM): (Responsible: Square) Tested the performance of decoder LLMs on NER tasks, guided by prompt engineering and constrained to produce JSON output. Evaluated in both zero-shot (instructions only) and few-shot (instructions plus examples) settings.
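The two BERT modifications above can be sketched roughly as follows. This is a hedged illustration rather than the project's implementation: the function names are hypothetical, the tags are assumed to follow the BIO scheme, masking is assumed to apply to whole entity spans, and the aggregation shown is a simple "first sub-token wins" rule. The only library fact relied on is that BERT's WordPiece tokenizer marks word continuations with a `##` prefix.

```python
import random


def mask_entities(tokens, tags, prob=0.15, mask_token="[MASK]", rng=None):
    """Replace each whole entity span with mask tokens with probability `prob`.

    Tags are left untouched, so the model has to predict the entity type
    from the surrounding context alone.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    i = 0
    while i < len(tokens):
        if tags[i].startswith("B-"):
            end = i + 1
            while end < len(tokens) and tags[end].startswith("I-"):
                end += 1
            if rng.random() < prob:
                for k in range(i, end):
                    masked[k] = mask_token
            i = end
        else:
            i += 1
    return masked


def aggregate_wordpieces(pieces, piece_tags):
    """Merge WordPiece sub-tokens back into words, keeping the first piece's tag."""
    words, word_tags = [], []
    for piece, tag in zip(pieces, piece_tags):
        if piece.startswith("##") and words:
            words[-1] += piece[2:]  # continuation: glue onto the previous word
        else:
            words.append(piece)
            word_tags.append(tag)
    return words, word_tags
```

For example, `aggregate_wordpieces(["Wash", "##ington", "is", "nice"], ["B-LOC", "I-LOC", "O", "O"])` yields one tag per whole word, so "Washington" is reported as a single `B-LOC` entity instead of two fragments.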

For more experimental data, please see the report.

Project: Event-Planning App "Oasis"

June 7, 2024 · 2 min read

Android event planning app written in Java, with the following features:

Alarm-setting and Event Organization with SQL via RoomDatabase
Lets the user (1) add a large number of dates to each task, and (2) switch to a "same day-of-week" view for simpler date-picking, especially for recurring events
Notification System

Project: P2P Communication App

April 8, 2024 · 5 min read

A fork of the project from 2023-24 Term 2: a peer-to-peer communication app that supports audio recording, waveform display and editing, and screen sharing.

Key Features

Peer-to-peer communication within local area network
Functions:
- Create, join and leave chat rooms
- Audio recording
- Screen sharing
Synchronization and handling of audio and video streams from multiple users so that they do not hear their own voices
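One common way to keep users from hearing their own voices is a "mix-minus": each participant receives the sum of everyone's audio minus their own contribution. The sketch below illustrates that idea only; the function name, the dictionary layout, and the assumption that frames are already time-aligned float samples in [-1, 1] are all hypothetical, and the app itself may instead simply skip local playback of the user's own stream.

```python
def mix_minus(frames):
    """Build one output frame per user that excludes that user's own audio.

    `frames` maps a user id to a list of float samples; all lists are
    assumed to be the same length and already synchronized.
    """
    if not frames:
        return {}
    length = len(next(iter(frames.values())))
    total = [0.0] * length
    for samples in frames.values():
        for i, sample in enumerate(samples):
            total[i] += sample
    mixed = {}
    for user, samples in frames.items():
        # Subtract the user's own signal, then clip to the valid range.
        mixed[user] = [
            max(-1.0, min(1.0, t - s)) for t, s in zip(total, samples)
        ]
    return mixed
```

Summing once and subtracting per user keeps the cost linear in the number of participants, instead of re-mixing every other stream for each listener.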

Project: Vision Transformer Analysis

April 1, 2024 · 4 min read

This is the final project for the course AIST4010, completed in April 2024. More details can be found in the report.

Report:

Overview

Project Goals

The project investigates the generalizability of Vision Transformers (ViTs) compared to Convolutional Neural Networks (CNNs) for small-scale computer vision tasks. While ViTs excel in large datasets, they struggle with smaller ones. This work evaluates and compares the performance of models like ResNet, ViT, DeiT, and T2T-ViT on classification tasks using small subsets of CIFAR-10 and STL-10 datasets.

Key Contributions

Scalability Analysis: Demonstrated performance degradation of ViTs with reduced dataset sizes, showing CNNs are more effective for small datasets.
Computational Efficiency: Analyzed training iterations and time-to-convergence, highlighting that ViTs, while converging faster, still lack efficiency due to lower accuracy on small datasets.
Comparison of Architectures: Implemented and trained models with similar parameter counts for fair performance evaluations.
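A scalability study of this kind needs progressively smaller training sets. One plausible way to build them, sketched below, is class-balanced subsampling with a fixed seed; whether the report sampled its CIFAR-10 and STL-10 subsets this way is an assumption, and the function name and parameters are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subset(labels, per_class, seed=0):
    """Return sorted dataset indices with exactly `per_class` samples per label."""
    rng = random.Random(seed)  # fixed seed so every model sees the same subset
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < per_class:
            raise ValueError(f"class {label!r} has only {len(indices)} samples")
        chosen.extend(rng.sample(indices, per_class))
    return sorted(chosen)
```

Reusing the same indices for ResNet, ViT, DeiT, and T2T-ViT keeps the comparison about the architecture rather than about which images happened to be drawn.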

Project: ARG Prediction with Transformers

March 1, 2024 · 2 min read

Fine-tuned ProtTrans model for antibiotic resistance gene classification achieving 0.94 F-score.

Key Achievements

0.94 F-score on ARG classification
Fine-tuned ProtTrans model
Robust bioinformatics pipeline

Project: U-Net Segmentation

August 4, 2023 · 4 min read

Backup of Old Project (December 2023)

This is a backup of an old project focused on training a U-Net model from scratch for semantic segmentation on the Cityscapes and Carvana datasets. The images are downscaled to speed up training, since the project was for learning purposes.

Project: YOLO Object Tracking

June 5, 2023 · 3 min read

Overview

This is a backup of an old project that focused on object detection and tracking over videos using YOLOv8.

Applied abewley's SORT library together with a self-implemented instance-label assignment function, based on maximum overlap, to stabilize the ID assigned to each car as it moves across frames.
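Assigning labels by maximum overlap can be sketched as a greedy intersection-over-union (IoU) match between the previous frame's tracked boxes and the current detections. This is an assumption-laden illustration, not the project's function and not SORT itself: the names, the `(x1, y1, x2, y2)` box format, and the 0.3 threshold are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def assign_ids(tracks, detections, min_iou=0.3):
    """Give each detection the id of the track it overlaps most.

    `tracks` maps an id to its last known box. Pairs are matched greedily
    from highest IoU down; each id and each detection is used at most once.
    Detections without a good enough match get None.
    """
    pairs = [
        (iou(box, det), track_id, det_index)
        for track_id, box in tracks.items()
        for det_index, det in enumerate(detections)
    ]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    ids = [None] * len(detections)
    used = set()
    for score, track_id, det_index in pairs:
        if score < min_iou:
            break  # remaining pairs overlap even less
        if track_id in used or ids[det_index] is not None:
            continue
        ids[det_index] = track_id
        used.add(track_id)
    return ids
```

A detection that comes back as `None` would typically start a new track, which is how a car entering the frame receives a fresh ID.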

Project: Deep Q-Learning Agent

December 1, 2022 · 2 min read

Overview

Created a Gym environment of a simple third-person shooter game in Python
Implemented a simple Deep Q-Network with PyTorch to train agents to master the game (left image)
Fine-tuned the hyperparameters of the agent, achieving an average kill streak of 7 (right image, top) and lengthening the survival duration by 4 times (right image, bottom), which is significantly better than the random baseline of 0.22 kills on average.
Explored how deep Q-learning models handle a variable number of moving objects, i.e. the bullets and enemies, and the adjustments to the reward function and observation-space representation that this requires.
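A Q-network expects a fixed-size input, so a variable number of bullets and enemies has to be packed into a fixed-length observation. One common approach, sketched here under assumptions, is to keep the nearest few objects, prefix each with a presence flag, and zero-pad the rest. The function name, the `(x, y, vx, vy)` layout relative to the player, and the slot count are hypothetical and may differ from the representation the project settled on.

```python
def encode_entities(entities, max_entities, features=4):
    """Pack a variable-length list of objects into a fixed-length vector.

    Each entity is a tuple of `features` numbers, assumed here to be
    (x, y, vx, vy) relative to the player. The nearest `max_entities`
    objects are kept; each slot starts with 1.0 if it is occupied, and
    unused slots are filled with zeros.
    """
    nearest = sorted(entities, key=lambda e: e[0] ** 2 + e[1] ** 2)
    nearest = nearest[:max_entities]
    observation = []
    for entity in nearest:
        observation.extend([1.0, *entity])
    empty_slots = max_entities - len(nearest)
    observation.extend([0.0] * (empty_slots * (features + 1)))
    return observation
```

The presence flag lets the network distinguish an empty slot from a real object sitting at the origin with zero velocity, and sorting by distance keeps the slot order meaningful from frame to frame.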

Report:

Project: GAN Generation

August 1, 2022 · 2 min read

Backup of GAN Learning Project (August 2022)

[!NOTE] The project explores various GAN architectures and improvements through iterative versions.

[!IMPORTANT] This project is a personal learning exercise in understanding and implementing different GAN techniques.

This project re-implemented GAN, WGAN and conditional GAN and explored the typical problems that occurred with GAN-based architectures like mode collapse and sensitivity to hyperparameters.
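The difference between the original GAN objective and the WGAN objective is easiest to see side by side. The sketch below computes both losses on plain Python lists of discriminator or critic outputs; it is a simplified illustration with hypothetical function names, not the project's training code. The clipping constant of 0.01 is the default from the original WGAN formulation.

```python
import math


def gan_discriminator_loss(d_real, d_fake):
    """Binary cross-entropy loss; inputs are probabilities strictly in (0, 1)."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def wgan_critic_loss(c_real, c_fake):
    """Critic loss to minimize: mean score on fakes minus mean score on reals.

    Critic outputs are unbounded scores, not probabilities.
    """
    return sum(c_fake) / len(c_fake) - sum(c_real) / len(c_real)


def clip_weights(weights, limit=0.01):
    """Weight clipping, used by the original WGAN to constrain the critic."""
    return [max(-limit, min(limit, w)) for w in weights]
```

Because the WGAN critic loss has no logarithm and no saturating sigmoid, its gradients do not vanish when the critic separates real from fake samples well, which is one reason WGAN is usually reported to be less prone to mode collapse.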
