Project: Vision Transformer Analysis
· 4 min read
This is the final project for the course AIST4010; more details can be found in the project report. The project was completed in April 2024.
Overview
Project Goals
The project investigates the generalizability of Vision Transformers (ViTs) compared to Convolutional Neural Networks (CNNs) on small-scale computer vision tasks. While ViTs excel on large datasets, they often struggle on smaller ones. This work evaluates and compares the performance of ResNet, ViT, DeiT, and T2T-ViT on classification tasks using small subsets of the CIFAR-10 and STL-10 datasets.
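Evaluating on small subsets while keeping the class balance intact is typically done with stratified subsampling. The sketch below is illustrative only: the helper name, the per-class counts, and the seeding convention are my assumptions, not details from the report.

```python
import random
from collections import defaultdict

def stratified_subset(labels, per_class, seed=0):
    """Pick `per_class` example indices for every class, preserving the
    class balance of the full dataset (hypothetical helper; the report's
    exact subset sizes and sampling procedure may differ)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    subset = []
    for idxs in by_class.values():
        subset.extend(rng.sample(idxs, per_class))
    return sorted(subset)

# Toy example: 2 classes, 5 examples each; keep 2 per class
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(len(stratified_subset(labels, per_class=2)))  # 4
```

Applied to CIFAR-10 or STL-10, `labels` would be the dataset's target array and the returned indices would feed a subset sampler.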
Key Contributions
- Scalability Analysis: Demonstrated performance degradation of ViTs with reduced dataset sizes, showing CNNs are more effective for small datasets.
- Computational Efficiency: Analyzed training iterations and time-to-convergence, showing that although ViTs converge in fewer iterations, their lower accuracy on small datasets makes them less efficient overall.
- Comparison of Architectures: Implemented and trained models with similar parameter counts for fair performance evaluations.
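A time-to-convergence comparison like the one above can be reduced to a simple question: at which training step does each model first reach a target validation accuracy? The sketch below uses toy accuracy curves and a hypothetical threshold criterion; the report's actual convergence criterion and numbers may differ.

```python
def iterations_to_threshold(accuracy_curve, threshold):
    """Return the first step at which validation accuracy reaches
    `threshold`, or None if it never does (illustrative convergence
    criterion, not necessarily the one used in the report)."""
    for step, acc in enumerate(accuracy_curve, start=1):
        if acc >= threshold:
            return step
    return None

# Toy curves: the ViT-like model crosses the threshold sooner,
# but plateaus at a lower final accuracy than the CNN-like model.
vit_curve = [0.40, 0.62, 0.70, 0.72, 0.72]
cnn_curve = [0.30, 0.50, 0.65, 0.74, 0.80]
print(iterations_to_threshold(vit_curve, 0.70))  # 3
print(iterations_to_threshold(cnn_curve, 0.70))  # 4
```

This captures the trade-off in the efficiency analysis: fewer iterations to a modest accuracy does not imply a better final model.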