
Project: Named Entity Recognition


Project Description

Overview

This project explores and compares methodologies for Named Entity Recognition.

Methodology

The project evaluated three distinct deep learning architectures for Named Entity Recognition (NER):

  • SpaCy: (Responsible: Square) Explored the key approaches used in the SpaCy library, including hash embeddings and transition-based models (a usage sketch follows this list).
  • BERT (Bidirectional Encoder Representations from Transformers): (Responsible: Sam/Ash) Fine-tuned the bert-base-cased model on the NER task, with the following modifications:
    • Aggregation Strategies: Improved the aggregation method so that each input word is kept whole, rather than split into sub-word pieces, in the output.
    • Masking: Randomly masked 15% of the named entities in the training data, and adjusted the learning rate and number of epochs for fine-tuning, to test whether encouraging the model to rely on surrounding context improves generalization to unseen named entities (see the masking sketch below).
  • Gemma 3 (Decoder LLM): (Responsible: Square) Tested decoder-only LLMs on the NER task, guided by prompt engineering and constrained to JSON output. Evaluated both the zero-shot (instructions only) and few-shot (instructions plus examples) settings (see the prompt sketch below).
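
For reference, the SpaCy baselines can be reproduced along the following lines. This is a minimal sketch: the pipeline names en_core_web_md and en_core_web_lg are assumptions mapped to the "SpaCy Medium" and "SpaCy Large" rows in the results, and the mapping of spaCy's labels to the CoNLL classes is left out.

```python
# Minimal sketch of running spaCy's pretrained NER pipeline.
import spacy

nlp = spacy.load("en_core_web_md")   # assumed "SpaCy Medium"; "en_core_web_lg" for "SpaCy Large"

doc = nlp("Shane Warne played for Australia.")
# doc.ents holds the predicted entity spans; ent.label_ uses spaCy's own
# label set (PERSON, GPE, ORG, ...), which must be mapped to the CoNLL
# classes (PER, LOC, ORG, MISC) before scoring.
entities = [(ent.label_, ent.text) for ent in doc.ents]
print(entities)
```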

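Below is a minimal sketch of the masking step described above, assuming CoNLL-style (token, IOB tag) training examples; the actual logic lives in masked_bert_ner.py, and the helper name mask_entities is illustrative.

```python
import random

MASK_TOKEN = "[MASK]"   # mask token of the bert-base-cased vocabulary
MASK_PROB = 0.15        # fraction of named entities to mask, as described above

def mask_entities(tokens, tags, mask_prob=MASK_PROB, seed=None):
    """Randomly replace whole entity spans with [MASK] so the model has to
    rely on surrounding context rather than memorised surface forms."""
    rng = random.Random(seed)
    masked = list(tokens)
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):                 # start of an entity span
            j = i + 1
            while j < len(tags) and tags[j].startswith("I-"):
                j += 1
            if rng.random() < mask_prob:             # mask the whole span
                for k in range(i, j):
                    masked[k] = MASK_TOKEN
            i = j
        else:
            i += 1
    return masked

tokens = ["Shane", "Warne", "played", "for", "Australia", "."]
tags   = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(mask_entities(tokens, tags, mask_prob=1.0))
# ['[MASK]', '[MASK]', 'played', 'for', '[MASK]', '.']
```
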
For further experimental details, please see the report.
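
The zero-shot and few-shot Gemma 3 set-ups differ only in whether worked examples are appended to the instructions. The sketch below is a minimal illustration under assumptions: the prompt wording, the build_prompt helper, and the example sentences are illustrative, and the actual call to the 27B model (local or hosted) is omitted.

```python
import json

INSTRUCTIONS = (
    "Extract all named entities from the sentence. "
    "Respond with JSON only: a list of objects with keys "
    '"text" and "label" (PER, ORG, LOC, or MISC).'
)

# Few-shot appends worked examples to the instructions; zero-shot omits them.
FEW_SHOT_EXAMPLES = (
    'Sentence: "Shane Warne played for Australia."\n'
    'Answer: [{"text": "Shane Warne", "label": "PER"}, '
    '{"text": "Australia", "label": "LOC"}]'
)

def build_prompt(sentence: str, few_shot: bool = False) -> str:
    parts = [INSTRUCTIONS]
    if few_shot:
        parts.append(FEW_SHOT_EXAMPLES)
    parts.append(f'Sentence: "{sentence}"\nAnswer:')
    return "\n\n".join(parts)

prompt = build_prompt("Pau-Orthez beat Racing Paris.", few_shot=True)

# Suppose `raw` is the model's reply to `prompt`; requiring JSON-only output
# lets the reply be parsed directly into (label, text) pairs for scoring.
raw = '[{"text": "Pau-Orthez", "label": "ORG"}, {"text": "Racing Paris", "label": "ORG"}]'
entities = [(e["label"], e["text"]) for e in json.loads(raw)]
print(entities)
```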

Job Allocation

  • Sam: BERT experiments
  • Square: SpaCy and Gemma 3 experiments
  • Quinson: Literature review and result analysis

Other References

Metric

The $\text{F-}\beta$ score, which reduces to the F1 score when $\beta = 1$:

$$
F_\beta = \frac{(\beta^2 + 1) \times \text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}
$$

where precision and recall are computed at the entity level: a predicted named entity counts as correct only if it is identified and matches the annotation in the data file exactly.
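
As a small worked example, the snippet below scores one sentence at the entity level with the seqeval package (listed under Aggregation Methods) and plugs the resulting precision and recall into the F-β formula; the tag sequences are illustrative.

```python
from seqeval.metrics import f1_score, precision_score, recall_score

# Gold and predicted IOB tags for one sentence (illustrative).
y_true = [["B-PER", "I-PER", "O", "O", "B-LOC", "O"]]
y_pred = [["B-PER", "I-PER", "O", "O", "O", "O"]]

p = precision_score(y_true, y_pred)   # exact-span matches / predicted entities -> 1.0
r = recall_score(y_true, y_pred)      # exact-span matches / gold entities      -> 0.5
beta = 1.0                            # F-beta reduces to F1 when beta = 1
f_beta = (beta**2 + 1) * p * r / (beta**2 * p + r)
print(p, r, f_beta, f1_score(y_true, y_pred))  # f_beta equals F1 here
```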

Aggregation Methods

  • Custom Decoding (Implemented by Square): Token-wise merging of class labels.
    • Problem: entity strings are sometimes truncated mid-word.
    • Example:
      False Positives [('PER', 'Shane War')]
      False Negatives [('PER', 'Shane Warne')]
  • aggregation_strategy="max": Performs best among the built-in aggregation strategies.
    • Problem: BERT's tokenizer may split an input word differently, so a word that was a single token in the input does not always stay together after tokenization, and the aggregation fails to merge it back into one entity.
    • Example:
      'Pau-Orthez' # Input token
      'Pau', 'Orthez' # Output ('-' is not classified as a part of the name)
  • v3 aggregation strategy:
    • Custom aggregation strategy in masked_bert_ner.py that ensures each input token is treated as a whole (a sketch follows this list).
  • seqeval: Package used for entity-level evaluation.
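
The sketch below shows one way to enforce word-level aggregation using a fast tokenizer's word_ids(), in the spirit of the v3 strategy; the actual implementation in masked_bert_ner.py may differ (e.g. taking the max-scoring sub-token rather than the first), and the checkpoint name is illustrative.

```python
import torch
from transformers import AutoModelForTokenClassification, AutoTokenizer

model_name = "dslim/bert-base-NER"   # illustrative NER checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

words = ["Pau-Orthez", "beat", "Racing", "Paris"]
enc = tokenizer(words, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits[0]            # (num_sub_tokens, num_labels)
pred_ids = logits.argmax(-1).tolist()

# Collapse sub-token predictions so every input word keeps exactly one label,
# here by taking the label of the word's first sub-token.
word_labels = {}
for idx, word_id in enumerate(enc.word_ids(0)):
    if word_id is not None and word_id not in word_labels:
        word_labels[word_id] = model.config.id2label[pred_ids[idx]]

print([(word, word_labels[i]) for i, word in enumerate(words)])
```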

Results (raw)

Metric: Precision/Recall/F1 Score

Aggregation Method Results

| Aggregation Strategy | Precision | Recall | F1 |
|---|---|---|---|
| Simple | 0.5804 | 0.7350 | 0.6486 |
| First | 0.8318 | 0.8890 | 0.8595 |
| Average | 0.8368 | 0.8791 | 0.8574 |
| Max | 0.8359 | 0.8817 | 0.8582 |
| Custom | 0.9124 | 0.9189 | 0.9157 |

Masking Method Results

| Model | Precision | Recall | F1 |
|---|---|---|---|
| CoNLL-2003 (test) | | | |
| DistilBERT-cased fine-tuned (no mask) | 0.8927 | 0.9032 | 0.8979 |
| BERT-base-cased fine-tuned (masked) | 0.8929 | 0.8948 | 0.8939 |
| WikiAnn (test; not trained on) | | | |
| DistilBERT-cased fine-tuned (no mask) | 0.4603 | 0.5134 | 0.4854 |
| BERT-base-cased (masked) | 0.4577 | 0.5080 | 0.4816 |

CoNLL-2003 Results

| Model | Precision | Recall | F1 |
|---|---|---|---|
| BERT-base-NER | 0.9124 | 0.9189 | 0.9157 |
| DistilBERT-cased fine-tuned (no mask) | 0.8927 | 0.9032 | 0.8979 |
| SpaCy Medium | 0.6618 | 0.5758 | 0.6158 |
| SpaCy Large | 0.6850 | 0.6317 | 0.6573 |
| Gemma 3 27B (zero-shot) | 0.6361 | 0.7489 | 0.6879 |
| Gemma 3 27B (few-shot) | 0.6689 | 0.7620 | 0.7125 |

WikiAnn Results

| Model | Precision | Recall | F1 |
|---|---|---|---|
| BERT-base-NER | 0.4786 | 0.5287 | 0.5024 |
| DistilBERT-cased fine-tuned (no mask) | 0.4603 | 0.5134 | 0.4854 |
| SpaCy Medium | 0.4002 | 0.3905 | 0.3953 |
| SpaCy Large | 0.4051 | 0.3987 | 0.4019 |