Fine-Tuning Llama 3.2 for BRD Extraction

A production-ready system that fine-tunes Llama 3.2 1B to extract structured project estimations from Business Requirements Documents using QLoRA and synthetic data generation.

Machine Learning · LLM · Fine-Tuning · NLP · Python · QLoRA · PyTorch

The Challenge

In enterprise project management, Business Requirements Documents (BRDs) contain critical project estimation data, such as effort hours, timelines, and costs, buried within lengthy unstructured text. Manually extracting this information is time-consuming and error-prone, making it difficult to analyze project patterns or build predictive models at scale.

The challenge was to build an intelligent extraction system that could:

  • Parse unstructured BRD documents with varying formats
  • Extract three key fields: Effort Hours, Timeline (Weeks), and Cost (USD)
  • Generate validated, structured JSON outputs (see the example after this list)
  • Run efficiently on consumer hardware without GPU requirements
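For concreteness, here is a hypothetical target record for a single BRD. The field names are illustrative (not necessarily the project's exact schema) and are reused in the sketches below:

```python
# Hypothetical target record for one BRD; field names are illustrative.
target = {
    "effort_hours": 1200,   # total estimated effort
    "timeline_weeks": 10,   # estimated duration
    "cost_usd": 145000,     # estimated budget
}
```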

The Approach

I built a specialized extraction system by fine-tuning Meta's Llama 3.2 1B model using state-of-the-art parameter-efficient techniques. The approach consisted of four key phases:

1. Synthetic Data Generation

Since no public BRD dataset exists, I generated 1,200+ synthetic training examples using Claude/GPT-4, covering diverse industries and project types. Each document included ground-truth labels for supervised learning, ensuring full control over data quality and balance.
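A minimal sketch of the generation loop, assuming labels are drawn first and the BRD prose is then written to match them by a Claude/GPT-4 call driven by the prompt below. The prompt wording, value ranges, and helper names are assumptions, not the project's exact code:

```python
# Sketch: draw ground-truth labels, build a generation prompt for the LLM,
# and append one (prompt, completion) training record to a JSONL file.
import json
import random

INDUSTRIES = ["fintech", "healthcare", "retail", "logistics"]

def make_labels() -> dict:
    """Draw targets first so the synthetic document can be written to match."""
    effort = random.randint(200, 4000)            # effort hours
    weeks = max(2, round(effort / random.randint(80, 160)))
    cost = effort * random.randint(90, 160)       # blended hourly rate in USD
    return {"effort_hours": effort, "timeline_weeks": weeks, "cost_usd": cost}

def make_generation_prompt(labels: dict) -> str:
    """Prompt sent to Claude/GPT-4 to write the BRD text itself."""
    industry = random.choice(INDUSTRIES)
    return (
        f"Write a 400-600 word Business Requirements Document for a {industry} "
        f"project. Mention, in natural prose, an estimated effort of "
        f"{labels['effort_hours']} hours, a timeline of {labels['timeline_weeks']} "
        f"weeks, and a budget of ${labels['cost_usd']:,}. Vary the formatting."
    )

def append_example(path: str, brd_text: str, labels: dict) -> None:
    """Store one supervised example: BRD in, ground-truth JSON out."""
    record = {
        "prompt": f"Extract effort hours, timeline, and cost as JSON:\n\n{brd_text}",
        "completion": json.dumps(labels),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Drawing the labels before generating the text guarantees every document has exact ground truth, which is what makes the dataset balance controllable.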

2. Parameter-Efficient Fine-Tuning

I leveraged QLoRA (Quantized Low-Rank Adaptation) to make training feasible on CPU (see the configuration sketch after this list):

  • 8-bit quantization reduced memory footprint by 75%
  • LoRA adapters trained only ~0.5-2% of model parameters
  • Gradient accumulation achieved effective batch size of 32
  • Training completed in 12-24 hours on Intel MacBook Pro (or 1-2 hours on free Colab T4 GPU)
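A minimal configuration sketch using PEFT and bitsandbytes. The model id and LoRA target modules reflect common Llama usage but are assumptions; the hyperparameters mirror the "Key Training Parameters" listed below:

```python
# Sketch: load Llama 3.2 1B with 8-bit weights and attach LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-1B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# Freeze quantized base weights and enable gradient checkpointing.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                      # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report well under 2% trainable
```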

3. Type-Safe Output Validation

I integrated Pydantic schemas for production-ready extraction (a schema sketch follows this list):

  • Automatic type coercion and validation
  • Custom business logic validators
  • Clear error handling for malformed outputs
  • Reliable data contracts for downstream systems
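A sketch of what such a schema can look like in Pydantic 2.x, with one illustrative business-logic validator. Field names and thresholds are assumptions:

```python
# Sketch: validated schema for the three extracted fields.
from pydantic import BaseModel, Field, ValidationError, field_validator

class ProjectEstimate(BaseModel):
    effort_hours: float = Field(gt=0, description="Total estimated effort in hours")
    timeline_weeks: float = Field(gt=0, description="Estimated duration in weeks")
    cost_usd: float = Field(gt=0, description="Estimated cost in US dollars")

    @field_validator("timeline_weeks")
    @classmethod
    def timeline_is_plausible(cls, v: float) -> float:
        # Example business rule: reject multi-decade timelines as parse errors.
        if v > 520:
            raise ValueError("timeline exceeds 10 years; likely a bad extraction")
        return v

def parse_model_output(raw_json: str) -> ProjectEstimate | None:
    """Coerce and validate raw LLM output; return None on malformed JSON."""
    try:
        return ProjectEstimate.model_validate_json(raw_json)
    except ValidationError:
        return None
```

In Pydantic 2.x, model_validate_json handles JSON parsing, type coercion (e.g. "1200" to 1200.0), and field validation in a single step, which is what turns free-form model output into a reliable data contract.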

4. Comprehensive Evaluation Pipeline

I built a multi-metric evaluation framework comparing the base and fine-tuned models (a scoring sketch follows this list):

  • Exact match accuracy
  • Field-level MAE, RMSE, and R² scores
  • Error analysis and visualization
  • A/B testing on unseen examples
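A sketch of the per-field scoring, assuming paired lists of ground-truth and predicted values for one field. The "within 10%" rate corresponds to the Field Accuracy (±10%) metric reported below:

```python
# Sketch: field-level regression metrics over a held-out evaluation set.
import numpy as np

def field_metrics(y_true: list[float], y_pred: list[float]) -> dict:
    t = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    mae = np.abs(t - p).mean()
    rmse = np.sqrt(((t - p) ** 2).mean())
    ss_res = ((t - p) ** 2).sum()
    ss_tot = ((t - t.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    within_10pct = (np.abs(t - p) <= 0.10 * np.abs(t)).mean()
    return {"mae": mae, "rmse": rmse, "r2": r2, "within_10pct": within_10pct}
```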

Technical Implementation

Core Technologies:

  • Model: Meta Llama 3.2 1B (via Hugging Face Transformers)
  • Fine-Tuning: QLoRA with PEFT, TRL, and bitsandbytes
  • Validation: Pydantic 2.x with custom validators
  • Framework: PyTorch with Accelerate
  • Data: Synthetic generation with Claude API
  • Interface: Interactive Gradio demo (see the inference sketch below)
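A minimal sketch of how these pieces compose at inference time, assuming the trained LoRA adapter was saved to ./brd-adapter (path and generation settings are hypothetical):

```python
# Sketch: serve the fine-tuned extractor behind a Gradio text box.
import gradio as gr
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "./brd-adapter")  # hypothetical path

def extract(brd_text: str) -> str:
    prompt = f"Extract effort hours, timeline, and cost as JSON:\n\n{brd_text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

gr.Interface(fn=extract, inputs="text", outputs="text").launch()
```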

Key Training Parameters (wired into the trainer sketch below):

  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Learning Rate: 2e-4
  • Batch Size: 1 (gradient accumulation: 32)
  • Epochs: 3
  • Max Sequence Length: 2048
  • Quantization: 8-bit
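TRL's SFTTrainer wraps a standard transformers Trainer, and its exact keyword arguments vary by TRL version, so this sketch shows only the underlying TrainingArguments that carry the hyperparameters above (output path is hypothetical; max sequence length is handled at tokenization time):

```python
# Sketch: training arguments matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./brd-extractor",       # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,     # effective batch size of 32
    learning_rate=2e-4,
    num_train_epochs=3,
    gradient_checkpointing=True,        # trades compute for memory
    logging_steps=10,
    save_strategy="epoch",
)
```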

The project includes 7 structured Jupyter notebooks covering the complete pipeline from setup through deployment, with production-ready code and comprehensive documentation.

Results & Learnings

Performance Metrics

The fine-tuned model dramatically outperformed the base model:

Metric                   Base Model   Fine-Tuned   Improvement
Valid JSON Rate          ~30%         ~95%         +65 pts
Exact Match              ~5%          ~75%         +70 pts
Field Accuracy (±10%)    ~40%         ~85%         +45 pts

Field-level error metrics showed strong predictive power across all three targets:

  • Effort Hours: MAE ~45 hrs, R² 0.92
  • Timeline: MAE ~1.2 weeks, R² 0.88
  • Cost: MAE ~$5,500, R² 0.90

Key Learnings

  1. CPU Fine-Tuning is Viable: QLoRA enables training billion-parameter models on consumer hardware, democratizing LLM customization

  2. Parameter Efficiency Works: Training only 0.5-2% of parameters achieved 70%+ accuracy improvements while keeping adapter weights under 50MB

  3. Synthetic Data Solves Cold Start: High-quality synthetic data generation overcomes the lack of specialized training datasets

  4. Type Safety is Critical: Pydantic validation transforms unreliable LLM outputs into production-ready structured data

  5. Optimization Matters: Techniques like gradient checkpointing, 8-bit quantization, and gradient accumulation make the difference between feasible and infeasible training

Future Extensions

This approach generalizes to other structured extraction tasks including contract analysis, invoice processing, resume parsing, and medical record extraction. Potential improvements include grammar-constrained generation, confidence scoring, and reasoning explanations for extractions.

Impact

This project demonstrates end-to-end MLOps proficiency including model selection, efficient fine-tuning, evaluation methodology, and production deployment. It showcases practical application of cutting-edge techniques (QLoRA, LoRA, quantization) to solve real business problems with limited computational resources.