Fine-Tuning Llama 3.2 for BRD Extraction

A production-ready system that fine-tunes Llama 3.2 1B to extract structured project estimations from Business Requirements Documents using QLoRA and synthetic data generation.

Machine Learning · LLM · Fine-Tuning · NLP · Python · QLoRA · PyTorch

The Challenge

In enterprise project management, Business Requirements Documents (BRDs) contain critical project estimation data, such as effort hours, timelines, and costs, buried within lengthy unstructured text. Manually extracting this information is time-consuming and error-prone, making it difficult to analyze project patterns or build predictive models at scale.

The challenge was to build an intelligent extraction system that could:

  • Parse unstructured BRD documents with varying formats
  • Extract three key fields: Effort Hours, Timeline (Weeks), and Cost (USD)
  • Generate validated, structured JSON outputs (see the example after this list)
  • Run efficiently on consumer hardware without GPU requirements
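For concreteness, here is a hypothetical target record for a single BRD. The field names are illustrative (not necessarily the project's exact schema) and are reused in the sketches below:

```python
# Hypothetical target record for one BRD; field names are illustrative.
target = {
    "effort_hours": 1200,   # total estimated effort
    "timeline_weeks": 10,   # estimated duration
    "cost_usd": 145000,     # estimated budget
}
```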

The Approach

I built a specialized extraction system by fine-tuning Meta's Llama 3.2 1B model using state-of-the-art parameter-efficient techniques. The approach consisted of four key phases:

1. Synthetic Data Generation

Since no public BRD dataset exists, I generated 1,200+ synthetic training examples using Claude/GPT-4, covering diverse industries and project types. Each document included ground-truth labels for supervised learning, ensuring full control over data quality and balance.
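A minimal sketch of the generation loop, assuming labels are drawn first and the BRD prose is then written to match them by a Claude/GPT-4 call driven by the prompt below. The prompt wording, value ranges, and helper names are assumptions, not the project's exact code:

```python
# Sketch: draw ground-truth labels, build a generation prompt for the LLM,
# and append one (prompt, completion) training record to a JSONL file.
import json
import random

INDUSTRIES = ["fintech", "healthcare", "retail", "logistics"]

def make_labels() -> dict:
    """Draw targets first so the synthetic document can be written to match."""
    effort = random.randint(200, 4000)            # effort hours
    weeks = max(2, round(effort / random.randint(80, 160)))
    cost = effort * random.randint(90, 160)       # blended hourly rate in USD
    return {"effort_hours": effort, "timeline_weeks": weeks, "cost_usd": cost}

def make_generation_prompt(labels: dict) -> str:
    """Prompt sent to Claude/GPT-4 to write the BRD text itself."""
    industry = random.choice(INDUSTRIES)
    return (
        f"Write a 400-600 word Business Requirements Document for a {industry} "
        f"project. Mention, in natural prose, an estimated effort of "
        f"{labels['effort_hours']} hours, a timeline of {labels['timeline_weeks']} "
        f"weeks, and a budget of ${labels['cost_usd']:,}. Vary the formatting."
    )

def append_example(path: str, brd_text: str, labels: dict) -> None:
    """Store one supervised example: BRD in, ground-truth JSON out."""
    record = {
        "prompt": f"Extract effort hours, timeline, and cost as JSON:\n\n{brd_text}",
        "completion": json.dumps(labels),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
```

Drawing the labels before generating the text guarantees every document has exact ground truth, which is what makes the dataset balance controllable.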

2. Parameter-Efficient Fine-Tuning

I leveraged QLoRA (Quantized Low-Rank Adaptation) to make training feasible on CPU (see the configuration sketch after this list):

  • 8-bit quantization reduced memory footprint by 75%
  • LoRA adapters trained only ~0.5-2% of model parameters
  • Gradient accumulation achieved effective batch size of 32
  • Training completed in 12-24 hours on Intel MacBook Pro (or 1-2 hours on free Colab T4 GPU)
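A minimal configuration sketch using PEFT and bitsandbytes. The model id and LoRA target modules reflect common Llama usage but are assumptions; the hyperparameters mirror the "Key Training Parameters" listed below:

```python
# Sketch: load Llama 3.2 1B with 8-bit weights and attach LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_id = "meta-llama/Llama-3.2-1B"  # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    torch_dtype=torch.float16,
)

# Freeze quantized base weights and enable gradient checkpointing.
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8,                      # LoRA rank
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed targets
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # should report well under 2% trainable
```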

3. Type-Safe Output Validation

I integrated Pydantic schemas for production-ready extraction (a schema sketch follows this list):

  • Automatic type coercion and validation
  • Custom business logic validators
  • Clear error handling for malformed outputs
  • Reliable data contracts for downstream systems
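A sketch of what such a schema can look like in Pydantic 2.x, with one illustrative business-logic validator. Field names and thresholds are assumptions:

```python
# Sketch: validated schema for the three extracted fields.
from pydantic import BaseModel, Field, ValidationError, field_validator

class ProjectEstimate(BaseModel):
    effort_hours: float = Field(gt=0, description="Total estimated effort in hours")
    timeline_weeks: float = Field(gt=0, description="Estimated duration in weeks")
    cost_usd: float = Field(gt=0, description="Estimated cost in US dollars")

    @field_validator("timeline_weeks")
    @classmethod
    def timeline_is_plausible(cls, v: float) -> float:
        # Example business rule: reject multi-decade timelines as parse errors.
        if v > 520:
            raise ValueError("timeline exceeds 10 years; likely a bad extraction")
        return v

def parse_model_output(raw_json: str) -> ProjectEstimate | None:
    """Coerce and validate raw LLM output; return None on malformed JSON."""
    try:
        return ProjectEstimate.model_validate_json(raw_json)
    except ValidationError:
        return None
```

In Pydantic 2.x, model_validate_json handles JSON parsing, type coercion (e.g. "1200" to 1200.0), and field validation in a single step, which is what turns free-form model output into a reliable data contract.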

4. Comprehensive Evaluation Pipeline

I built a multi-metric evaluation framework comparing the base and fine-tuned models (a scoring sketch follows this list):

  • Exact match accuracy
  • Field-level MAE, RMSE, and R² scores
  • Error analysis and visualization
  • A/B testing on unseen examples
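A sketch of the per-field scoring, assuming paired lists of ground-truth and predicted values for one field. The "within 10%" rate corresponds to the Field Accuracy (±10%) metric reported below:

```python
# Sketch: field-level regression metrics over a held-out evaluation set.
import numpy as np

def field_metrics(y_true: list[float], y_pred: list[float]) -> dict:
    t = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    mae = np.abs(t - p).mean()
    rmse = np.sqrt(((t - p) ** 2).mean())
    ss_res = ((t - p) ** 2).sum()
    ss_tot = ((t - t.mean()) ** 2).sum()
    r2 = 1.0 - ss_res / ss_tot
    within_10pct = (np.abs(t - p) <= 0.10 * np.abs(t)).mean()
    return {"mae": mae, "rmse": rmse, "r2": r2, "within_10pct": within_10pct}
```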

Technical Implementation

Core Technologies:

  • Model: Meta Llama 3.2 1B (via Hugging Face Transformers)
  • Fine-Tuning: QLoRA with PEFT, TRL, and bitsandbytes
  • Validation: Pydantic 2.x with custom validators
  • Framework: PyTorch with Accelerate
  • Data: Synthetic generation with Claude API
  • Interface: Interactive Gradio demo (see the inference sketch below)
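A minimal sketch of how these pieces compose at inference time, assuming the trained LoRA adapter was saved to ./brd-adapter (path and generation settings are hypothetical):

```python
# Sketch: serve the fine-tuned extractor behind a Gradio text box.
import gradio as gr
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "./brd-adapter")  # hypothetical path

def extract(brd_text: str) -> str:
    prompt = f"Extract effort hours, timeline, and cost as JSON:\n\n{brd_text}"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)

gr.Interface(fn=extract, inputs="text", outputs="text").launch()
```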

Key Training Parameters (wired into the trainer sketch below):

  • LoRA Rank: 8
  • LoRA Alpha: 16
  • Learning Rate: 2e-4
  • Batch Size: 1 (gradient accumulation: 32)
  • Epochs: 3
  • Max Sequence Length: 2048
  • Quantization: 8-bit
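TRL's SFTTrainer wraps a standard transformers Trainer, and its exact keyword arguments vary by TRL version, so this sketch shows only the underlying TrainingArguments that carry the hyperparameters above (output path is hypothetical; max sequence length is handled at tokenization time):

```python
# Sketch: training arguments matching the hyperparameters above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./brd-extractor",       # hypothetical output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,     # effective batch size of 32
    learning_rate=2e-4,
    num_train_epochs=3,
    gradient_checkpointing=True,        # trades compute for memory
    logging_steps=10,
    save_strategy="epoch",
)
```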

The project includes 7 structured Jupyter notebooks covering the complete pipeline from setup through deployment, with production-ready code and comprehensive documentation.

Results & Learnings

Performance Metrics

The fine-tuned model dramatically outperformed the base model:

Metric                   Base Model   Fine-Tuned   Improvement
Valid JSON Rate          ~30%         ~95%         +65 pts
Exact Match              ~5%          ~75%         +70 pts
Field Accuracy (±10%)    ~40%         ~85%         +45 pts

Field-level error metrics showed strong predictive power across all three targets:

  • Effort Hours: MAE ~45 hrs, R² 0.92
  • Timeline: MAE ~1.2 weeks, R² 0.88
  • Cost: MAE ~$5,500, R² 0.90

Key Learnings

  1. CPU Fine-Tuning is Viable: QLoRA enables training billion-parameter models on consumer hardware, democratizing LLM customization

  2. Parameter Efficiency Works: Training only 0.5-2% of parameters achieved 70%+ accuracy improvements while keeping adapter weights under 50MB

  3. Synthetic Data Solves Cold Start: High-quality synthetic data generation overcomes the lack of specialized training datasets

  4. Type Safety is Critical: Pydantic validation transforms unreliable LLM outputs into production-ready structured data

  5. Optimization Matters: Techniques like gradient checkpointing, 8-bit quantization, and gradient accumulation make the difference between feasible and infeasible training

Future Extensions

This approach generalizes to other structured extraction tasks including contract analysis, invoice processing, resume parsing, and medical record extraction. Potential improvements include grammar-constrained generation, confidence scoring, and reasoning explanations for extractions.

Impact

This project demonstrates end-to-end MLOps proficiency including model selection, efficient fine-tuning, evaluation methodology, and production deployment. It showcases practical application of cutting-edge techniques (QLoRA, LoRA, quantization) to solve real business problems with limited computational resources.