Overview
Dracula AI Agent is a full-stack AI chatbot trained from scratch on Bram Stoker's Dracula. Originally prototyped as a Hugging Face model, it was expanded into a production-ready web service — custom model, REST API backend, and a themed chat frontend all running on a self-hosted Ubuntu server.
The system lets users have a real-time conversation with an AI that generates responses in the voice of Dracula. It demonstrates the full ML-to-production pipeline: training, inference serving, deployment, and observability — not just a notebook experiment.
Architecture
🧠 Model
- GPT-style transformer with token + positional embeddings
- Cross-entropy loss, AdamW optimizer
- Temperature / top-k sampling at inference
- Checkpoint save/load via torch.save
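The model described above can be sketched as a few dozen lines of PyTorch. This is a minimal illustration, not the project's actual code: the class name, layer sizes, and use of nn.TransformerEncoder are assumptions standing in for the custom transformer blocks.

```python
import torch
import torch.nn as nn

# Minimal sketch of a GPT-style model: token + positional embeddings,
# causally masked transformer blocks, and a linear head over the vocabulary.
# All names and sizes here are illustrative assumptions.
class MiniGPT(nn.Module):
    def __init__(self, vocab_size=50257, ctx_len=128, d_model=64, n_heads=4, n_layers=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)   # token embeddings
        self.pos_emb = nn.Embedding(ctx_len, d_model)      # positional embeddings
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True
        )
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)      # logits over the vocabulary

    def forward(self, idx):
        b, t = idx.shape
        pos = torch.arange(t, device=idx.device)
        x = self.tok_emb(idx) + self.pos_emb(pos)
        # causal mask: each position may only attend to earlier positions
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool, device=idx.device), diagonal=1)
        x = self.blocks(x, mask=mask)
        return self.lm_head(x)

model = MiniGPT()
logits = model(torch.randint(0, 50257, (2, 16)))  # batch of 2, 16 tokens each
print(logits.shape)  # torch.Size([2, 16, 50257])
```

The final dimension of the output matches the GPT-2 vocabulary, so cross-entropy loss can be applied directly against next-token targets.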
📊 Data Pipeline
- tiktoken GPT-2 encoding
- Fixed-length context windows with stride
- PyTorch DataLoader for batching + shuffling
- Train / validation split
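The windowing step above can be sketched with a small PyTorch Dataset. In the real pipeline the token IDs come from tiktoken's GPT-2 encoding; here a toy integer list stands in so the sliding-window and batching logic is visible on its own.

```python
import torch
from torch.utils.data import Dataset, DataLoader

# Fixed-length context windows with stride; targets are the inputs shifted
# by one token (next-token prediction). Class name is illustrative.
class WindowDataset(Dataset):
    def __init__(self, token_ids, ctx_len, stride):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - ctx_len, stride):
            self.inputs.append(torch.tensor(token_ids[i : i + ctx_len]))
            self.targets.append(torch.tensor(token_ids[i + 1 : i + ctx_len + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

tokens = list(range(100))            # stand-in for the tiktoken-encoded corpus
ds = WindowDataset(tokens, ctx_len=8, stride=4)
loader = DataLoader(ds, batch_size=4, shuffle=True)  # batching + shuffling
x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

A stride smaller than the context length gives overlapping windows, trading more training examples for some repetition between them.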
🧪 Training Loop
- Multi-epoch training with periodic validation
- Loss tracking + Matplotlib visualization
- Qualitative sampling between epochs
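The loop structure above (multi-epoch training with a periodic validation pass) can be sketched as follows. The tiny embedding-plus-linear "model" and random batches are stand-ins for the real transformer and tokenized corpus; only the loop shape is the point.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, ctx = 50, 8
model = nn.Sequential(nn.Embedding(vocab, 16), nn.Linear(16, vocab))  # stand-in LM
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

def batch():
    x = torch.randint(0, vocab, (4, ctx))
    return x, x.roll(-1, dims=1)     # shifted next-token targets (toy data)

train_losses, val_losses = [], []
for epoch in range(3):               # multi-epoch training
    model.train()
    for _ in range(10):
        x, y = batch()
        logits = model(x)                                  # (B, T, vocab)
        loss = loss_fn(logits.flatten(0, 1), y.flatten())  # cross-entropy
        opt.zero_grad()
        loss.backward()
        opt.step()
    train_losses.append(loss.item())
    model.eval()
    with torch.no_grad():            # periodic validation pass
        x, y = batch()
        val_losses.append(loss_fn(model(x).flatten(0, 1), y.flatten()).item())

print(len(train_losses), len(val_losses))  # 3 3
```

Between epochs the real project also prints a short generated sample, which is often a quicker overfitting check than the loss curves alone.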
🚀 Backend API
- FastAPI /generate endpoint
- JSON request/response schema
- CORS enabled for browser clients
💬 Frontend (VampChat)
- HTML/CSS/JS chat UI
- POST requests to FastAPI for inference
- Session controls: new chat / clear history
🌐 Deployment
- Ubuntu server, NGINX reverse proxy
- NGINX serves static files + proxies API
- Basic logging for production visibility
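The proxy setup described above might look like the following NGINX server block. The domain, web root, and upstream port are placeholders, not the project's actual configuration.

```nginx
# Illustrative only: paths, domain, and port are assumptions.
server {
    listen 80;
    server_name example.com;

    root /var/www/vampchat;     # static VampChat files served directly
    index index.html;

    location /generate {
        proxy_pass http://127.0.0.1:8000;   # forward API calls to FastAPI
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```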
Features
📚 Custom GPT Model
Transformer-style architecture trained end-to-end on the full Dracula text — not fine-tuned, built from scratch.
🔡 tiktoken Tokenization
GPT-2 compatible encoding for efficient token batching and reproducible vocabulary.
🔁 Full Training Pipeline
PyTorch training loop with checkpointing, evaluation passes, and qualitative generation — reproducible end-to-end.
📈 Loss Diagnostics
Real-time training/validation loss visualization via Matplotlib — easy to spot overfitting or underfitting.
🧛 Interactive Chat UI
VampChat frontend where users converse with a Dracula-voiced AI. Session management built in.
⚡ Real-Time Inference
Browser sends POST requests to FastAPI; responses are rendered as they arrive. No external API — model runs on the server.
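The temperature / top-k decoding mentioned under Architecture can be sketched as a single sampling step. The function name and default values are illustrative.

```python
import torch

def sample_next_token(logits, temperature=0.8, top_k=40):
    # Lower temperature sharpens the distribution; higher flattens it
    logits = logits / temperature
    if top_k is not None:
        # Mask everything below the k-th largest logit
        kth = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)  # draw one token id

torch.manual_seed(0)
logits = torch.randn(1, 50257)      # fake logits for one sequence position
tok = sample_next_token(logits)
print(tok.shape)  # torch.Size([1, 1])
```

Repeating this step, appending each sampled token to the context, yields the live responses shown in the chat UI.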
Workflow
Prepare dataset
Place the cleaned Dracula text file into the data/ directory. The pipeline handles tokenization automatically.

Run training
Execute python gpt_train.py. Training runs multi-epoch with periodic validation and loss logging to stdout.

Monitor diagnostics
Matplotlib plots training and validation loss curves. Qualitative text samples are printed between epochs to judge output quality.

Save checkpoint
Model weights are saved via torch.save. Reload at any point for inference or continued training.

Serve via FastAPI
Launch the API server. NGINX proxies incoming requests to FastAPI and serves the VampChat static frontend.
Chat in VampChat
Open the browser UI, type a prompt, and receive a Dracula-voiced response generated live from the model.
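The save/reload step in the workflow above follows the standard torch.save / load_state_dict cycle. A tiny stand-in model keeps the sketch self-contained; the checkpoint filename is an assumption.

```python
import os
import tempfile
import torch
import torch.nn as nn

model = nn.Linear(4, 2)                        # stand-in for the trained GPT
path = os.path.join(tempfile.mkdtemp(), "dracula_gpt.pt")
torch.save(model.state_dict(), path)           # save checkpoint

restored = nn.Linear(4, 2)                     # same architecture, fresh weights
restored.load_state_dict(torch.load(path))     # reload for inference or training
restored.eval()

x = torch.ones(1, 4)
same = torch.allclose(model(x), restored(x))   # identical outputs after reload
print(same)  # True
```

Saving the state_dict rather than the whole module keeps checkpoints portable across code changes, as long as the architecture matches at load time.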
Tech Stack
🐍 Python · Core language
🔦 PyTorch · Deep learning
🔡 tiktoken · GPT-2 tokenization
🧠 Custom Transformer · GPT-style architecture
🧪 AdamW · Optimizer
📈 Matplotlib · Loss visualization
🚀 FastAPI · Inference API
🌐 JS / HTML / CSS · Chat frontend
🐧 Ubuntu + NGINX · Production server
💾 torch.save/load · Checkpointing