A comprehensive walkthrough of the 4 interactive web apps.
We built 4 standalone, production-ready web applications deployed on Hugging Face Spaces. Each app isolates a specific component of the modern Large Language Model stack:
A web application built with Flask and the `transformers` library, utilizing the native `unsloth/Llama-3.2-3B-Instruct` tokenizer.
Type any text to see real-time BPE chunking and integer mapping.
Launch Tokenizer Space
A neural network only operates on floating-point numbers. To process text, we must first map all human language onto a finite dictionary of integers. If the dictionary is too small (individual characters), sequences become very long and each token carries too little context. If it's too large (whole words), the embedding matrix becomes impossibly massive.
The solution is Sub-Word Tokenization: Breaking text into optimal linguistic chunks.
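Byte-Pair Encoding (BPE) repeatedly finds the most frequent pair of adjacent symbols and merges it into a single token. After many merges, common words become one token, while rare words are broken down into smaller sub-word pieces (which do not always line up with syllables).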
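The merge loop described above can be sketched in a few lines of plain Python. This is a toy illustration only, not the tokenizer used by the Space: the function names (`train_bpe`, `most_frequent_pair`, `merge_pair`) and the tiny corpus are assumptions made for the example, and production tokenizers work on bytes with far larger corpora.

```python
from collections import Counter


def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    # Ties are resolved in favour of the pair that was seen first.
    return max(pairs, key=pairs.get) if pairs else None


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-separated corpus."""
    # Start from single characters: each word is a tuple of symbols.
    words = dict(Counter(tuple(word) for word in corpus.split()))
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(words)
        if pair is None:
            break
        merges.append(pair)
        words = merge_pair(words, pair)
    return merges, words


merges, words = train_bpe("low low low low lower lowest", num_merges=2)
```

With this corpus the frequent word "low" collapses into a single symbol after two merges, while the rarer "lower" and "lowest" remain split into "low" plus leftover characters.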
Once the text is converted to a sequence of integers (e.g., `[45812, 1204]`), the model looks up each integer in a massive Embedding Matrix.
This converts the discrete integers into high-dimensional continuous vectors, allowing the model to perform mathematical operations on the meaning of words.
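The lookup itself is nothing more than row indexing. The sketch below uses a deliberately tiny, randomly initialised matrix; the sizes, the seed, and the helper names `make_embedding_matrix` and `embed` are assumptions for illustration, whereas a real model learns these rows during training and uses a far larger vocabulary and vector width.

```python
import random


def make_embedding_matrix(vocab_size, dim, seed=0):
    """Build a toy embedding matrix: one row of `dim` floats per token ID."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(vocab_size)]


def embed(token_ids, matrix):
    """Map each integer token ID to its row in the embedding matrix."""
    return [matrix[token_id] for token_id in token_ids]


matrix = make_embedding_matrix(vocab_size=10, dim=4)
vectors = embed([3, 7], matrix)  # two token IDs in, two 4-dimensional vectors out
```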
An interactive dashboard that visually renders probability distributions using a custom KenLM API endpoint.
Features the ₹30 Challenge: Can you perfectly slide the temperature to balance logic and creativity to hit exactly a 15% probability?
Launch Temperature Space
At the very end of the network, the model outputs raw scores (logits) for every single token in its vocabulary. Because these are unbounded raw numbers, they don't add up to 100%. We need a way to reliably convert them into a probability distribution.
This is done via the Softmax function.
The standard Softmax formula exponentiates the logits and divides by the sum. This forces the outputs to be between 0 and 1, creating a valid probability distribution.
By injecting a Temperature parameter ($\theta$), we divide the raw logits by $\theta$ before they are exponentiated. This lets us mathematically control how confident the resulting distribution is.
Low temperature ($\theta < 1$): Sharpened distribution. The highest logit dominates; as $\theta \to 0$, sampling becomes deterministic and "greedy".
High temperature ($\theta > 1$): Flattened distribution. Lower scores become viable. Highly creative and diverse.
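Written out, temperature-scaled Softmax is $p_i = \frac{e^{z_i/\theta}}{\sum_j e^{z_j/\theta}}$, where $z_i$ is the logit for token $i$. The following is a minimal sketch of that formula in plain Python; the function name and the example logits are assumptions for illustration and are not taken from the Space's code. Subtracting the maximum before exponentiating is a standard numerical-stability trick that leaves the result unchanged.

```python
import math


def softmax_with_temperature(logits, theta=1.0):
    """Convert raw logits into probabilities, scaled by temperature `theta`."""
    if theta <= 0:
        raise ValueError("theta must be positive")
    scaled = [z / theta for z in logits]
    # Subtract the max so exp() never overflows; the ratios are unaffected.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, theta=0.5)  # top token gains mass
base = softmax_with_temperature(logits, theta=1.0)   # plain Softmax
flat = softmax_with_temperature(logits, theta=2.0)   # mass spreads out
```

Comparing `max(sharp)`, `max(base)`, and `max(flat)` shows the top token's share shrinking as the temperature rises.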
A web app showcasing 3 levels of data extraction complexity, powered by the `Groq` API.
Try pasting unstructured text and watch the LLM perfectly populate a complex nested JSON schema without hallucinations.
Launch Structured Output Space
Traditional software engineering relies on strict data structures (APIs, Databases). If an LLM returns a conversational response like "Sure, the error code is 500!", the software will crash.
We must constrain the LLM's generation so that it only outputs valid, machine-readable syntax.
During decoding, we can forcefully set the probability of invalid tokens to 0%. If the schema requires a boolean, the model is physically prevented from outputting anything other than `true` or `false`.
{
  "error_code": 500,
  "affected_services": [
    "Database",
    "Auth"
  ]
}
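The token-masking step described above can be sketched as follows. This is an illustrative toy, not the mechanism the Groq API uses internally: the five-token vocabulary, the logit values, and the helper names `mask_logits` and `softmax` are all assumptions. The idea is that setting a logit to negative infinity makes its probability exactly zero after Softmax.

```python
import math


def mask_logits(logits, allowed_ids):
    """Set every logit outside `allowed_ids` to -inf so it can never be sampled."""
    allowed = set(allowed_ids)
    if not allowed:
        raise ValueError("at least one token must be allowed")
    return [z if i in allowed else -math.inf for i, z in enumerate(logits)]


def softmax(logits):
    """Plain Softmax; exp(-inf) evaluates to 0.0, so masked tokens get 0%."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical vocabulary and logits at a position where the schema needs a boolean.
vocab = ["Sure", "true", "false", "500", "!"]
logits = [3.2, 1.1, 0.4, 2.0, 0.9]
allowed = [vocab.index("true"), vocab.index("false")]

probs = softmax(mask_logits(logits, allowed))
choice = vocab[probs.index(max(probs))]
```

Even though "Sure" has the highest raw logit here, only `true` and `false` keep any probability mass after masking.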
An interactive VRAM calculator paired with a live "Sticky Note" injection simulator.
Write a custom, hallucinated fact on the Sticky Note, and watch the frozen Llama-3 instantly adapt to it in real-time!
Launch LoRA Space
To teach a model new facts, you must update its brain, the weight matrices ($W$). A standard 3 Billion parameter model requires roughly 6 GB of VRAM just to load in 16-bit precision (2 bytes per parameter).
However, running backpropagation requires storing optimizer states (Adam), gradients, and activations. A full fine-tune of a 3B model requires upwards of 30 GB of VRAM, putting it out of reach for consumer hardware.
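A back-of-the-envelope estimate makes these numbers concrete. The sketch below assumes 16-bit weights and gradients (2 bytes each) and two 32-bit Adam states per parameter (8 bytes); these byte counts and the function name `estimate_vram_gb` are assumptions for illustration, and activation memory, which depends on batch size and sequence length, is deliberately left out.

```python
def estimate_vram_gb(num_params, bytes_weights=2, bytes_grads=2, bytes_optimizer=8):
    """Rough VRAM estimate in GB for full fine-tuning, excluding activations."""
    gb = 1e9
    weights = num_params * bytes_weights / gb
    gradients = num_params * bytes_grads / gb
    optimizer = num_params * bytes_optimizer / gb  # Adam: two fp32 moments per parameter
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "total": weights + gradients + optimizer,
    }


estimate = estimate_vram_gb(3e9)
```

Under these assumptions a 3B model needs about 6 GB for weights, 6 GB for gradients, and 24 GB for optimizer states, so roughly 36 GB before any activations are stored.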
Instead of modifying the massive 3B parameter brain, LoRA freezes it entirely. We then add two tiny low-rank matrices ($A$ and $B$) that act as "sticky notes". Their product is added to the frozen weights' output, so the new knowledge is mathematically injected during the forward pass.
Frozen base weights ($W$): 3 Billion Parameters. Fixed in memory. No gradients required.
LoRA adapters ($A$ and $B$): 1.5 Million Parameters. Trainable in under 2 GB of VRAM!
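For a single linear layer, the LoRA forward pass can be written as $h = Wx + s \cdot B(Ax)$, where $W$ stays frozen and only $A$ and $B$ are trained. The sketch below implements that with plain lists; the 3x3 shapes, the rank of 1, the scale $s$, and the helper names are assumptions for illustration and do not reflect the Space's actual configuration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(x, W, A, B, scale=1.0):
    """Frozen path W x plus the low-rank update scale * B (A x)."""
    base = matvec(W, x)               # frozen weights, no gradients needed
    update = matvec(B, matvec(A, x))  # project down to rank r, then back up
    return [b + scale * u for b, u in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return rank * (d_in + d_out)


x = [1.0, 2.0, 3.0]
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen 3x3 weights
A = [[1.0, 0.0, 0.0]]                                     # rank-1 down-projection

B_zero = [[0.0], [0.0], [0.0]]     # B starts at zero in standard LoRA
B_trained = [[0.0], [0.0], [2.0]]  # hypothetical values after training

before = lora_forward(x, W, A, B_zero)
after = lora_forward(x, W, A, B_trained)
```

Because $B$ starts at zero, the adapter initially leaves the frozen model's output untouched, and training only ever changes the small matrices. The total adapter size depends on the rank and on which layers are adapted, which the text above does not specify.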
Thank you for attending the IEEE CVPR LLM Laboratory!