Generative AI for the Uninformed
(by the Uninformed)

Presented by Zackary Lowery
on August 21st, 2025 at Leading EDJE.

Available online at https://presentations.xcjs.com/

Press your space bar or swipe to navigate down and then forward. »

About Me and Why I'm Uninformed

  • I'm Zackary Lowery!
  • Currently at Leading EDJE for nearly 10 years!
  • Senior developer
  • C#/.NET, frontend web development, and just about anything else
  • Started playing with generative AI in April of 2024
Zackary Lowery Illustrated Portrait

Topics

Covered

  • High Level Concepts
  • Self-hosted Solutions
    "Edge" AI Processing

Not Covered

  • Low Level Concepts
  • Ethical Concerns
  • Commercial Services (ChatGPT, etc.)
OpenAI Logo

OpenAI

We have AI at home!

Generative AI at home:

...and more!

Hardware Requirements

Pixel Art Graphic of a High End PC and Gaming Laptop

Requirements by Use Case

Use Case   CPU            RAM      VRAM
LLM        AVX2 support   8+ GB    0-8+ GB
Images     *              16+ GB   4-12+ GB
Video      *              16+ GB   12+ GB
Audio      *              16+ GB   16+ GB

LLM
Large Language Model
AVX2
Advanced Vector Extensions 2, an instruction set supported by most mid to high end x86_64 processors since 2013. Windows on ARM Insider Build 27744 has added emulation support for this feature.

OS Matters!

  • Using a headless Linux install frees up to an additional 1 GB of VRAM compared to Windows, since no desktop compositor or display output occupies the GPU.
Linux Logo

GPU Manufacturer Matters!

  • CUDA Acceleration Widely Supported
  • ZLUDA?
NVIDIA Logo

CUDA
Compute Unified Device Architecture, NVIDIA's proprietary API for performing general purpose calculations on GPUs.
ZLUDA
A library for adapting CUDA to non-NVIDIA hardware.

Example Hardware Configuration

Motherboard    Gigabyte B450M DS3H (Micro ATX)
CPU            AMD Ryzen 5 1600 Six-Core Processor
RAM            64 GB DDR4
Storage        2 TB Intel 670p NVMe SSD
GPUs           NVIDIA RTX 3060 (12 GB)
               NVIDIA RTX 2070 Super (8 GB)
               (suspended by zip ties from the top of the case)
Power Supply   750 Watts
My Personal AI Server

AI on a Budget

NVIDIA Jetson Orin Nano Super (~$549.99)

AI Performance   67 INT8 TOPS
GPU              NVIDIA Ampere architecture with 1024 CUDA cores and 32 tensor cores
CPU              6-core Arm® Cortex®-A78AE v8.2 64-bit CPU, 1.5 MB L2 + 4 MB L3
Memory           8 GB 128-bit LPDDR5, 102 GB/s
Storage          Supports SD card slot and external NVMe
Power            7 W - 25 W
NVIDIA Jetson Orin Nano Super

NVIDIA "Edge AI"

AI on a "Budget"

NVIDIA Project Digits ($3,000+)

AI Performance   1 FP4 petaFLOP
GPU              NVIDIA Blackwell (RTX 50 series) GPU
CPU              20-core Arm Grace CPU
Memory           128 GB LPDDR5X
Storage          Up to 4 TB 2230 NVMe SSD
Power            ???W
NVIDIA Project Digits

FP4
A 4-bit floating point format (typically one sign bit, a 2-bit exponent, and a 1-bit mantissa). Unlike FP32, it is not defined by IEEE-754; the tiny precision is traded for much smaller weights and higher throughput.

Introductory Concepts

Generative AI is Big Statistics

It's statistics all the way down.

An image of it being turtles all the way down.

Determinism

Generative AI models are deterministic.

Equal inputs = Equal outputs.

What is a Vector?

A vector is an ordered list of numbers.

[10, 96, 72, 31, 98]

Or...

[489.1512, 82.582, 17.124, 857.1723, 15.234]

Vectors have dimension. These examples have 5 dimensions each.
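
The same idea as a quick Python sketch; the dot product line is just one example of how two vectors can be compared:

    # The two example vectors above, written as plain Python lists.
    v1 = [10, 96, 72, 31, 98]
    v2 = [489.1512, 82.582, 17.124, 857.1723, 15.234]

    print(len(v1), len(v2))                     # 5 5 -> each vector has 5 dimensions
    print(sum(a * b for a, b in zip(v1, v2)))   # dot product: one way to compare two vectors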

What is a Model?

Models (also called weights) are collections of vectors organized into a particular architecture, alongside other features such as metadata.

Pickle (.pickle / .pkl)
  • Python serialization format
  • Tensors / numerical vectors
  • Executable VM bytecode
  • Unsafe / deprecated

PyTorch (.pth)
  • PyTorch weight export

Safetensors (.safetensors)
  • Similar to Pickle
  • No bytecode
  • Zero-copy + really fast

GGUF - GGML Universal File (.gguf)
(GGML is named after its creator, Georgi Gerganov)
  • Tensors + metadata
  • "Quantizable"

ONNX - Open Neural Network Exchange (.onnx)
  • Model exchange format
  • Architecture/hardware-independent
  • Used for transferring models across formats or platforms
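
As a small illustration of why Safetensors is popular, here is a hedged sketch assuming the safetensors and torch Python packages; the tensor names are made up:

    from safetensors.torch import load_file, save_file
    import torch

    # Save a toy "model" (a dict of named tensors) with no executable bytecode at all.
    weights = {"layer1.weight": torch.randn(4, 4), "layer1.bias": torch.zeros(4)}
    save_file(weights, "toy_model.safetensors")

    # Loading memory-maps the file, so it is effectively zero-copy.
    restored = load_file("toy_model.safetensors")
    print(restored["layer1.weight"].shape)  # torch.Size([4, 4])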

More on ONNX

ONNX Logo

From ONNX.ai:

ONNX is an open format built to represent machine learning models.
  • Strong backing from many technical industry leaders

ONNX and Microsoft Logos

Windows Task Manager Displaying an NPU chart.
NPU
Neural Processing Unit: A special coprocessor optimized to run machine learning models.
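
A minimal inference sketch, assuming the onnxruntime package and a hypothetical exported model.onnx; the input shape below is an assumption and depends entirely on how the model was exported:

    import numpy as np
    import onnxruntime as ort

    # Load the exchange-format model; onnxruntime picks an execution provider
    # (CPU, CUDA, NPU, etc.) depending on what is available.
    session = ort.InferenceSession("model.onnx")

    inp = session.get_inputs()[0]
    print(inp.name, inp.shape)                             # inspect the expected input

    dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)   # assumed image-like input
    outputs = session.run(None, {inp.name: dummy})
    print([o.shape for o in outputs])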

Quantization

Lowering the numerical precision of a model's weights to reduce its size and/or improve performance.

Name                Parameters   Quantization   Size     (V)RAM
Fake-8B-q4_0.gguf   8 billion    4-bit int      4.7 GB   ~5.6 GB
Fake-8B-q8_0.gguf   8 billion    8-bit int      8.5 GB   ~10 GB

Lossless versions of models usually store FP16, BF16, or FP32 values before quantization.
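
A back-of-the-envelope sketch of where the sizes in the table come from (real GGUF files also carry per-block scaling metadata, so they run slightly larger):

    def model_size_gb(parameters: float, bits_per_weight: float) -> float:
        # parameters * bits-per-weight, converted from bits to (decimal) gigabytes
        return parameters * bits_per_weight / 8 / 1e9

    print(model_size_gb(8e9, 4))    # ~4.0 GB for an 8B model at 4-bit
    print(model_size_gb(8e9, 8))    # ~8.0 GB at 8-bit
    print(model_size_gb(8e9, 16))   # ~16.0 GB at FP16, before quantization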

Distillation

Using a larger model to train smaller models.

  • Distilled models typically approximate the larger model's probability distribution with far fewer parameters.
Name                 Parameters   Quantization   Size     (V)RAM
Fake-8B-q8_0.gguf    8 billion    8-bit int      8.5 GB   ~10 GB
Fake-70B-q8_0.gguf   70 billion   8-bit int      75 GB    ~90 GB

Large Language Models

Prompt

  • User-provided text that the LLM responds to.


							$ Tell me about the Roman Empire.

							Absolutely! The Roman Empire was...
						

System Prompt

  • System-configured prompt to provide instructions or context to subsequent prompts.
  • Prefixes or is combined with every user prompt.
  • Generally not visible or adjustable by the end user.


							You are an assistant designed to help the user with
							any request or question. Responses should be
							accurate and concise. Refuse to answer any request
							or question that might be used to harm the user or
							another person.
						

Tokens

  • A word, part of a word, or even punctuation.
  • Determined by the LLM's tokenizer.
  • Each token is assigned and referenced by an integer ID.

LLM Tokens Visualized Image borrowed from TeeTracker@Medium
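
A minimal sketch using OpenAI's open-source tiktoken tokenizer; every LLM ships its own tokenizer, so token boundaries and IDs differ between models:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    ids = enc.encode("Tell me about the Roman Empire.")
    print(ids)                               # a list of integer token IDs
    print([enc.decode([t]) for t in ids])    # the text fragment each token maps back to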

Context

  • The accumulated history of the conversation with the LLM, plus any additional tokens provided by other sources.
  • Different LLMs can have varying maximum context windows.


							[113, 101, 102, 103, 104, 105, 106, 107, 108, 109, 104, 110, 111, 112, 114]
						

Temperature

How likely an LLM is to select the most probable ("best") series of tokens in its response.

Cool 0.0 ←――――――――→ 2.0 Warm

Not an entirely accurate representation, as the LLM still selects tokens from a distribution curve; the curve is just flattened.

Have a great ____!

day         0.8
life        1.0
dinner      1.2
holiday     1.4
paper cut   1.8
kangaroo    2.0
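
A minimal sketch of how temperature reshapes the next-token distribution; the logits below are made up for illustration:

    import math

    def softmax_with_temperature(logits, temperature):
        # Dividing the logits by the temperature sharpens the distribution when
        # temperature < 1 and flattens it when temperature > 1.
        scaled = [l / max(temperature, 1e-6) for l in logits]
        m = max(scaled)
        exps = [math.exp(s - m) for s in scaled]
        total = sum(exps)
        return [e / total for e in exps]

    # Hypothetical logits for the token after "Have a great ..."
    logits = {"day": 6.0, "life": 4.5, "dinner": 3.0, "kangaroo": 0.5}
    for t in (0.2, 1.0, 2.0):
        probs = softmax_with_temperature(list(logits.values()), t)
        print(t, {tok: round(p, 3) for tok, p in zip(logits, probs)})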

Mixture of Experts (MoE) Models

A model built from multiple expert sub-networks; for each token, a router activates only the experts it needs, so just a subset of the total parameters is used.

Mixtral

Mistral Logo

Deepseek v3

Deepseek Logo

Llama 4

Meta Logo
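
A toy sketch of the routing idea only; a real MoE uses a learned gating network rather than random scores:

    import random

    NUM_EXPERTS = 8   # total expert sub-networks in the model
    TOP_K = 2         # experts actually activated per token

    def route(token: str) -> list[int]:
        # Stand-in for the gating network: score every expert, keep the top-k.
        # Most of the model's parameters stay idle for any single token.
        scores = [(random.random(), i) for i in range(NUM_EXPERTS)]
        return [i for _, i in sorted(scores, reverse=True)[:TOP_K]]

    for token in ["The", " Roman", " Empire"]:
        print(repr(token), "-> experts", route(token))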

Reasoning Models

  • Emits Chain-of-Thought (CoT) output before the final response.
  • Works through the prompt in smaller intermediate steps, focusing the model's attention on one chunk of the problem at a time.

    User:
    What is 1 + 1?

    Assistant:
    <think>
      1 is the smallest positive integral value. By
      adding 1 to 1, we get the next integer in the
      sequence of all integers, which is 2. Therefore,
      1 + 1 = 2.
    </think>

    The answer is 2.
						
Deepseek Logo

Deepseek R1

Microsoft Logo

Phi 4 Reasoning

Mistral Logo

Magistral

Retrieval Augmented Generation (RAG)

  1. Convert external data to embeddings.
  2. Retrieve the embeddings most relevant to the user's prompt.
  3. The retrieved data is used to augment the user's prompt.
  4. The LLM generates a response to the augmented prompt.
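
A toy sketch of steps 1-3; the embed() function here is only a stand-in for a real embedding model (such as one served by Ollama), so the retrieval just illustrates the mechanism:

    import math

    def embed(text: str) -> list[float]:
        # Stand-in for a real embedding model; real embeddings have hundreds of
        # dimensions and capture meaning, not characters.
        return [float(ord(c)) for c in text.lower()[:16].ljust(16)]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

    documents = ["The office closes at 5 PM.", "Parking is free on weekends."]
    doc_vectors = [embed(d) for d in documents]                  # 1. convert data to embeddings

    question = "When does the office close?"
    scores = [cosine(embed(question), v) for v in doc_vectors]   # 2. retrieve the closest match
    best = documents[scores.index(max(scores))]

    prompt = f"Context: {best}\n\nQuestion: {question}"          # 3. augment the prompt
    print(prompt)                                                # 4. hand the prompt to the LLM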

Embeddings

Embeddings are similar to tokenizing the data, but each embedding is a vector whose position also preserves the semantic relationships between tokens.

Image borrowed from Tokenization vs. Embedding: Understanding the Differences and Their Importance in NLP.

Model Context Protocol (MCP)

  • Standardized protocol for providing context and external callable functionality (tools) to LLMs.
  • Based on JSON-RPC 2.0.
An AI-generated image of a lamp.

							User: Turn on the living room lights.

							Assistant: Ok, I turned on the living room lights.
						

1. MCP Server Lists Functions


    {
      "jsonrpc": "2.0",
      "id": 1,
      "result": {
        "tools": [
          {
            "name": "control_lighting",
            "title": "Smart Home Control: Lighting",
            "description": "Turn lights on or off in a room.",
            "inputSchema": {
              "type": "object",
              "properties": {
                "room": {
                  "type": "string",
                  "description": "The room to turn the lights on or off in."
                },
                "enabled": {
                  "type": "boolean",
                  "description": "Whether to turn the lights on or off."
                }
              },
              "required": ["room"]
            }
          }
        ]
      }
    }
						

2. MCP Host Calls a Function


							{
							  "jsonrpc": "2.0",
							  "id": 1,
							  "method": "tools/call",
							  "params": {
							    "name": "control_lighting",
							    "arguments": {
							      "room": "Living Room",
							      "enabled": true
							    }
							  }
							}
						

3. MCP Server Executes Function and Returns Result


							{
							  "jsonrpc": "2.0",
							  "id": 1,
							  "result": {
							    "room": "Living Room",
							    "enabled": true
							  }
							}
						

Notable LLM Projects and Resources

Hugging Face Logo Hugging Face

Online repository for various types of machine learning models.

Screen shot of the HuggingFace web site.

Llama.cpp Logo Llama.cpp

  • Inference engine originally built for Meta's LLaMA foundational models; now runs many open LLM architectures
  • Open-source
  • Available as a CLI application, server, standalone library, and more
  • Used in nearly all open-source/self-hosted LLM solutions
llama-cli in action, courtesy of the Qwen documentation.

Ollama Logo Ollama

  • "Just works" LLM runtime
  • Server manager for queueing requests across multiple models
  • OpenAI-compatible API

Ollama Logo ollama pull

  • Download a large language model from Ollama's repository.
  • Model names follow the Docker naming convention with tags; :latest is the default tag.


							$ ollama pull llama3:8b
						

Ollama Logo ollama run

  • Runs an interactive shell to prompt the LLM.
  • Pulls the model if missing.


							$ ollama run llama3:8b
						

Ollama Logo ollama serve

Runs an OpenAI-compatible API server to facilitate LLM interactions.



							$ ollama serve
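
Once the server is running (it listens on port 11434 by default), any OpenAI-style client can point at it. A minimal sketch assuming the openai Python package:

    from openai import OpenAI

    # Ollama ignores the API key, but the client requires one to be set.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    response = client.chat.completions.create(
        model="llama3:8b",
        messages=[{"role": "user", "content": "Tell me about the Roman Empire."}],
    )
    print(response.choices[0].message.content)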
						

Open Web UI Logo Open Web UI

Web frontend for Ollama and other LLM backends

Open WebUI Demo Animation

Llamafile Logo Llamafile

Single-file, multi-platform, self-contained executable LLM with a CLI or web-based frontend

Supported platforms (AMD64 and ARM64): Windows, Linux, Mac OS X, FreeBSD, OpenBSD, NetBSD

LM Studio Logo LM Studio

  • Desktop application
  • Probably the easiest way to get started with local LLMs
  • Supports starting an OpenAI-compatible API
  • Closed source
LM Studio Screenshot

Jan Logo Jan

  • Desktop application similar to LM Studio
  • Another approachable way to get started with local LLMs
  • Supports starting an OpenAI-compatible API
  • Open source
Jan Screenshot

Image Generation

Example AI-generated image of 'quantum wave'.

Generative Image Model Convolutions

Video borrowed from AI, But Simple.

Latent Space

A low-dimensional vector (or set of low-dimensional vectors) representing an image or image canvas.


							[
							  0, 0, 0,
							  0, 0, 0,
							  0, 0, 0
							]
						

UNet Model

  • The main generation model, typically a CNN (convolutional neural network).
  • Predicts an image by iteratively removing noise in the latent image.
  • Named for the U-like shape of its image encoding and decoding process.

The UNet architecture. Don't ask me how it works yet. Borrowed from Wikipedia's U-Net article.

Prompt

A description of an image.


							A photo of an astronaut riding a horse.
						
AI-generated image of an astronaut riding a horse.

Text Encoder

Tokenizes the prompt and encodes it into a vector used to guide image generation.

LLM Tokens Visualized Image borrowed from TeeTracker@Medium

CLIP Dictionary

Contrastive Language-Image Pre-training

See also: multimodal embeddings, which relate vectors across multiple media types.

Infographic showing a relationship between different types of media and a CLIP dictionary. Borrowed from Meta's ImageBind.

KSampler

  1. Add visual noise based on seed and noise strength.
  2. Load UNet model.
  3. Iterate in steps based on positive and negative guidance (prompts).
AI image generation seed noise.

Seed

An integer used to generate the starting noise pattern.

657835674309455
AI image generation seed noise.

Steps

An animation showing an AI-generated image at 1, 5, 10, 15, and 20 steps.

An image at 1, 5, 10, 15, and 20 steps.

Sampler

Samplers control the denoising process.

An astronaut riding a horse with the Euler sampler.

euler

An astronaut riding a horse with the Heun sampler.

heun

An astronaut riding a horse with the DPMPP SDE sampler.

dpmpp_sde

An astronaut riding a horse with the DPMPP 2M SDE sampler.

dpmpp_2m_sde

Scheduler

Schedulers control the level of noise at each step.

An astronaut riding a horse with the Simple scheduler.

simple

An astronaut riding a horse with the Karras scheduler.

karras

An astronaut riding a horse with the normal scheduler.

normal

An astronaut riding a horse with the SGM Uniform scheduler.

sgm_uniform

Guidance Scale

How strongly the prompt is prioritized; higher values follow the prompt more literally.

An astronaut riding a horse with a guidance scale of 3.5.

3.5

An astronaut riding a horse with a guidance scale of 7.0.

7.0

An astronaut riding a horse with a guidance scale of 14.0.

14.0

An astronaut riding a horse with a guidance scale of 28.

28.0

Variational Auto Encoder (VAE)

Converts the latent image to a pixel bitmap.


									[
									  -3.609193, -6.2027211, -3.2266970,
									  9.5763211, 1.2311611, -2.2856392,
									  -5.266179, 2.9949563, -5.5448007,
									  etc...
									]
								

Stable Diffusion

Long-time open-source/open-weight image generation model from Stability AI.

Version   Resolution              Release Date
1.x       512x512                 2022-08-22
2.x       1024x1024               2022-11-23
XL        *                       2023-07-18
3.0       1024x1024 - 2048x2048   2024-06-12
3.5       *                       2024-10-22/29
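
The concepts above (seed, steps, guidance scale) map directly onto code. A minimal sketch using Hugging Face's diffusers library and an NVIDIA GPU; the checkpoint name and settings are illustrative assumptions, not a prescription:

    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    generator = torch.Generator("cuda").manual_seed(657835674309455)  # seed -> starting noise
    image = pipe(
        prompt="A photo of an astronaut riding a horse.",
        num_inference_steps=20,   # steps
        guidance_scale=7.0,       # how strongly to follow the prompt
        generator=generator,
    ).images[0]
    image.save("astronaut.png")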

Flux Dev Logo Flux

Open and closed source/weight models from Black Forest Labs that challenged Stability AI's Stable Diffusion 3.0 release.

Version                  Resolution   Release Date
Flux.1 Pro/Dev/Schnell   1024x1024    2024-08-01
Flux.1 Kontext           1024x1024    2025-05-29

Notable Image Generation Projects and Resources

SD.Next Logo SD.Next

A great beginner-friendly image generation web application with support for the latest models.

SD.Next Screenshot

ComfyUI Logo ComfyUI

Advanced industry-standard node-based generative AI toolkit.

ComfyUI Screenshot

SwarmUI Logo SwarmUI

Assistive wrapper around ComfyUI.

SwarmUI Screenshot

Invoke Logo Invoke

  • Canvas
  • Textbox
  • Selection Tool

Paint with AI-driven inpainting.

Invoke UI

Video Generation

Example AI-generated videos.

"Spacetime Latent Patches"

Extending latent images with additional dimensions for space and time is what enables video generation.

Figure explaining latent patches.

Borrowed from OpenAI's Video Generation Models as World Simulators.

Mochi Logo Mochi

Early open source video model from Genmo.

Wan

More recent and more capable open source video model from Alibaba.

Audio Generation

Meta Logo Meta Audiocraft

AudioGen


									Whistling with wind blowing
								

MusicGen


									Pop dance track with catchy melodies,
									tropical percussion,
									and upbeat rhythms,
									perfect for the beach
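
A minimal sketch of the MusicGen prompt above, assuming Meta's audiocraft package and its pretrained weights are available:

    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=8)  # seconds of audio to generate

    wavs = model.generate([
        "Pop dance track with catchy melodies, tropical percussion, "
        "and upbeat rhythms, perfect for the beach"
    ])
    audio_write("beach_track", wavs[0].cpu(), model.sample_rate)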
								

ACE-Step Logo ACE-Step

Open source/weight foundational music model with official support for ComfyUI.

Official example output from ACE-Step.


							(Verse 1)
							🎵🎵🎵
							It's not just a dream, it's the start of a way,
							Building the future where music will play.
							A model that listens, a model that grows,
							ACE-Step is the rhythm the new world knows.

							(Pre-Chorus)
							No more limits, no more lines,
							A thousand songs in a million minds.
							We light the spark, we raise the sound,
							A new foundation breaking ground.

							(Chorus)
							ACE-Step, we take the leap,
							Into a world where the music speaks.
							Fast and free, we shape the skies,
							Creating songs that never die.
							ACE-Step — the beat goes on,
							Foundation strong, a brand-new dawn.
							🎵🎵🎵


							(Verse 2)
							🎵
							Not just end-to-end, but a canvas to paint,
							A million colors, no need to wait.
							For every artist, for every sound,
							ACE-Step lays the sacred ground.

							(Pre-Chorus)
							🎵
							No more limits, no more lines,
							A thousand songs in a million minds.
							We light the spark, we raise the sound,
							A new foundation breaking ground.

							(Chorus)
							🎵
							ACE-Step, we take the leap,
							Into a world where the music speaks.
							Fast and free, we shape the skies,
							Creating songs that never die.
							ACE-Step — the beat goes on,
							Foundation strong, a brand-new dawn.

							(Bridge)
							🎵
							From every beat to every rhyme,
							We build the tools for endless time.
							A step, a song, a dream to keep,
							ACE is the promise we will leap.

							(Final Chorus)
							🎵
							ACE-Step, we take the leap,
							Into a world where the music speaks.
							Fast and free, we shape the skies,
							Creating songs that never die.
							ACE-Step — the beat goes on,
							Foundation strong, a brand-new dawn.
							ACE-Step — the future's song.
						

3D Model Generation

Hunyuan3D 2.0

An open source/weight 3D model generation model by Tencent.

Rotating animation of example Hunyuan3D output.

End

Insert discussion and discourse here.

Return to the rest of the presentations.