Running an AI model on your own computer lets you experiment with powerful language tools privately and offline.
This guide will walk you through the basics of setting up a local AI installation and getting started with it.
No prior experience is required—just a bit of patience and a willingness to learn as you go!
What You’ll Need
A Computer or Laptop: You don’t need a supercomputer, but your hardware will determine how fast the AI runs and which models you can use. Minimum specs:
RAM: 8GB (16GB+ recommended for smoother performance).
Storage: At least 10-20GB of free space (models vary in size). A quick way to check both your disk space and RAM is sketched just after this list.
GPU (Optional): A graphics card (e.g., NVIDIA with 4GB+ VRAM) speeds things up, but you can run smaller models on just a CPU.
Operating System: Windows, macOS, or Linux (Windows is easiest for beginners).
Internet Connection: Needed for downloading software and models initially, but not for using the AI once set up.
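Not sure what your machine has? Here’s a minimal Python sketch that prints free disk space and total RAM so you can compare against the specs above. It assumes Python 3 is installed; the RAM reading uses the third-party psutil package, while the disk check is standard library.

```python
# Minimal hardware check. Install the one dependency with: pip install psutil
import shutil
import psutil

GB = 1024 ** 3  # bytes per gigabyte

free_disk = shutil.disk_usage(".").free / GB    # free space on the current drive
total_ram = psutil.virtual_memory().total / GB  # total installed RAM

print(f"Free disk space: {free_disk:.1f} GB (aim for 10-20+ GB)")
print(f"Total RAM:       {total_ram:.1f} GB (8 GB minimum, 16 GB+ recommended)")
```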
Step 1: Understand the Basics
Before diving in, here’s what you’re working with:
LLM (Large Language Model): This is the "brain" of your AI, like Llama or its variants. It’s what generates text responses.
Software: You’ll need a program to run the LLM. We’ll use simple options that don’t require coding.
Models: Pre-trained AI files you download. Smaller ones (e.g., 7 billion parameters, or "7B") work on modest hardware, while larger ones (e.g., 13B or 70B) need more power; a rough size calculation is sketched at the end of this step.
Don’t worry if this sounds complex—it’ll make sense once you start!
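To make those size labels concrete, here’s the back-of-the-envelope arithmetic (an approximation, not an exact formula): a model file is roughly its parameter count times the storage per parameter, and the common 4-bit "Q4" files use a bit over half a byte per parameter once overhead is included.

```python
# Ballpark file sizes for Q4-quantized models.
# Assumption: ~0.56 bytes per parameter (4-bit weights plus metadata overhead).
BYTES_PER_PARAM_Q4 = 0.56
GB = 1024 ** 3

for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    size_gb = params * BYTES_PER_PARAM_Q4 / GB
    print(f"{name}: roughly {size_gb:.1f} GB on disk (plus working RAM while running)")
```

This is why 7B fits comfortably on an 8GB machine while 13B wants 16GB: the whole file gets loaded into memory, with headroom left over for the operating system and the conversation context.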
Step 2: Choose and Install Software
For beginners, the easiest way to get started is with a one-click solution. Here are two great options:
KoboldCPP (Windows Recommended):
Why: Super simple—just one executable file to run. Works well on Windows and can use your GPU if you have one.
How to Install:
Go to KoboldCPP’s GitHub page.
Scroll to "Releases," download the latest .exe file (e.g., koboldcpp.exe).
Save it to a folder on your computer (e.g., C:\LocalAI).
Ollama (Mac/Linux Alternative):
Why: Easy to use and popular across platforms, especially for macOS users with M1/M2 chips.
How to Install:
Visit Ollama’s website.
Download the installer for your OS and run it.
Open a terminal (Command Prompt on Windows, Terminal on Mac/Linux) to use it later.
Quick Tip: If you’re on Windows, start with KoboldCPP—it’s the fastest path to success. Mac users with M1/M2 chips should try Ollama.
Step 3: Download an AI Model
Now you need an LLM to run. Models come in different sizes (e.g., 7B, 13B) and "flavors" (e.g., fine-tuned for chat, coding, or general use). For beginners:
Recommended Model: "Llama 2 7B Chat" or "Orca Mini 7B".
Why: Small enough for most computers, tuned for conversation.
Where to Get It:
Visit Hugging Face (a hub for AI models).
Search for "TheBloke" (a trusted uploader) + your model name (e.g., "TheBloke/Llama-2-7B-Chat-GGUF").
Download a "GGUF" file (e.g., llama-2-7b-chat.Q4_0.gguf). GGUF is the format used by llama.cpp-based tools like KoboldCPP, and the "Q4_0" in the name means 4-bit quantization, a good balance of size and quality for local use.
Save it in the same folder as your software (e.g., C:\LocalAI). If you’d rather script this step, see the sketch after this list.
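For scripted downloads, Hugging Face’s huggingface_hub library can fetch a single file from a repository. A minimal sketch, assuming the repo and filename above (adjust local_dir to your own folder):

```python
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# Fetch one GGUF file from TheBloke's repository into a local folder.
path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGUF",
    filename="llama-2-7b-chat.Q4_0.gguf",
    local_dir=r"C:\LocalAI",  # or wherever your software lives
)
print("Model saved to:", path)
```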
Size Note: A 7B model is about 4-5GB. Check your hardware: 8GB RAM can handle 7B, 16GB+ can try 13B.
Step 4: Run Your Local AI
With KoboldCPP:
Double-click koboldcpp.exe.
A window pops up. Click "Browse" and select your model file (e.g., llama-2-7b-chat.Q4_0.gguf).
Click "Launch." Wait a minute as it loads (first time takes longer).
A web browser tab opens (e.g., http://localhost:5001). Type a question like "Hello, how are you?" and hit Enter! (The same port also accepts scripted requests; see the sketch below.)
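Beyond the chat page, KoboldCPP exposes a KoboldAI-compatible HTTP API on the same port, so you can send prompts from a script instead of typing them. A minimal sketch, assuming the default address and the /api/v1/generate endpoint (check your version’s docs if it differs):

```python
# pip install requests
import requests

# Send one prompt to a running KoboldCPP instance (default port 5001).
resp = requests.post(
    "http://localhost:5001/api/v1/generate",
    json={"prompt": "Hello, how are you?", "max_length": 80},
    timeout=120,  # generation on CPU can take a while
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```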
With Ollama:
Open your terminal.
Type ollama run llama2 (or another model name from Ollama’s library). Ollama manages its own downloads and will fetch the model automatically if it isn’t already present; note that it won’t use the GGUF file from Step 3 unless you import it with a Modelfile.
Once loaded, type a prompt (e.g., "Explain why the sky is blue") and press Enter. Since the model runs fully offline, it can’t fetch live data like the weather. (A scripted version of this test is sketched below.)
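Ollama also runs a local REST API (on port 11434 by default) whenever it’s active, so the same test works from a script. A minimal sketch using its /api/generate endpoint:

```python
# pip install requests
import requests

# Ask a running Ollama instance for a single, non-streamed response.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama2", "prompt": "Tell me a joke.", "stream": False},
    timeout=120,  # the first call may be slow while the model loads
)
resp.raise_for_status()
print(resp.json()["response"])
```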
First Test: Ask something simple like "Tell me a joke." If it responds, you’re in business!
Step 5: Play and Experiment
Congratulations—you’ve got a local AI running!
Here’s what to try:
Chat: Ask questions like "What’s the capital of France?" or "Write a short story."
Customize: With KoboldCPP, explore the web interface settings (e.g., "Presets" for response style). With Ollama, use the terminal for now.
Limits: Smaller models may struggle with complex tasks, like multi-step math or keeping track of a long conversation. Upgrade to a bigger model if your hardware allows.
Troubleshooting Tips
Slow?: If it’s sluggish, the model is probably too big for your CPU and RAM. Try a smaller model or a lighter quantization (e.g., a Q4_0 file instead of Q5 or Q8), or offload layers to a GPU if you have one.
Crashes?: Ensure your model file matches the software (GGUF for KoboldCPP); a quick file check is sketched just after this list. Redownload if needed.
No GPU?: That’s fine! KoboldCPP and Ollama work on CPU, just slower.
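One quick sanity check when a file won’t load: GGUF files begin with the four ASCII bytes GGUF, so a truncated or wrong-format download is easy to spot. A minimal sketch (the filename is just an example):

```python
# Check that a downloaded model file really is GGUF.
# GGUF files start with the 4-byte magic "GGUF"; anything else means
# a wrong format or an incomplete download.
with open("llama-2-7b-chat.Q4_0.gguf", "rb") as f:  # example filename
    magic = f.read(4)

if magic == b"GGUF":
    print("Header looks right: this is a GGUF file.")
else:
    print(f"Unexpected header {magic!r}: redownload or convert the model.")
```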
Next Steps
Explore More Models: Google "reddit best 7b llm for chat" for recommendations (e.g., "CodeLlama" for coding).
Add Features: Try tools like SillyTavern (pairs with KoboldCPP) for a fancier chat interface.
Learn: Check YouTube for "local LLM tutorials" or browse community guides on r/LocalLLaMA for deeper details.
That’s it! You’re now running your own AI locally.
It’s like having a mini-ChatGPT at home—private, free, and yours to tweak.
Have fun experimenting, and don’t hesitate to ask online communities (like r/LocalLLaMA) if you get stuck.