DeepSeek-R1-0528 is the latest update to DeepSeek's R1 reasoning model. At full precision it requires 715 GB of disk space, making it one of the largest open-source models available. However, thanks to advanced quantization techniques from Unsloth, the model's size can be reduced to 162 GB, an 80% reduction. This lets users experience the full power of the model with much lower hardware requirements, albeit with a slight performance trade-off.
In this tutorial, we will:
- Set up Ollama and Open WebUI to run the DeepSeek-R1-0528 model locally.
- Download and configure its 1.78-bit quantized version (IQ1_S).
- Run the model using both GPU + CPU and CPU-only configurations.
Step 0: Prerequisites
To run the IQ1_S quantized version, your system must meet the following requirements:
- GPU requirements: at least one 24 GB GPU (e.g., NVIDIA RTX 4090 or A6000) and 128 GB of RAM. With this setup, you can expect a generation speed of about 5 tokens/second.
- RAM requirements: a minimum of 64 GB of RAM is enough to run the model without a GPU, but performance will be limited to about 1 token/second.
- Optimal setup: for the best performance (5+ tokens/second), you need at least 180 GB of unified memory, or a combination of RAM and VRAM totaling 180 GB.
- Storage: make sure you have at least 200 GB of free disk space for the model and its dependencies.
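The requirements above can be sanity-checked from a terminal before downloading anything. A minimal sketch, assuming a Linux system with standard tools (`nvidia-smi` is only present when an NVIDIA driver is installed):

```shell
# Report total RAM (need 64 GB+ for CPU-only, 128 GB+ alongside a 24 GB GPU)
free -h | awk '/^Mem:/ {print "RAM total:", $2}'

# Report free disk space on the current filesystem (need 200 GB+)
df -h . | awk 'NR==2 {print "Disk free:", $4}'

# Report GPU model and VRAM if an NVIDIA driver is present
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader
else
  echo "No NVIDIA GPU detected"
fi
```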
Step 1: Install dependencies and Ollama
Update your system and install the required tools. Ollama is a lightweight server for running large language models locally. Install it on an Ubuntu distribution using the following commands:
apt-get update
apt-get install pciutils -y
curl -fsSL https://ollama.com/install.sh | sh
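After the install script finishes, it is worth confirming that the `ollama` binary is on your PATH before continuing:

```shell
# Print the Ollama version; fall back to a message if the binary is missing
if command -v ollama >/dev/null 2>&1; then
  ollama --version
else
  echo "ollama not found on PATH"
fi
```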
Step 2: Download and run the model
Run the 1.78-bit quantized version of the DeepSeek-R1-0528 model using the following commands:
ollama serve &
ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0

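Once the model is pulled and the server is up, you can also query it directly over Ollama's REST API, which listens on port 11434 by default. A minimal sketch; the prompt below is just an example:

```shell
# Send a single non-streaming generation request to the local Ollama server
curl -s http://localhost:11434/api/generate -d '{
  "model": "hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0",
  "prompt": "Briefly explain what a quantized model is.",
  "stream": false
}' || echo "Ollama server is not reachable on port 11434"
```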
Step 3: Set up and run Open WebUI
Pull the Open WebUI Docker image with CUDA support, then launch the Open WebUI container with GPU support and Ollama integration.
This command:
- Starts the Open WebUI server on port 8080
- Enables GPU acceleration with the --gpus all flag
- Mounts the necessary data volume (-v open-webui:/app/backend/data)
docker pull ghcr.io/open-webui/open-webui:cuda
docker run -d -p 9783:8080 --gpus all -v open-webui:/app/backend/data --name open-webui ghcr.io/open-webui/open-webui:cuda
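Before opening the browser, you can confirm the container came up cleanly (assuming Docker is installed as above):

```shell
# Show whether the open-webui container is running
if command -v docker >/dev/null 2>&1; then
  docker ps --filter name=open-webui
  # To follow the startup logs, run: docker logs -f open-webui
else
  echo "Docker is not installed"
fi
```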
After the container starts, access the Open WebUI interface in your browser at http://localhost:9783/ (the host port mapped in the command above).
Step 4: Running DeepSeek R1 0528 in Open WebUI
Select the hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0 model from the model menu.

If the Ollama server fails to use the GPU properly, you can switch to CPU-only execution. While this significantly reduces performance (to roughly 1 token/second), it ensures the model can still run.
# Kill any existing Ollama processes
pkill ollama
# List processes holding GPU memory (terminate them manually if needed)
sudo fuser -v /dev/nvidia*
# Restart Ollama with the GPU hidden, forcing CPU-only execution
CUDA_VISIBLE_DEVICES="" ollama serve
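With the GPU hidden, you can verify that Ollama is actually running the model on the CPU: `ollama ps` lists loaded models along with the processor they are using. A quick check, assuming the server started above is running:

```shell
# The PROCESSOR column should read "100% CPU" for the loaded model
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama not found on PATH"
fi
```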
Once the model is running, you can interact with it through Open WebUI. Keep in mind, however, that speed will be limited to about 1 token/second due to the lack of GPU acceleration.

Final thoughts
Running even the quantized version was challenging. You need a fast internet connection to download the model, and if the download fails, you have to restart the whole process from the beginning. I also ran into many problems trying to run it on my GPU, as I kept getting GGUF errors related to low VRAM. Despite trying several common fixes for GPU errors, nothing worked, so I eventually switched everything to the CPU. That worked, but generating a response now takes about 10 minutes, which is far from ideal.
I am sure there are better solutions out there, maybe using llama.cpp, but trust me, it took me a whole day just to get this running.
Abid Ali Awan (@1abidaliawan) is a certified data scientist who loves building machine learning models. Currently, he focuses on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a master's degree in technology management and a bachelor's degree in telecommunication engineering. His vision is to build an AI product using a neural network for students struggling with mental illness.
