Ollama Setup and Running Models


Ollama: Running Large Language Models Locally

The landscape of Artificial Intelligence (AI) and Large Language Models (LLMs) has traditionally been dominated by cloud-based services. While powerful, these often come with costs, privacy concerns, and require constant internet connectivity. Ollama emerges as a compelling open-source solution, designed to simplify the process of downloading, managing, and running LLMs directly on your local machine. This approach offers significant advantages, including enhanced privacy, cost savings, offline capability, and greater control over the models you use.

Why Choose Local LLMs with Ollama?

Running LLMs locally addresses several key challenges associated with cloud services:

  1. Privacy and Security: When using local models via Ollama, your data doesn’t need to leave your machine. This is crucial for handling sensitive information or for applications in sectors like healthcare and finance where data privacy is paramount.
  2. Cost Efficiency: Cloud-based LLM services often involve ongoing costs related to API calls or server usage. Ollama eliminates these costs, as you leverage your own hardware resources. Once a model is downloaded, running it incurs no additional expense.
  3. Reduced Latency: Local execution significantly reduces the network latency inherent in communicating with remote servers. This results in faster response times, which is beneficial for interactive applications.
  4. Offline Capability: Since the models run on your machine, you can use them even without an active internet connection (after the initial download).
  5. Customization and Flexibility: Ollama provides greater flexibility in customizing and fine-tuning models to suit specific needs, without the limitations imposed by third-party service providers.
  6. Accessibility: It simplifies the technically challenging process of setting up LLMs, making advanced language processing accessible to a broader audience of developers, researchers, and hobbyists who may not have deep knowledge of machine learning frameworks or complex hardware configurations.

Getting Started with Ollama

Installation:

Setting up Ollama is straightforward:

  1. Navigate to the official Ollama website (ollama.com).
  2. Click the “Download” button.
  3. Select your operating system (macOS, Windows, or Linux).
    • macOS/Windows: Download the installer application and run it. Follow the on-screen prompts. The application will set up the necessary command-line tools and potentially start a background service.
    • Linux: Copy the provided curl command and execute it in your terminal to install Ollama.
  4. Verification: Once installed, open your terminal or command prompt and type ollama. If the installation was successful, you should see a list of available commands and options.

The Ollama application often runs as a background service, managing the models and handling requests. On macOS and Windows, you might see an Ollama icon in your system tray or menu bar.
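
If the background service is running on its default port, you can also confirm it is reachable with a quick request (this assumes the default address has not been changed):

curl http://localhost:11434

A healthy server replies with a short plain-text status message (typically "Ollama is running").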

Core Concepts and Usage

1. Running Models:

The primary command to interact with models is ollama run.

ollama run <model_name>

Replace <model_name> with the identifier of the model you wish to use (e.g., llama3.1, mistral, codegemma, llava).

  • If the specified model is not already present on your system, Ollama will automatically download it first. Model sizes can vary significantly (from a few gigabytes to hundreds of gigabytes for the largest models), so ensure you have sufficient disk space and a stable internet connection for the download.
  • Once the model is ready (either downloaded or already local), Ollama will launch an interactive chat prompt in your terminal, allowing you to start conversing with the LLM immediately.

Example: ollama run mistral

You can exit the interactive chat prompt by typing /bye.
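
You can also pass a prompt directly as a command-line argument for a one-off, non-interactive response; the model name and prompt below are just placeholders:

ollama run mistral "Explain quantization in one sentence."

Ollama prints the model's reply and then returns you to the shell.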

2. Model Management:

  • Listing Installed Models: To see which models you have downloaded locally, use:

    ollama list
    

    This command displays each model's name, ID, size, and modification date (an illustrative example is shown after this list).

  • Removing Models: If you need to free up disk space or no longer need a specific model, use:

    ollama rm <model_name>
    

    This will delete the specified model and its associated data from your system.

  • Pulling Models: You can download models without immediately running them using:

    ollama pull <model_name>
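
For reference, the output of ollama list looks roughly like this (the names, sizes, and dates are purely illustrative, and the IDs are elided):

    NAME                       ID     SIZE      MODIFIED
    mistral:latest             ...    4.1 GB    2 days ago
    nomic-embed-text:latest    ...    274 MB    3 hours ago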
    

3. Understanding Models:

Ollama provides access to a wide variety of open-source models. When choosing a model, consider these factors:

  • Parameters: Often denoted with ‘B’ (billions), like 7B, 13B, 70B, or even 405B. This reflects the model’s complexity and capacity. More parameters generally mean better performance but require more computational resources (RAM and processing power).
  • Size: The disk space required to store the model. This is directly related to the number of parameters and quantization.
  • RAM Requirements: Running a model requires loading it into your computer’s RAM. Ollama’s documentation often provides guidance on how much RAM is needed based on the model’s parameter count (e.g., a 7B model might need 8GB+ RAM, while a 70B model could require 64GB+ RAM).
  • Quantization: A technique used to reduce the model’s size and computational requirements by reducing the precision of its weights (e.g., 4-bit quantization). This makes larger models feasible to run on consumer hardware, sometimes with a slight trade-off in performance.
  • Model Types: Ollama supports various model types tailored for different tasks:
    • Language Models: For text generation, conversation, instruction following, summarization (e.g., Llama series, Mistral, Gemma).
    • Multimodal Models: Capable of processing multiple types of input, such as text and images (e.g., Llava). You can provide an image file path along with your text prompt.
    • Embedding Models: Used to convert text into numerical vector representations, essential for Retrieval-Augmented Generation (RAG) systems and semantic search (e.g., nomic-embed-text, mxbai-embed-large). A short code sketch of embeddings (and a multimodal query) follows this list.
    • Tool Calling Models: Fine-tuned models designed to interact with external tools, functions, or APIs in an agentic manner.
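
As a rough sketch of the multimodal and embedding model types above, here is how you might call them from Python using the official ollama package (introduced later in this post). The model names and image path are placeholders and assume the models have already been pulled:

import ollama

client = ollama.Client()  # talks to the local server at http://localhost:11434

# Embedding example: turn text into a numerical vector (assumes nomic-embed-text is installed).
emb = client.embeddings(model='nomic-embed-text', prompt='Ollama runs LLMs locally.')
print(len(emb['embedding']))  # vector dimensionality (768 for nomic-embed-text)

# Multimodal example: send an image path alongside the text prompt
# (assumes llava is installed and ./photo.png exists).
resp = client.chat(
    model='llava',
    messages=[{'role': 'user', 'content': 'Describe this image.', 'images': ['./photo.png']}],
)
print(resp['message']['content'])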

4. Finding Models:

The Ollama website features a model library (ollama.com/library) where you can browse, search, and filter available models. Each model page provides details about its size, parameters, use cases, and how to run it. Common popular choices include models from the Llama series, Mistral, CodeGemma (for coding tasks), and Llava (for multimodal tasks).

Advanced Usage

1. Customizing Models with Modelfile:

Similar to how Docker uses a Dockerfile to define container images, Ollama uses a Modelfile to create customized model variations. This plain text file allows you to:

  • Start from a base model (FROM <base_model_name>).
  • Set parameters like temperature (controls randomness: higher values produce more varied, creative output; lower values make responses more deterministic), top_k, top_p, etc.
  • Define a SYSTEM prompt to give the model specific instructions, persona, or context for its responses.
  • Include adapter weights (e.g., for LoRA fine-tuning).

Example Modelfile:

FROM llama3.1:8b
PARAMETER temperature 0.7
PARAMETER top_k 50
SYSTEM """
You are a helpful assistant specializing in explaining complex scientific concepts in simple terms.
Always be concise and clear.
"""

To create a new custom model from this file:

  1. Save the content above into a file named Modelfile (no extension).
  2. Run the following command in your terminal, from the same directory as the file:
    ollama create <your_custom_model_name> -f Modelfile
    
  3. Run your custom model:
    ollama run <your_custom_model_name>
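
A convenient starting point for writing your own Modelfile is to print the Modelfile of a model you already have installed; the tag below is just an example:

ollama show llama3.1:8b --modelfile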
    

2. The Ollama Server and REST API:

Under the hood, Ollama runs a local HTTP server, typically on http://localhost:11434. This server exposes a REST API that handles requests to the LLMs. This is fundamental because it allows any application capable of making HTTP requests to interact with your local models.

  • Automatic Start: Usually, the server starts automatically when the Ollama desktop application is running or when you use commands like ollama run.
  • Manual Start: You can manually start the server and view logs using:
    ollama serve
    
    This will show incoming requests and processing details in your terminal.
  • API Endpoints: The API provides various endpoints:
    • /api/generate: For straightforward text generation based on a prompt.
    • /api/chat: For conversational interactions, maintaining context through a list of messages.
    • Other endpoints exist for managing models (listing, pulling, deleting), showing model info, and creating embeddings.

You can interact with this API using tools like curl, Postman, or directly from your code. Common parameters in API requests include model, prompt (for generate), messages (for chat), stream (true/false - whether to stream response tokens or wait for the full response), and format (json - to request JSON output).
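
For example, a minimal curl call to the generate endpoint might look like this (assuming the mistral model is installed and the server is listening on its default port):

curl http://localhost:11434/api/generate -d '{
  "model": "mistral",
  "prompt": "Why is the sky blue?",
  "stream": false
}'

With "stream": false, the server returns a single JSON object whose response field contains the generated text.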

3. Interacting via Code (Python Example):

The Ollama API makes it easy to integrate local LLMs into your applications. Here’s how you might do it in Python:

  • Manual HTTP Requests: Using libraries like requests:

    import requests
    import json
    
    url = "http://localhost:11434/api/chat"
    payload = {
        "model": "mistral",
        "messages": [
            {"role": "user", "content": "Why is the sky blue?"}
        ],
        "stream": False # Get the full response at once
    }
    
    response = requests.post(url, json=payload)
    response.raise_for_status() # Raise an exception for bad status codes
    
    data = response.json()
    print(data['message']['content'])
    
  • Using the Official ollama Python Package: Ollama provides convenient libraries for popular languages. For Python:

    1. Install the package: pip install ollama
    2. Use the client:
      import ollama
      
      client = ollama.Client() # Connects to http://localhost:11434 by default
      
      response = client.chat(model='mistral', messages=[
        {
          'role': 'user',
          'content': 'Why is the sky blue?',
        },
      ])
      
      print(response['message']['content'])
      
      # For streaming responses:
      # stream = client.chat(
      #     model='mistral',
      #     messages=[{'role': 'user', 'content': 'Tell me a short story'}],
      #     stream=True,
      # )
      # for chunk in stream:
      #   print(chunk['message']['content'], end='', flush=True)
      

    The library handles the complexities of API calls, making integration much cleaner. Similar libraries exist for JavaScript/TypeScript.
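
If you call the HTTP API directly (as in the requests example above) with "stream": True, the server sends one JSON object per line instead of a single response. Here is a rough sketch of consuming that stream, under the same assumptions as the earlier example:

import json
import requests

url = "http://localhost:11434/api/chat"
payload = {
    "model": "mistral",
    "messages": [{"role": "user", "content": "Tell me a short story"}],
    "stream": True,
}

# stream=True tells requests not to buffer the whole body; each line is one JSON chunk.
with requests.post(url, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)
        # Each chunk carries a fragment of the reply; the final chunk has "done": true.
        print(chunk.get("message", {}).get("content", ""), end="", flush=True)
print()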

4. Using Graphical User Interfaces (GUIs):

Because Ollama exposes a standard API, various community-developed GUI applications can act as frontends. Tools like “Open Web UI” or “Mist” provide chat interfaces similar to commercial offerings but connect to your local Ollama models. Some even offer features for managing models, adjusting parameters, and setting up simple RAG pipelines by uploading documents directly through the UI.

Common Use Cases

Ollama empowers a variety of applications:

  • Development and Testing: Easily experiment with different LLMs for application features without incurring API costs or dealing with complex setups.
  • Education and Research: Provides an accessible platform for learning about and experimenting with LLMs without the cost barriers of cloud services.
  • Secure Applications: Build AI-powered features for applications handling sensitive data, ensuring data stays within a controlled environment.
  • Offline AI Tools: Create tools that leverage LLMs even without internet access.
  • Personalized Assistants: Customize models with specific instructions or knowledge using Modelfile.
  • Building Local AI Applications: Create tools for tasks like:
    • Text summarization
    • Sentiment analysis
    • Code generation and explanation
    • Retrieval-Augmented Generation (RAG) systems using local embedding models and vector stores (a minimal sketch follows this list).
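
As a rough illustration of that last idea, here is a minimal RAG-style sketch using the official ollama Python package: it embeds a handful of documents, retrieves the one closest to a question by cosine similarity, and asks a chat model to answer from that context. The model names, documents, and question are placeholders, and a real system would use a proper vector store:

import ollama

client = ollama.Client()

# A toy "document store"; a real system would use a vector database.
documents = [
    "Ollama exposes a local REST API on port 11434.",
    "Quantization reduces a model's size by lowering the precision of its weights.",
    "A Modelfile lets you build a customized variant of a base model.",
]

def embed(text):
    # Embed text with a local embedding model (assumes nomic-embed-text is installed).
    return client.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

doc_vectors = [embed(doc) for doc in documents]

question = "How can I make a large model fit on modest hardware?"
question_vector = embed(question)

# Retrieve the document most similar to the question.
scores = [cosine(question_vector, vec) for vec in doc_vectors]
best_doc = documents[scores.index(max(scores))]

# Answer the question using only the retrieved context.
answer = client.chat(
    model="mistral",
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": f"Context: {best_doc}\n\nQuestion: {question}"},
    ],
)
print(answer["message"]["content"])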


Conclusion

Ollama significantly lowers the barrier to entry for working with powerful Large Language Models. By enabling local execution, it addresses key concerns around cost, privacy, and complexity. Its simple CLI, standardized API, support for model customization, and compatibility with a growing ecosystem of open-source models make it an invaluable tool for developers, researchers, and AI enthusiasts looking to harness the power of LLMs on their own terms and hardware. Whether you’re building sophisticated AI applications or simply exploring the capabilities of modern AI, Ollama provides a robust, free, and private foundation.

