My experience training a local LLM (AI chatbot) on local data…

…has been disappointing, to say the least. I’ll run through what I’ve tried here.

My original goal was twofold:

Firstly, I wanted to be able to dump a whole bunch of PDFs of scientific papers into an LLM, and then be able to interrogate those PDFs via a chat interface. Primarily, I saw this as a way to try and keep up with the literature, as I add papers to my ‘to read’ folder far quicker than I read them.

My second goal was to load in documentation for open source software I use, in the hope that the chat interface would help me optimize my input scripts, or add in features I wasn’t sure how to add.

In both cases, I’ve been unable to achieve anything really useful. This may partly be down to the format – I gather PDFs are a less-than-ideal way to feed information to a computer, because they don’t necessarily have an obvious read order (e.g. they might have columns, be split up by figures, contain page numbers, etc.). But let’s be honest, there’s a tonne of info out there that is in PDF format, so presumably people are working on making this work better.
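To see why read order is a problem, consider a two-column page: naive top-to-bottom extraction interleaves the columns. Here’s a toy sketch (the coordinates and the 200-point column split are made up for illustration):

```python
# Hypothetical text spans extracted from a two-column PDF page: (x, y, text).
spans = [
    (50, 100, "left line 1"), (300, 100, "right line 1"),
    (50, 120, "left line 2"), (300, 120, "right line 2"),
]

# Naive "top of page first" ordering interleaves the two columns...
naive = [t for _, _, t in sorted(spans, key=lambda s: (s[1], s[0]))]

# ...whereas grouping by column before sorting vertically recovers read order.
by_column = [t for _, _, t in sorted(spans, key=lambda s: (s[0] > 200, s[1]))]
```

Real layouts are messier (figures, footers, rotated text), which is why PDF extraction remains hard.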

Anyway, I tried three ways to feed my own information into local LLMs (or perhaps SLMs – Small Language Models – given they are much smaller than big hosted models like ChatGPT), using RAG, or Retrieval-Augmented Generation.
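The basic RAG idea is simple: embed the documents, find the chunks most similar to the query, and paste them into the prompt as context for the model. A toy sketch, using bag-of-words counts in place of a real embedding model (the documents and query here are invented):

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count (real RAG uses neural embeddings)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

docs = [
    "dinosaur tracks preserved in soft sediment",
    "fast parallel algorithms for molecular dynamics",
]
query = "dinosaur fossil footprints"

# Retrieval: rank documents by similarity to the query...
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
# ...then augment the prompt with the retrieved context before generation.
prompt = f"Context: {best}\n\nQuestion: {query}"
```

If retrieval picks the wrong chunk, the model answers from the wrong context – which, as we’ll see, explains some of the failures below.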

Nvidia Chat with RTX

This is by far the simplest way to get set up, but it does require an Nvidia GPU. It’s available for free in beta from Nvidia. The installer demands a GPU with 8 GB of VRAM, which the A2000 in my Surface Laptop Studio does not have (it only has 4 GB). However, I could plug in my eGPU (a 3070 with 8 GB of VRAM), install Nvidia Chat, and then use it without the eGPU, so it’s not a hard block on usage.

It’s a big download, at around 30–40 GB, and then you need to download additional language models. When you run it, you get some text in a terminal, then a local webpage opens in the default browser as the main input:

Here, I’m running the Mistral 7B model, and trying to pass it a PDF of the Lethe documentation.

Ollama

The next two methods both rely on Ollama to provide the LLM. Ollama is super easy to use, and you can pick from a whole host of models, including the Mistral model used in Nvidia Chat above, but many, many more. There’s been a lot of buzz on social media and various blogs about both Command-R and Llama3, which are meant to be excellent for RAG and just generally good (the larger Llama3 model is meant to be on par with GPT-4). However, the better models do tend to be larger, which means more disk space required, and more RAM/VRAM too. Command-R runs well on my work computer, which has an RTX 4090 and 64 GB of RAM, but is unusably slow on my Surface Laptop Studio with 32 GB. The smaller Llama3 model (8B) runs fine on my laptop, but the larger 70B model stands no chance there.
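Once a model is pulled (e.g. with `ollama pull llama3`), Ollama exposes a local REST API on port 11434, so you can script against it without any extra libraries. A minimal sketch using only the Python standard library (the model name and prompt are just placeholders):

```python
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint.

    stream=False asks for one complete JSON response rather than a token stream.
    """
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("llama3", "Summarise this abstract in two sentences.")
# With `ollama serve` running, the generated text is in the "response" field:
# answer = json.loads(urllib.request.urlopen(req).read())["response"]
```

The actual send is commented out because it needs the Ollama server running locally.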

Ollama and some python stuff

Here I used Ollama in combination with some Python scripts I found at this website. However, despite only being written in December last year, they were already out of date and didn’t work, so I had to modify them a bit, as follows:

from pathlib import Path
import qdrant_client
from llama_index.core import VectorStoreIndex, ServiceContext, SimpleDirectoryReader
from llama_index.llms.ollama import Ollama
from llama_index.core import StorageContext
from llama_index.vector_stores.qdrant import QdrantVectorStore
# Loading the documents from the disk
documents = SimpleDirectoryReader("./doc").load_data()
# Initializing the vector store with Qdrant
client = qdrant_client.QdrantClient(path="./qdrant_data")
vector_store = QdrantVectorStore(client=client, collection_name="springboot")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# Initializing the Large Language Model (LLM) with Ollama
# The model name can be any model you've pulled with Ollama
# The request_timeout may need to be adjusted depending on the system's performance
llm = Ollama(model="llama3", request_timeout=120.0)
service_context = ServiceContext.from_defaults(llm=llm, embed_model="local")
# Creating the index, which includes embedding the documents into the vector store
index = VectorStoreIndex.from_documents(documents, service_context=service_context, storage_context=storage_context)
# Querying the index with a specific question
query_engine = index.as_query_engine()
# Start an interactive chat session
print("Welcome to the chat session with the specialized LLM. Type 'quit' to exit.")
while True:
    # Get the user's input
    user_input = input("User: ")
    # Check if the user wants to quit
    if user_input.lower() == "quit":
        break
    # Query the LLM with the user's input
    response = query_engine.query(user_input)
    # Print the LLM's response
    print("LLM: ")
    print(response)

And I also had to update the requirements file for pip:

llama-index
qdrant_client
pypdf
transformers
torch
llama-index-llms-ollama
llama-index-vector-stores-qdrant
llama-index-embeddings-huggingface

This will use whatever files are in the subfolder ‘doc’, and the llama3 model in this case (though that can be whatever model you want). It will also open a continuing chat, rather than embedding a single query in the script, as happens at the source webpage.

[Note – during the course of writing this blog post and testing, the libraries all changed again, so I was unable to use the above on a new computer without further tweaking (which I couldn’t be bothered to do).]

Ollama with open-webui

Finally, there’s using Ollama with an interface called Open WebUI. Install Ollama and pull a model as you normally would, then follow the instructions to set up and run a Docker image. You then open a browser and navigate to a localhost address (i.e. a site running on your own computer). In that interface, after signing up (all that data is still local), you can go to the ‘documents’ tab and add all kinds of files: PDFs, text files, etc. You can then refer directly to those files in your prompts.

Specific results

To make a kind of controlled test, I gave each method my 2021 Palaeontology paper on reconstructing fossilized foot motions, and asked it to summarise the paper, specifically with the prompt:

"Hi, Can you summarise the work by Falkingham et al."

I then followed up with:

"tell me about the methods in this paper"

For both Ollama methods, I used the LLM Llama3, because it’s good, and it’s fast. In Nvidia Chat I used Mistral.

Here’s the respective outputs:

Nvidia Chat with RTX:

Falkingham et al. (2020) conducted research on constructing and testing hypotheses of dinosaur foot motions from fossil evidence. They used a combination of neoichnological experiments with elephants, close-range photogrammetry for 3-D ichnology, and experimental work with tetrapod tracks in laboratory and field settings to study the morphology and behavior of dinosaur footprints. Their work focused on understanding the impact of sediment consistency on track and undertrack morphology, as well as the implications of ichnotaxonomy and interspecific variation in tracks and trackways from extant animals. The authors also discussed the importance of considering the context of the fossil evidence and the limitations of current methods in reconstructing dinosaur foot motions.
Reference files:
Falkingham et al - 2020 - Constructing and testing hypotheses of dinosaur foot motions from fossil.pdf

This is… not great. We didn’t do anything with elephants, for a start.

And the follow up:

The paper "Constructing and testing hypotheses of dinosaur foot motions from fossil" discusses methods used to record and communicate 3D track data. The authors mention that recording and communicating 3D track data has been made difficult by the predominant 2D form of communication. They mention that methods such as laser scanning, structured light scanning, and photogrammetry have become more affordable and widespread and enable the recording of 3D morphology either in the field or in the museum. The authors also mention that these methods have had a significant and positive impact on palaeontology and specifically on the field of ichnology. They also mention that these methods have been used to record, disseminate and analyze 3D track morphology.
Reference files:
Falkingham et al - 2020 - Constructing and testing hypotheses of dinosaur foot motions from fossil.pdf

Also, very much not great. No mention at all of discrete element modelling, simulation, or basically any core aspect of the paper. Bummer.

Ollama, Llama3 + Python Script

User: Hi, Can you summarise the work by Falkingham et al
LLM:
The study by Falkingham et al. explores the construction and testing of hypotheses regarding dinosaur footprints. The researchers investigate how shared sub-surface foot kinematics in birds and other dinosaurs can shed light on a new dimension of fossil track diversity.

This is way better, there’s no hallucinations or inaccuracies in there, but it’s not very detailed.

Here’s the follow up response to asking about the methods:

User: tell me about the methods in this paper
LLM:
The paper "Fast Parallel Algorithms for Short-Range Molecular Dynamics" by S. J. Plimpton discusses various algorithms for short-range molecular dynamics simulations.

Clear hallucination – or at least it’s lost the context – so for fairness I tried again with ‘tell me about the methods used by Falkingham et al’:

User: tell me about the methods used by falkingham et al
LLM:
Constructing and testing hypotheses of dinosaur foot motions.
User: yeah, what were the methods used in that paper
LLM:
Fast Parallel Algorithms for Short-Range Molecular Dynamics.

So… after a good start, garbage.

Ollama, llama3, and Open-webui:

The paper discusses a novel method for analyzing fossil tracks that preserve skin impressions and striations from which foot motions can be discerned. The authors describe a particularly well-preserved track volume, which they use to reconstruct the passage of a foot through the sediment. They identify six landmarks on the upper and lower surfaces of each specimen, which help define the foot's movement through the track volume.
The researchers then use photogrammetry to digitize the specimens and create a 3D model of the track volume. This allows them to visualize the morphologies exposed on the different surfaces and reconstruct the sequence of poses taken by the foot as it moved through the sediment.
The team also uses computer simulations, specifically the Discrete Element Method (DEM), to test their hypothesis of motion. They simulate the formation of a track resulting from foot anatomy and motion, using an open-source software called LIGGGHTS.
Overall, the work demonstrates a new approach to analyzing fossil tracks and reconstructing ancient animal movements.

And that, ladies and gentlemen, is significantly better – it’s accurate, it’s detailed, it’s overall really good.

The follow up:

The paper discusses a novel method for analyzing fossil tracks using digital elevation models (DEMs) and 3D reconstruction techniques. 
Firstly, the authors digitized the fossil specimens using photogrammetry, which involves taking multiple photos of the fossils from different angles and then combining them to create a 3D model. The resulting models were then aligned with each other to create a complete 3D representation of the track volume.
The authors also used manual alignment techniques in Autodesk Maya to ensure that the slabs fit together without intersecting, allowing them to reconstruct the track surface at different depths.
Furthermore, the paper presents a unique method for simulating the motion of the foot as it traversed the soft sediment. This was achieved by modeling the interaction between the foot and the substrate using a combination of physical and kinematic constraints. The simulation allowed the authors to create virtual laminations within the track volume, essentially creating layers based on starting positions of each particle.
The paper demonstrates how these methods can be used to reconstruct the motion of the foot as it created the fossil track, providing valuable insights into the behavior and ecology of ancient animals.

It got DEM wrong – it’s meant to be Discrete Element Method, not Digital Elevation Model, though to be fair both have the same acronym. After that, though, it’s pretty accurate again, even if it does use the US spelling of behaviour!

Microsoft Copilot

As a means of comparison, I also visited the online version of my paper and asked Microsoft Copilot, built into Edge, the same questions. So this is running in the cloud, rather than locally. It’s also not having to parse a PDF, which is notoriously difficult (page numbers and the flow of text and images make machine parsing harder), as it’s accessing the webpage version. I notice that in the latest version of Microsoft Copilot you can specify whether to use the page you are on or the whole web, so I set it to use just the page:

Certainly! Here’s a summary of the key points from the work by Falkingham et al.:

-Fossil Footprints: The study focuses on reconstructing dinosaur foot motions from deep, penetrative Jurassic dinosaur tracks, which capture more motion than shallower impressions.

-Simulation Testing: They test hypotheses of dinosaur foot motions using the Discrete Element Method (DEM), simulating substrate particles in response to a penetrating foot model1.

-Trackmaker Motion: The results suggest a large backward sweeping motion of the foot, indicating locomotion over extremely soft or possibly submerged sediment2.

-Implications: This research enhances understanding of dinosaur locomotion and demonstrates the utility of simulations in testing hypotheses derived from fossil tracks3. The study provides valuable insights into the locomotor mechanics of dinosaurs and the interpretation of their tracks. It also underscores the importance of considering tracks as three-dimensional structures to understand the motion of the trackmaker.
Reference: onlinelibrary.wiley.com

That is really good.

The follow up:

The methods described in the paper focus on reconstructing and testing hypotheses of dinosaur foot motions from fossil tracks using digitization and simulation1. Here are the key methods used:
-Digitization: The researchers digitized Jurassic dinosaur tracks that had been naturally split into layers, allowing them to reconstruct the foot motions of dinosaurs that lived over 200 million years ago2.
-Discrete Element Method (DEM): This simulation method was used to test the reconstructed foot motions34. Individual particles of substrate were simulated in response to a penetrating foot model to see if the resulting virtual tracks matched the fossil tracks5.
-Hypothesis Testing: The reconstructed foot motions were considered hypotheses. The DEM simulations served to either support these hypotheses if the virtual tracks resembled the fossils or reject them if the tracks were significantly different.
-Comparative Data: The study also utilized comparative data from extant taxa (modern animals) to provide context for the formational processes of the tracks.
These methods combined allowed the researchers to gain insights into the locomotor mechanics of extinct animals by analyzing their fossilized footprints. The paper emphasizes the importance of considering tracks as three-dimensional structures resulting from the interaction between an animal’s foot and the substrate6.
Reference: onlinelibrary.wiley.com

Again, absolutely superb.

Conclusions

Of the three local ways of interacting with documents, Open WebUI with Llama3 was clearly the best. It’s odd that Open WebUI did so much better than the Python script, given they both use Llama3, so presumably it’s down to how the text is extracted, chunked, and embedded from the PDFs.
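Chunking is one place where RAG pipelines can differ a lot: chunk size and overlap change what the retriever can actually find. A minimal sketch of a fixed-size character chunker with overlap (the size/overlap numbers are arbitrary; real pipelines often split on sentences or tokens instead):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into fixed-size chunks, overlapping so a sentence that spans
    a boundary still appears intact in at least one chunk."""
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break
    return chunks

# With size=4 and overlap=2, consecutive chunks share two characters:
chunks = chunk_text("abcdefghij", size=4, overlap=2)
```

Too-small chunks lose context; too-large chunks dilute the match against the query – either could explain the gap between the two Llama3 setups.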

However, Microsoft Copilot was leagues ahead. But then it should be – it’s running GPT-4 Turbo. Whether it’s worth setting up a local LLM is up to you. I could see having Llama3 on hand for flights/trains etc. being really useful, but the caveat of course is that these things hammer the GPU, so battery life is going to take a major hit. It would probably be better on an M-series Mac, and I’m very eagerly awaiting the new Snapdragon ARM chips coming this summer – if Ollama can take advantage of the upcoming NPUs, then that might change the game.
