Why we (don't) need export control

2025-02-01


Introduction - Dario Amodei’s perspective

Recently Dario Amodei, formerly of OpenAI and now co-founder and CEO of Anthropic, published a short blog post titled On DeepSeek and Export Control. The post opens with a section containing several interesting considerations regarding AI models, scaling laws and “shifting the curve”, followed by a section analyzing the two latest DeepSeek releases (DeepSeek V3 and R1). While these first two sections, apart from some considerations that might sound a little too subjective, are not far from what we have been hearing about DeepSeek in recent days from the most critical fringes of experts, the third section, in which Amodei dives deep into the reasons why the US should implement export controls on chips against China, is not only controversial but also misses some important points related to DeepSeek and, more generally, to the whole open source ecosystem.

In this brief post, I would like to address some of the most important points in Anthropic’s CEO’s line of argument, reporting my thoughts and the considerations that I deem most relevant to the matter.

Theses and Antitheses

In this section, I will proceed as follows: I will quote a claim made by Amodei in his post, verbatim, and then report my point of view on it. Each thesis-antithesis pair is separated by a line.

  • “To the extent that US labs haven’t already discovered them, the efficiency innovations DeepSeek developed will soon be applied by both US and Chinese labs to train multi-billion dollar models.”

This is the main point that Anthropic’s CEO misses in his post: DeepSeek’s innovations are open and reproducible, because the model is open source and accompanied by a technical paper that details the techniques DeepSeek’s team used to optimize training and make it more efficient. Amodei understates the power of this information and the impact it might have on the scientific community by saying “to the extent that US labs haven’t already discovered them”, but the effect of DeepSeek’s paper is already visible: companies such as HuggingFace have started to openly reproduce DeepSeek R1, and others have started building extended synthetic datasets based on DeepSeek’s reasoning traces, such as OpenThoughts-114k by OpenThoughts or Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B by Magpie-Align. Furthermore, simply searching “deepseek” on the HuggingFace Models Hub reveals more than 3,000 entries, and almost 300 if we type the same query into the Datasets Hub. The impact is evident, and it does not touch only “US labs”: it is pervasive, extending to individual developers and other labs around the world, giving them access to a powerful and reproducible technology and accelerating the democratization of AI. The same cannot be said of Claude (Anthropic’s flagship model), nor of most models from OpenAI: they are kept behind the curtains of closed source, and so are not fully reproducible.


  • “This means that in 2026-2027 [when Amodei predicts we will reach AI smarter than humans, Ed.] we could end up in one of two starkly different worlds. In the US, multiple companies will definitely have the required millions of chips (at the cost of tens of billions of dollars). The question is whether China will also be able to get millions of chips.

    • If they can, we’ll live in a bipolar world, where both the US and China have powerful AI models that will cause extremely rapid advances in science and technology — what I’ve called “countries of geniuses in a datacenter”. A bipolar world would not necessarily be balanced indefinitely. Even if the US and China were at parity in AI systems, it seems likely that China could direct more talent, capital, and focus to military applications of the technology. Combined with its large industrial base and military-strategic advantages, this could help China take a commanding lead on the global stage, not just for AI but for everything.
    • If China can’t get millions of chips, we’ll (at least temporarily) live in a unipolar world, where only the US and its allies have these models. It’s unclear whether the unipolar world will last, but there’s at least the possibility that, because AI systems can eventually help make even smarter AI systems, a temporary lead could be parlayed into a durable advantage. Thus, in this world, the US and its allies might take a commanding and long-lasting lead on the global stage.”

Amodei uses the “two worlds” scenario, a well-known debate technique aimed at confronting the audience with two starkly opposed perspectives, one in favor of the author’s thesis and one against it. This technique is particularly effective because: (a) it narrows all the possible scenarios down to two, reducing complex issues to two (often simpler) ways of interpreting reality, so that all the grays get polarized into black and white; (b) it forces the audience to choose a side, often based on powerful rhetorical imagery and emotionally stimulating contrasts. In these two scenarios, Amodei depicts a bipolar world, in which China will eventually catalyze talent, capital and resources to assert its dominance, and a unipolar world, in which the US will hold the AI power. Let’s break this down.

  1. The two scenarios are not the only possible ones: when AI is open source (as in China’s case, with DeepSeek but also with Qwen and other companies), science can advance and lots of people, from single individuals to laboratories, can reproduce it. Although it’s undeniable that China and the US have a great advantage over the rest of the world, in a non-unipolar world that fosters open source there is still the potential for a decentralized and democratic AI ecosystem.
  2. The assumption that China will channel AI advancements into the military field completely overlooks the fact that the US is already doing it: in November, Meta and Anthropic itself (Amodei’s company) gave the US government access to their models for security and defense applications; you can read a good summary of it in a Washington Post article by Gerrit De Vynck, but there are also posts by Meta and Palantir (Anthropic’s partner in the deal) about the issue. Amodei himself, in an essay from October 2024, states: “My current guess at the best way to do this is via an “entente strategy”, in which a coalition of democracies seeks to gain a clear advantage (even just a temporary one) on powerful AI by securing its supply chain, scaling quickly, and blocking or delaying adversaries’ access to key resources like chips and semiconductor equipment. This coalition would on one hand use AI to achieve robust military superiority (the stick) while at the same time offering to distribute the benefits of powerful AI (the carrot) to a wider and wider group of countries in exchange for supporting the coalition’s strategy to promote democracy (this would be a bit analogous to “Atoms for Peace”).”
  3. But let’s play Amodei’s game on this: let’s assume that the US needs to hold the power. Well, Amodei writes about a “coalition of democracies”, but in the perspective of his blog post on DeepSeek the world will be unipolar, led by the “US and its allies”: de facto, it will be the US alone, as none of its allies, especially in the EU, holds enough power to contribute significantly to this dominance (that’s also why he says unipolar). The point is: is the US that trustworthy? Should we really leave the power of developing (mostly closed-source) AI to providers that bow their heads to the current political governance and change perspective with a change of government? Obviously, I am not here to deny China’s crimes against human rights, mass-control policies and lack of democracy, which are much worse than what generally happens in the US, but I need to make a point clear: if we have a multipolar world, the checks and balances and the self-correcting mechanisms of science can intervene and identify/ablate/correct the problems and biases that Chinese models, as well as European and American ones, have, especially if the development of those models has been open sourced. If a few companies hold AI in their hands, develop it behind the curtains of closed source and control their models from the inside, without getting validation from the wider scientific community, we could have more frequent biases and errors, which would simply go unnoticed because no one is there to check the weights, the training process and the techniques used.

  • “Well-enforced export controls are the only thing that can prevent China from getting millions of chips, and are therefore the most important determinant of whether we end up in a unipolar or bipolar world.”

Well-enforced export controls can make the difference between a unipolar and a bipolar (or, as I see it, multipolar) world, and that’s undeniable. What is less obvious are the implications of this action. Imagine that the next company to release a model that “threatens” US-held AI superiority comes from Europe: following the same logic, if the United States wants to maintain its leadership in this technology, it will put export restrictions on Europe as well. Maybe not in the same way as for China, because most European countries conform to the notion of “democracy” that underpins Amodei’s post, but enough to push any European company into second/third/fourth place, securing the podium, or at least first place, for American companies. Setting “well-enforced export controls” on China today creates a dangerous precedent that might see the US doing it again (although maybe on a different scale) against Europe or against whoever it feels could threaten its leadership in the future.

Conclusion

In light of all that I have written, there is a very important question that I feel compelled to ask after reading Amodei’s perspective on multipolarity:

How would unipolarity help science to go further? Isn’t modern science, by definition, built on multipolarity, sharing and auto-correction mechanisms based on reviews by the scientific community?

When, between the Middle Ages and the 17th century, Europeans left the control of knowledge to Church institutions, we had unspeakable atrocities, including massacres, persecutions and widespread discrimination based on ethnicity, gender and religious beliefs. Between the 17th and the 18th century, first the Scientific Revolution and then the Age of Enlightenment contributed to the democratization of knowledge (just think of the Encyclopédie, ou dictionnaire raisonné des sciences, des arts et des métiers, by Diderot and colleagues), which paved the way for the foundations of modern democracy, human rights and modern-day science. In spite of some side effects that still accompanied the transition to modern science and shared knowledge, opening up knowledge led to improvements in the world we live in: from 700 to 1700 AD most Europeans were illiterate, poor and abused by those in power; in just 300 years, the lives of approximately 800 million people changed radically, mostly for the better. I think this is an example worth keeping in the back of our heads for the future.

My conclusion is then very brief and summarizes my main point of view: we need more open source and more companies like Hugging Face🤗. It’s the only way we can really advance AI, because progress is not made by wars and restrictions; it’s made by collaboration.


Search the web with AI

2025-01-09

This article is based on PrAIvateSearch v2.0-beta.0, a privacy-first, AI-powered, user-centered and data-safe application aimed at providing a local and open-source alternative to big AI search engines such as SearchGPT or Perplexity AI.

1. Introduction

PrAIvateSearch logo

In the last blog post we introduced PrAIvateSearch v1.0-beta.0, a data-safe and private AI-powered search engine: despite being a fully functional solution, it still had several shortcomings that made it under-perform, specifically:

  • Content scraping from the URLs resulting from the web search was unreliable and sometimes did not produce correct results.
  • We extracted a series of keywords from web-based contents: the keywords were then formatted into JSON and injected into the user prompt as context. This, despite being a viable solution to lighten the prompt, only gave the LLM partial contextual information.
  • We used RAG for context augmentation, but it often proved inefficient, as the retrieved context was passed directly into the user’s prompt, making inference computationally intense and time-consuming.
  • Our interface was basic and essential, lacking the dynamic and modern flows of other web applications.
  • The use of the Google Search API to search the web was not optimal from a privacy standpoint.

For this new version, we decided to adopt a much simpler and more direct workflow, synthesized in the following image:

PrAIvateSearch workflow

2. Building Blocks

The application is made up of three important building blocks:

  • A dynamic, chat-like user interface built with NextJS  (launched via docker compose)
  • Third-party local database services: Postgres and Qdrant (launched via docker compose)
  • An API service, which is built with FastAPI and served with Uvicorn  (launched in an isolated conda environment): this service is responsible for connecting the frontend with the backend third party services, web search and all the functions related to finding an answer to the user’s query.

We’ll go through all these building blocks, but first let’s have a look at the prerequisites and the first steps to get the code and build PrAIvateSearch v2.0-beta.0.

2a. First steps

To get the necessary code and build the environment to run it, you will need git, Docker with the Docker Compose plugin, and conda (or another Anaconda-compatible package manager).

Now, to get the code, you can simply run:

git clone https://github.com/AstraBert/PrAIvateSearch.git
cd PrAIvateSearch

First of all, move .env.example to .env:

mv .env.example .env

…and specify PostgreSQL related variables:

# .env file
pgql_db="postgres"
pgql_user="localhost"
# choose your own password for the database
pgql_psw="your_password"

You will then need to build the API environment with:

# create the environment
conda env create -f conda_environment.yaml

# activate the environment
conda activate praivatesearch

# run crawl4ai post-installation setup
crawl4ai-setup

# run crawl4ai health checks
crawl4ai-doctor

# deactivate the environment
conda deactivate

And, finally, you will be able to launch the frontend and the databases with:

# The use of the -d option is not mandatory
docker compose up [-d]

3. User Interface

The user interface is now based on a NextJS implementation of a modern, dynamic chat interface. The interface is inspired by ChatGPT and aims to give the user an experience similar to that of using OpenAI’s product.

Whenever the user interacts with the UI by sending a message, the NextJS app backend sends a GET request to http://localhost:8000/messages/, where our FastAPI/Uvicorn managed API runs (see below). Once the request is fulfilled, the application displays the message it got back, which is the answer to the user’s query.

The NextJS application runs on http://localhost:3000 and is launched through docker compose.

4. Database Services

Database services are run locally thanks to docker compose: they are completely user-managed and can be easily monitored. This gives the user complete control over the data flow inside the application.

4a. Qdrant

Qdrant is a vector database service that plays a core role inside PrAIvateSearch. Thanks to Qdrant, we:

  • Create and manage a semantic cache, in which we store the questions the user has already asked and the answers the LLM gave to them, so that we can reuse the same answer if the user inputs the same question or a similar one. We use semantic search to find similar questions, with a cut-off threshold of 75% similarity to filter out non-relevant hits: that’s why our cache is not just a cache, but a semantic one (a minimal sketch of this lookup follows the list).
  • Store the content scraped from the web during the web searching and crawling step: we use a sparse collection for this purpose. We then perform sparse retrieval with Splade, loaded with FastEmbed (a python library managed by Qdrant itself): this first retrieval step yields the 5 top hits for the user’s prompt inside the database of contents scraped from the web.
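As a rough illustration of the semantic-cache lookup described above, here is a minimal Python sketch. It assumes LaBSE as the embedding model and a Qdrant collection created with cosine distance; the collection name ("semantic_cache") and the payload key ("answer") are illustrative, not necessarily those used in the actual PrAIvateSearch code.

# Minimal sketch of a semantic-cache lookup (collection and payload names are illustrative)
from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")  # 768-dimensional dense embeddings
client = QdrantClient("http://localhost:6333")

def cached_answer(question: str, threshold: float = 0.75):
    """Return a previously stored answer if a semantically similar question exists."""
    hits = client.search(
        collection_name="semantic_cache",                # assumed collection name
        query_vector=encoder.encode(question).tolist(),
        limit=1,
    )
    if hits and hits[0].score >= threshold:              # 75% similarity cut-off
        return hits[0].payload["answer"]                 # assumed payload key
    return None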

Qdrant is accessible for monitoring, thanks to a complete and easy-to-use interface, on http://localhost:6333/dashboard

4b. Postgres

Postgres is a relational database, and it is employed here essentially as a manager for chat memory.

Thanks to a SQLAlchemy-based custom client, we can load all the messages sent during the conversation into the messages table: this table is structured in such a way that it is compatible with the chat template we set for our LLM, so retrieving the chat history at inference time already provides the language model with the full list of previously sent messages.

Postgres can be easily monitored through Adminer, a simple and intuitive database management tool: you just need to provide the database type (PostgreSQL), the database name, the username and the password (variables that are passed to Docker and that you can define inside your .env file). Adminer also runs locally and is served through docker compose: it is accessible at http://localhost:8080.

5. API Services

As we said, we built a local API service thanks to FastAPI and Uvicorn, two intuitive and easy-to-use tools that make API development seamless.
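Before walking through the full workflow, here is a minimal sketch of what an endpoint of this kind could look like with FastAPI. It is only an illustration: the route shape follows the curl example shown later in this post, and answer_pipeline is a hypothetical placeholder standing in for the whole search-and-answer workflow described below.

# Minimal FastAPI sketch (illustrative only; answer_pipeline is hypothetical)
from fastapi import FastAPI

app = FastAPI()

def answer_pipeline(user_message: str) -> str:
    """Placeholder for the cache/search/RAG/inference workflow described below."""
    return "The capital of France is **Paris**."

@app.get("/messages/{user_message}")
def handle_message(user_message: str) -> dict:
    return {"response": answer_pipeline(user_message)}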

The API endpoint is up at http://localhost:8000/messages/ and receives the user messages from the NextJS application. The workflow from receiving a message to returning a response is straightforward:

  1. We check if the user’s prompt corresponds to an already asked question thanks to the semantic cache: this cache is basically a dense Qdrant collection, with 768-dimensional vectors produced by LaBSE. If there is a significant match within the semantic cache, we return the same answer used for that match, otherwise we proceed with handling the request
  2. If the user’s query did not yield a significant match from the semantic cache, we proceed to extracting keywords from the user’s natural language prompt with the RAKE (Rapid Automatic Keyword Extraction) algorithm. These keywords will be used to search the web
  3. We search the web through DuckDuckGo Search API: using DuckDuckGo ensures more privacy in surfing the net than exploiting Google. This is part of the effort that PrAIvateSearch makes to ensure that the user’s data are safe and not indexed by Big Techs for secondary (not-always-so-transparent) purposes.
  4. We scrape content using the URLs returned by the web search: to do this, we use Crawl4AI asynchronous web crawler, and we return all the scraped content in markdown format
  5. We split the markdown texts from the previous step with the LangChain MarkdownTextSplitter: after this step, we will have 1,000-character-long text chunks
  6. The chunks are encoded into sparse vectors with Splade and uploaded to a Qdrant sparse collection
  7. We search the sparse collection with the user’s original query and retrieve the top 5 most relevant documents
  8. These 5 relevant documents are then re-ranked: we employ LaBSE to encode the relevant documents into dense vectors and we evaluate their cosine similarity with the vectorized user’s prompt. The most similar document is retrieved as context (a minimal sketch of this re-ranking step follows the list)
  9. The context is passed as a user’s message, and the prompt is passed right after it as a user’s message too. In this sense, the LLM only has to reply to the user’s prompt, but can access, through chat memory, the web-based context for it
  10. Qwen-2.5-1.5B-Instruct performs inference on the user’s prompt and generates an answer to it: this answer is then returned as the API final response, and will be displayed in the UI.
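As mentioned in point 8, the dense re-ranking step can be sketched as follows; this is a simplified illustration that assumes LaBSE loaded through sentence-transformers, and the function name is purely illustrative.

# Minimal sketch of dense re-ranking by cosine similarity (illustrative)
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("sentence-transformers/LaBSE")

def rerank(query: str, documents: list[str]) -> str:
    """Return the document most similar to the query by cosine similarity."""
    q = encoder.encode(query)
    docs = encoder.encode(documents)
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    return documents[int(np.argmax(sims))]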

6. Usage

6a. Usage note

[!IMPORTANT] The NextJS application was successfully developed and tested on an Ubuntu 22.04.3 machine with 32GB of RAM, a 22-core CPU and an NVIDIA GeForce RTX 4050 GPU (6GB, CUDA version 12.3), with Python 3.11.11 (packaged by conda 24.11.0)

Although at a good stage of development, the application is a beta and might still contain bugs or have OS/hardware/Python version incompatibilities.

6b. Getting PrAIvateSearch Up and Running

[!NOTE] To get PrAIvateSearch up and running you need to have already executed the first steps

Once we have launched (from within the PrAIvateSearch folder) the database backend services and the frontend application via docker compose with this command:

docker compose up

We can head over to the conda environment we set up in the first steps, and launch the API:

# activate the environment
conda activate praivatesearch

# launch the application
uvicorn main:app --host 0.0.0.0 --port 8000

After the several AI models used by the application have been loaded, you will see that the API is up and ready to receive requests.

If you want to test the API prior to sending requests through the frontend, you can simply use this curl command:

curl "http://0.0.0.0:8000/messages/What%20is%20the%20capital%20of%20France"

If everything went smoothly, you should receive a response like this:

{"response": "The capital of France is **Paris**."}

And that’s all! Now head over to http://localhost:3000 and start playing around with PrAIvateSearch!🪿

7. Conclusion

The aim behind PrAIvateSearch is to provide an open-source, private and data-safe alternative to Big Tech solutions. The application is still a beta, so, although its workflow may seem solid, there may still be hiccups, untackled errors and imprecisions. If you want to contribute to the project, report issues and help develop the OSS AI community and ecosystem, feel free to do so on GitHub and to support it with funding.

Thanks!🤗


Debate Championship for LLMs

2024-12-30

5 LLMs, 1vs1 matches to produce the most convincing argumentation in favor or against a random motion. Oh, and also the debate judge is an LLM :)

1. Introduction

Large Language Models (LLMs) have revolutionized our everyday life since the launch of ChatGPT in November 2022: OpenAI’s LLM-powered chat application gained one million users in 5 days and, in October 2024, almost two years after its launch, reached 3.7 billion visits in a single month, ranking it 11th among the most visited websites.

This broad adoption of text-generating Artificial Intelligence (AI) is also reflected in the skyrocketing number of LLM releases by numerous companies: while OpenAI, Anthropic and other big AI companies build mostly closed-source products, these new models, available mainly on the HuggingFace Hub, are mostly open-weight or open-source (for an explanation of the difference see this article). Leading the open AI revolution are companies like Meta, Qwen (by Alibaba), HuggingFace (HF), Microsoft and many others.

Open models are progressively getting closer in performance to their closed-source counterparts, matching them in many tasks like coding or, with the latest releases, reasoning.

With open LLMs becoming better at complex jobs, one of the fields they can be tested on is debating. Some research already exists on the topic; its most relevant contributions can be summarized as follows:

  • Agent4Debate (Zhang et al., 2024): a collaborative framework leveraging a Searcher, an Analyzer, a Writer and a Reviewer to mimic human behavior for debate preparation and execution. Evaluated against human and other baseline models, Agent4Debate demonstrated human-comparable capabilities
  • Debatrix (Liang et al., 2024): a comprehensive LLM judge for multi-turn debate settings
  • Debates used to evaluate the performance of LLMs (Moniri et al., 2024): an automated evaluation framework based on debates among LLMs which are judged by another LLM. This helps in scaling the benchmarking of Language models outside domain-specific knowledge or fixed test sets
  • DebateBrawl (Aryan, 2024): a platform that, integrating genetic algorithms and game theory strategies with LLM reasoning and text generation capabilities, provides the users with an interactive debate experience by crafting coherent and poignant arguments.

In this blog post, we will propose a Debate Championship among five state-of-the-art open models available through HuggingFace Inference API.

2. Materials and Methods

2a. General Structure of the Tournament

The tournament is structured with the so-called “Italian” (round-robin) formula, meaning that every participant plays against all the others. There is no “home and away” scheme: every participant meets each of the others only once. A model earns one point by winning a game, whereas it neither earns nor loses points when it loses one.

Each tournament round is one-shot, meaning that each participant has only one chance to generate a 150-250 word argument, which will then be judged by an external LLM.
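As a minimal sketch of what such a schedule looks like (with illustrative placeholder names rather than the actual line-up), the set of matches is simply all unordered pairs of debaters:

# Round-robin schedule: every pair of debaters meets exactly once (placeholder names)
from itertools import combinations

debaters = ["model_a", "model_b", "model_c", "model_d", "model_e"]
schedule = list(combinations(debaters, 2))
print(len(schedule))  # 10 matches per judge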

This first tournament consists of 5 LLMs as debaters:

  • microsoft/Phi-3.5-mini-instruct
  • mistralai/Mistral-7B-Instruct-v0.3
  • meta-llama/Llama-3.1-8B-Instruct
  • Qwen/Qwen2.5-72B-Instruct
  • HuggingFaceH4/starchat2-15b-v0.1

And two as judges:

  • Qwen/QwQ-32B-Preview
  • meta-llama/Llama-3.3-70B-Instruct

2b. Data Collection and Processing

Code reference: DebateChampionshipLLMs.ipynb

The motions which were used to prompt the debate matches were extracted from kokhayas/english-debate-motions-utds dataset on HuggingFace.

1,000 of them were then randomly sampled from the 10,000+ set of motions contained in the original dataset, and a random motion was selected for each debate round.

from datasets import load_dataset

# download the dataset from HF hub
dts = load_dataset("kokhayas/english-debate-motions-utds")
dtsdct = dts["train"]
     
import random as r

# sample 1000 motions from the original dataset
motions = dtsdct["motion"]
motions2use = []
numbers = []
j = 0
while j < 1000:
    n = r.randint(0,10000)
    if n not in numbers:
        numbers.append(n)
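        # keep only motions starting with "th" (mostly "This house..."-style debate motions)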
        if motions[n].lower().startswith("th"):
            motions2use.append(motions[n])
            j+=1
        else:
            continue
    else:
        continue

2c. Building and Running the Tournament

Code reference: DebateChampionshipLLMs.ipynb

We approached building the tournament by:

  • decomposing it into its atomic parts, the “building blocks” (defining how debaters and judges generate their answers)
  • scaling to creating the structure of one round (debater 1 -> debater 2 -> judge)
  • defining the entire tournament as a loop of rounds, with debate data collection and points tracking (for the final ranking)

The code to create the building blocks of the debate tournament is the following:

from huggingface_hub import InferenceClient
from google.colab import userdata

# create an HF client for inference
hf_token = userdata.get('HF_TOKEN_INFERENCE')
client = InferenceClient(api_key=hf_token)

# define a function for the debaters to produce their argument
def debate_inference(model, prompt):
  messages = [
      {"role": "system", "content": "You are skilled in competitive debate. You produce arguments that strictly adhere to the position you are required to take by the prompts you are proposed with"},
      {"role": "user", "content": prompt}
  ]
  completion = client.chat.completions.create(
      model=model,
      messages=messages,
      temperature=0.5,
      max_tokens=2048,
      top_p=0.7
  )
  return completion.choices[0].message.content

# define a function for the judges to produce their verdict
def judge_inference(model, motion, essay1, essay2):
  messages = [
      {"role": "system", "content": "You are a judge, based on the motion, the argumentation in favor of it and the argumentation against it, you should produce a JSON string that contains the following fields:\n\n- winner (str): can take only FAVOR or AGAINST as values, based on who you think the winner is\n- reasons (str): the reasons why you chose the winner. OUTPUT ONLY THE JSON STRING AS: '''\n\n```json\n{\"winner\": 'FAVOR'|'AGAINST', \"reasons\": 'Reasons why you chose the winner'}\n```\n\n'''"},
      {"role": "user", "content": "MOTION:\n"+motion},
      {"role": "user", "content": "ARGUMENT IN FAVOR:\n"+essay1},
      {"role": "user", "content": "ARGUMENT AGAINST:\n"+essay2},
      {"role": "user", "content": "Who is the winner? OUTPUT ONLY THE JSON STRING AS: '''\n\n```json\n{\"winner\": 'FAVOR'|'AGAINST', \"reasons\": 'Reasons why you chose the winner'}\n```\n\n'''"}
  ]
  completion = client.chat.completions.create(
      model=model,
      messages=messages,
      temperature=0,
      max_tokens=2048,
      top_p=0.7
  )
  return completion.choices[0].message.content

# define a tournament round
def tournament_round(model1, model2, judge, motion):
  prompt1 = "Produce an essay of maximum 150 words in favor of this motion: " + motion
  prompt2 = "Produce an essay of maximum 150 words against this motion: " + motion
  essay1 = debate_inference(model1, prompt1)
  essay2 = debate_inference(model2, prompt2)
  winner_answer = judge_inference(judge, motion, essay1, essay2)
  return essay1, essay2, winner_answer

For the tournament itself to be run, we add the following features to the backbone structure:

  • Point tracking
  • Debate data collection
  • Extraction of the winner and of the reasons for the winner’s choice from the judge’s answer

The last point is especially tricky, since the judge’s answer can come in various formats even when the system instructions are very clear on how to structure it, so we decided to tackle the variability of the output by adding an output-parser LLM. This output parser is gpt-4o-mini, wrapped into the LangChain OpenAI chat class (ChatOpenAI) and linked to a Pydantic schema for structured output generation:

from google.colab import userdata
import os

# set OpenAI API key as an environment variable
a = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_KEY"] = a

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

# generate a chat prompt template with Langchain, to wrap your system instructions for the model
GPT_MODEL = "gpt-4o-mini"
llm = ChatOpenAI(temperature=0, model=GPT_MODEL)
system_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are a helpful assistant. Your job is to restructure the verdict from a debate competition so that it follows this structure:
            - winner: the winner, as reported by the verdict
            - reasons: reasons for the choice of the winner
            Strictly follow the verdict you are provided with, do not add/make up any information."""),
        ("human", "{message}"),
    ]
)

from pydantic import BaseModel, Field

# create a Pydantic BaseModel for structured output generation
class Verdict(BaseModel):
    """Structure of the output of a debate competition verdict"""
    winner: str = Field(description="The winner, as reported by the verdict")
    reasons: str = Field(description="Reasons for the choice of the winner")

# define an inference-ready system instructions+LLM+structured output parser 
chain = system_prompt | llm.with_structured_output(Verdict)

Now we can run the tournament:

import time

# define points tracker
modelpoints = {judges[i]: {model: 0 for model in models} for i in range(len(judges))}

# define data collector
motions2args2winner2reasons = {"motions": [], "judge": [], "favor_model": [], "favor_arg": [], "against_model": [], "against_arg": [], "winner": [], "reasons": [], "total_time": []}

judge_counter = 0
for judge in judges:
  judge_counter+=1
  pairs = []
  counter = 0
  for i in range(len(models)):
    for j in range(len(models)):
      # only make two models play with each other if they have not met before
      if i!=j and (i,j) not in pairs and (j,i) not in pairs:
        counter+=1
        pairs.append((i,j))
        motion = r.choice(motions2use)
        favoragainst = {"favor": models[i], "against": models[j]}
        s = time.time()
        favor_arg, against_arg, winner_json = tournament_round(models[i], models[j], judge, motion)
        e = time.time()
        # add debate data to data collector
        motions2args2winner2reasons["total_time"].append(e-s)
        motions2args2winner2reasons["judge"].append(judge)
        motions2args2winner2reasons["motions"].append(motion)
        motions2args2winner2reasons["favor_model"].append(favoragainst["favor"])
        motions2args2winner2reasons["favor_arg"].append(favor_arg)
        motions2args2winner2reasons["against_model"].append(favoragainst["against"])
        motions2args2winner2reasons["against_arg"].append(against_arg)
        verdict = chain.invoke({"message": winner_json})
        reasons = verdict.reasons
        winner = verdict.winner
        winner_model = favoragainst[winner.lower()]
        motions2args2winner2reasons["winner"].append(winner_model)
        motions2args2winner2reasons["reasons"].append(reasons)
        # add a point to the winner model 
        modelpoints[judge][winner_model] += 1
        print(f"Done with match: {judge_counter}.{counter}")
  print("Done with " + judge + " being a judge")

The collected data were manually annotated (Code reference), saved to a CSV file and uploaded as a dataset on HuggingFace hub.
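A minimal sketch of this last step could look like the following; the CSV filename is an assumption, while the dataset repository name is the one reported in the Data and Code Availability section.

# Hedged sketch: save the collected debate data and push them to the HF Hub
import pandas as pd
from datasets import Dataset

df = pd.DataFrame(motions2args2winner2reasons)                # data collector defined above
df.to_csv("debate_championship_results.csv", index=False)     # assumed filename
Dataset.from_pandas(df).push_to_hub("as-cle-bert/DebateLLMs")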

2d. Post-Tournament Analysis

Code references: DebateLLMChampionship_analysis.ipynb and MotionCategoriesAssociations.ipynb

Post-tournament analysis involved:

  1. Analyzing words in motions and winning arguments when QwQ-32B-Preview was a judge
  2. Repeating the same analysis as in 1. with Llama-3.3-70B-Instruct as a judge
  3. Repeating the same analysis as in 1. for Phi-3.5-mini-instruct winning arguments
  4. Repeating the same analysis as in 1. for HuggingFaceH4/starchat2-15b-v0.1 losing arguments

We also carried out topic association analysis for winning arguments with QwQ-32B-Preview and Llama-3.3-70B-Instruct as judges, as well as the same analysis for Phi-3.5-mini-instruct winning arguments and HuggingFaceH4/starchat2-15b-v0.1 losing arguments.

These are the general functions defined for the analysis:

import pandas as pd
import nltk
from nltk.corpus import stopwords
from collections import Counter
import matplotlib.pyplot as plt
import seaborn as sns
from typing import Dict, List, Tuple
import numpy as np

# download the NLTK resources needed for tokenization and stopword removal
nltk.download("punkt")
nltk.download("stopwords")

# load the collected debate data (the CSV filename is an assumption: use the file you saved earlier)
df = pd.read_csv("debate_championship_results.csv")
df_qwq = df[df["judge"] == "Qwen/QwQ-32B-Preview"]

def compare_winning_arg_w_motion(df: pd.DataFrame) -> Dict:
    """
    Analyzes the relationship between winning arguments and their motions.
    Returns a dictionary containing analysis results and statistics.
    """
    # Initialize containers for analysis
    keyword_overlap_scores = []
    winning_word_frequencies = Counter()
    motion_word_frequencies = Counter()
    favor_win_count = 0
    against_win_count = 0
    overlap_by_length = []

    # Analysis results
    results = {
        'overlap_scores': [],
        'word_frequencies': {},
        'winning_sides': {},
        'length_correlations': []
    }

    for index, row in df.iterrows():
        motion = row["motions"]
        motion_keywords = set(extract_keywords(motion))
        motion_word_frequencies.update(motion_keywords)

        # Determine winning argument
        is_favor_winning = row["winner"] == row["favor_model"]
        winning_arg = row["favor_arg"] if is_favor_winning else row["against_arg"]

        # Update win counters
        if is_favor_winning:
            favor_win_count += 1
        else:
            against_win_count += 1

        # Extract and analyze winning argument
        common_words = set(extract_most_common_words(winning_arg, len(motion_keywords)))
        winning_word_frequencies.update(common_words)

        # Calculate overlap score
        overlap = len(motion_keywords.intersection(common_words)) / len(motion_keywords)
        keyword_overlap_scores.append(overlap)

        # Record length correlation
        overlap_by_length.append((len(winning_arg.split()), overlap))

    # Store results
    results['overlap_scores'] = keyword_overlap_scores
    results['word_frequencies'] = {
        'motion': dict(motion_word_frequencies.most_common(20)),
        'winning_args': dict(winning_word_frequencies.most_common(20))
    }
    results['winning_sides'] = {
        'favor': favor_win_count,
        'against': against_win_count
    }
    results['length_correlations'] = overlap_by_length

    # Create visualizations
    create_analysis_plots(results)

    return results

def create_analysis_plots(results: Dict):
    """Creates and displays analysis visualizations."""
    # Set up the plotting area
    plt.style.use('seaborn-v0_8-paper')
    fig = plt.figure(figsize=(15, 10))

    # 1. Overlap Score Distribution
    plt.subplot(2, 2, 1)
    sns.histplot(results['overlap_scores'], bins=20)
    plt.title('Distribution of Keyword Overlap Scores')
    plt.xlabel('Overlap Score')
    plt.ylabel('Count')

    # 2. Winning Sides Pie Chart
    plt.subplot(2, 2, 2)
    sides = results['winning_sides']
    plt.pie([sides['favor'], sides['against']],
            labels=['Favor', 'Against'],
            autopct='%1.1f%%')
    plt.title('Distribution of Winning Sides')

    # 3. Word Frequencies Comparison
    plt.subplot(2, 2, 3)
    motion_words = list(results['word_frequencies']['motion'].keys())[:10]
    motion_freqs = [results['word_frequencies']['motion'][w] for w in motion_words]
    plt.barh(motion_words, motion_freqs)
    plt.title('Top 10 Motion Keywords')
    plt.xlabel('Frequency')

    # 4. Length vs Overlap Scatter Plot
    plt.subplot(2, 2, 4)
    lengths, overlaps = zip(*results['length_correlations'])
    plt.scatter(lengths, overlaps, alpha=0.5)
    plt.title('Argument Length vs Keyword Overlap')
    plt.xlabel('Argument Length (words)')
    plt.ylabel('Overlap Score')

    # Add trend line
    z = np.polyfit(lengths, overlaps, 1)
    p = np.poly1d(z)
    plt.plot(lengths, p(lengths), "r--", alpha=0.8)

    plt.tight_layout()
    plt.show()

# Helper functions used in the analysis above
def extract_keywords(text: str) -> List[str]:
    """Extract keywords from text. Implement your keyword extraction logic here."""
    stop_words = set(stopwords.words('english'))
    words = nltk.word_tokenize(text.lower())
    return [w for w in words if w.isalnum() and w not in stop_words]

def extract_most_common_words(text: str, n: int) -> List[str]:
    """Extract n most common words from text."""
    words = extract_keywords(text)
    return [word for word, _ in Counter(words).most_common(n)]

3. Results and Conclusions

3a. Tournament Results

The tournament was won by Phi-3.5-mini-instruct, with 5 overall victories; it was also the winner of the tournament batch in which Llama-3.3-70B-Instruct was the judge (Fig 1).

It was followed, in second place, by Mistral-7B-Instruct-v0.3 (4 victories, winner of the tournament batch in which QwQ-32B-Preview was the judge), Llama-3.1-8B-Instruct (4 overall victories) and Qwen2.5-72B-Instruct (4 overall victories).

In third place came starchat2-15b-v0.1, with 2 overall victories.


Fig 1: Tournament podium

3b. Favor and Against Winning Cases Distribution

Code reference: DebateLLMChampionship_analysis.ipynb

We first evaluated the “Favor” vs “Against” tendency for the two judges when deciding the winning arguments:

  • QwQ-32B-Preview chose 5 times “Favor” and 5 times “Against”
  • Llama-3.3-70B-Instruct chose 7 times “Favor” and 3 times “Against”

We repeated the same analysis for the cases in which Phi-3.5-mini-instruct was the winner and for those in which starchat2-15b-v0.1 was the loser:

  • Phi-3.5-mini-instruct won 3 times as “Favor” and 2 times as “Against”
  • starchat2-15b-v0.1 lost only when arguing “Against” the motion (it won twice in the “Favor” position and once in the “Against” position)

3c. Overlapping between Key Words in Motions and Arguments

Code reference: DebateLLMChampionship_analysis.ipynb

We evaluated the overlapping score between the keywords in the motions and the keywords in the winning arguments in various settings:

  • We observed a broad variation of the overlapping scores both with QwQ-32B-Preview and with Llama-3.3-70B-Instruct as judges. The two variation ranges were comparable, with the one for the winning arguments judged by Llama-3.3-70B-Instruct being slightly narrower (Fig 2a-b)
  • The overlapping scores for the winning arguments by Phi-3.5-mini-instruct were comparable with those registered for the previous point, but their variation was far broader than the one found for the losing arguments by starchat2-15b-v0.1 (Fig 2c-d)


Fig 2a: Overlapping scores between the keywords in the motions and the keywords in the winning arguments distributions when QwQ-32B-Preview is a judge


Fig 2b: Overlapping scores between the keywords in the motions and the keywords in the winning arguments distributions when Llama-3.3-70B-Instruct is a judge


Fig 2c: Overlapping scores between the keywords in the motions and the keywords in the winning arguments distributions for winning arguments by Phi-3.5-mini-instruct


Fig 2d: Overlapping scores between the keywords in the motions and the keywords in the winning arguments distributions for losing arguments by starchat2-15b-v0.1

TAKEAWAY: Although the results do not converge on a single explanation, we could say that a high overlap score does not necessarily help in winning, but a low overlap score may contribute to losing the match

We also evaluated the correlation between argument length (in words) and keyword overlapping score: while for the overall winning arguments with both QwQ-32B-Preview and Llama-3.3-70B-Instruct as judges there is no significant correlation, Fig 3a-b highlight a stronger positive correlation for Phi-3.5-mini-instruct winning arguments and a stronger negative correlation for starchat2-15b-v0.1 losing arguments.


Fig 3a: Correlation between keyword overlapping scores and argument length for winning arguments by Phi-3.5-mini-instruct


Fig 3b: Correlation between keyword overlapping scores and argument length for losing arguments by starchat2-15b-v0.1

TAKEAWAY: This correlation study might point to the fact that starchat2-15b-v0.1 was not able to maintain adherence to the original motion when producing longer arguments, and that might have led to losing the matches. The ability to maintain a broader correspondence to the original motion when producing longer arguments might, on the other hand, have influenced Phi-3.5-mini-instruct’s victories.
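For reference, the strength of this kind of correlation can be quantified, for example, with a Pearson coefficient over the (length, overlap) pairs collected by the analysis function above; this is a minimal sketch, not part of the original notebooks.

# Minimal sketch: Pearson correlation between argument length and overlap score
import numpy as np

def length_overlap_correlation(length_correlations: list) -> float:
    """length_correlations is the list of (length, overlap) tuples built above."""
    lengths, overlaps = zip(*length_correlations)
    return float(np.corrcoef(lengths, overlaps)[0, 1])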

3d. Motion Topics and Winning Arguments Correlation

Code reference: MotionCategoriesAssociations.ipynb

We lastly evaluated what positions (“Favor” or “Against”) were deemed winning in correlation to the topic of their motions.

First of all, we accounted for potential “personal opinion” influence (i.e. a bias in the LLM) in the choice of the winner, using gpt-4o-mini to detect these biases and report them along with the expressions from the judge that contained “personal opinions”. We then built Table 1:

| Judge | Topic | Position | Influenced | Quotes |
| --- | --- | --- | --- | --- |
| Qwen/QwQ-32B-Preview | Prisoners Extradition | Against | False | |
| Qwen/QwQ-32B-Preview | Oppose Chinese censorship | Favor | True | The argument in favor is stronger because it emphasizes human rights, freedom of expression, and the need for a balanced approach to social stability. It aligns with international standards and promotes a more inclusive society. |
| Qwen/QwQ-32B-Preview | Democratization of UN | Favor | False | |
| Qwen/QwQ-32B-Preview | Non-violent movements not leading social change | Against | False | |
| Qwen/QwQ-32B-Preview | West funding a coup in Myanmar | Against | False | |
| Qwen/QwQ-32B-Preview | Stop to Bullfighting | Favor | True | The argument in favor of banning bullfighting is stronger due to its emphasis on ethical considerations. |
| Qwen/QwQ-32B-Preview | Paper is better than Internet | Against | False | |
| Qwen/QwQ-32B-Preview | Ban to self-diagnose websites | Favor | True | The potential for misdiagnosis and delayed treatment poses significant risks to public health. Privacy concerns further underscore the need for regulation or prohibition of these websites to ensure that individuals receive accurate and safe healthcare information and treatment. |
| Qwen/QwQ-32B-Preview | Public workers have right to strike | Against | False | |
| Qwen/QwQ-32B-Preview | Hedge funds not purchasing sovereign debt | Favor | False | |
| meta-llama/Llama-3.3-70B-Instruct | Trade Unions slow progress | Favor | False | |
| meta-llama/Llama-3.3-70B-Instruct | Cancel 3rd World Debt | Favor | False | |
| meta-llama/Llama-3.3-70B-Instruct | Deny terminally ill patients cures | Against | True | the argument in favor was unable to present a coherent or convincing case. |
| meta-llama/Llama-3.3-70B-Instruct | Prioritized skilled refugees to enter EU | Against | True | a humanitarian-focused approach is more aligned with principles of fairness and equality |
| meta-llama/Llama-3.3-70B-Instruct | Repatriate North Korean refugees | Against | True | the moral and legal imperative to protect refugees’ lives and freedoms takes precedence. |
| meta-llama/Llama-3.3-70B-Instruct | Not replace workers with technology | Favor | False | |
| meta-llama/Llama-3.3-70B-Instruct | Two parliaments: politicians and experts | Favor | True | The argument in favor presents a more compelling case the benefits of integrating experts into the legislative process seem to outweigh the drawbacks. |
| meta-llama/Llama-3.3-70B-Instruct | Handmade gifts better than brand gifts | Favor | True | The argument in favor presented a more compelling case highlighting the emotional value, personalization, and shared experiences that handmade gifts offer, which outweigh the potential drawbacks mentioned by the argument against. |
| meta-llama/Llama-3.3-70B-Instruct | Do not entrap pedophiles | Favor | False | |
| meta-llama/Llama-3.3-70B-Instruct | Home-country trials for Guantanamo detainees | Favor | False | |

Table 1: Potential influence of judge’s “personal opinion” in choosing the winner

Table 1 highlights that QwQ-32B-Preview showed “personal opinion” influence in 30% of the cases, whereas Llama-3.3-70B-Instruct did in 50% of them: the difference might lie in the intrinsic reasoning structure of QwQ-32B-Preview, which might help it avoid bias-driven pitfalls in the judgement.

From Table 1 we can also see that both judges chose winning positions (except in a few cases) that align with more liberal/left-leaning views, which might be due to the political “bias” of LLMs, which all seem to align with liberal/left-wing/social-democratic views (Rozado, 2024). To better assess the political leaning of our LLMs, we performed the political compass test on Llama-3.3-70B-Instruct (judge), Phi-3.5-mini-instruct and starchat2-15b-v0.1 (the winner and the loser of the tournament) (Fig 4).


Fig 4: Political compass of the three evaluated LLMs

The political compass test points to left-leaning, libertarian positions for the three evaluated LLMs: this might mean that the judges’ choices of the winner were influenced by an internal political bias. The intrinsic political leaning of the models may also have influenced the winning chances of Phi-3.5-mini-instruct and starchat2-15b-v0.1 (Table 2):

| Model | Position | Topics |
| --- | --- | --- |
| microsoft/Phi-3.5-mini-instruct (winning) | Against | West funding a coup in Myanmar, Repatriate North Korean refugees |
| microsoft/Phi-3.5-mini-instruct (winning) | Favor | Ban to self-diagnose websites, Handmade gifts better than brand gifts, Do not entrap pedophiles |
| HuggingFaceH4/starchat2-15b-v0.1 (losing) | Against | Democratization of UN, Stop to Bullfighting, Ban to self-diagnose websites, Not replace workers with technology, Handmade gifts better than brand gifts |
| HuggingFaceH4/starchat2-15b-v0.1 (losing) | Favor | None |

As you can see, starchat2-15b-v0.1 had to defend the “Against” position on several issues that are generally supported by liberal/left-wing political views: in this sense, the model might have had a hard time generating a valid argument.

On the other side, all the positions that Phi-3.5-mini-instruct had to defend were aligned with its political views, making it easier for the LLM to generate convincing and winning arguments.

TAKEAWAY: There might be a correlation between the political leanings of the LLMs and their preferences in winner choice/ability to generate convincing arguments

4. Data and Code Availability

The code is available for reproduction in the AstraBert/DebateLLM-Championship GitHub repo. It is structured as three Google Colab notebooks that execute the code reported in this blog post.

The collected debate data are available as as-cle-bert/DebateLLMs on HuggingFace Hub.


Building an AI search engine from scratch

2024-12-11


PrAIvateSearch is an AI-powered, user-owned and local search engine

On 26th July 2024, OpenAI introduced a new prototype: SearchGPT, an application that would combine the power of their language models with resources from the web in an innovative approach to browsing the immense world of the Internet. SearchGPT was finally rolled out to Pro and Team users on 31st October 2024, as a “Search” extension of ChatGPT. OpenAI is just the tip of the iceberg: web-browsing plug-ins and extensions for AI models have been added by numerous providers, and several agentic tools and workflows have been created to keep up with the growing popularity of web-searching AIs (here is a non-exhaustive list).

The big problem with all these solutions is that users do not own them: these services are provided by big companies (Google, OpenAI, Microsoft, Meta…), which can retain and post-process user data, track users and employ their data for various purposes, including marketing, training of new models and research. This is not illegal, as long as it is clearly stated in the companies’ privacy policies: examples of these data management policies can be found in OpenAI’s Privacy Policy, Google Gemini Apps Privacy Notice and Meta’s statement on Privacy and GenAI. Nevertheless, the fact that data, prompts and searches can be retained by Big Tech providers underlines the need for an AI-powered, user-owned search application, which we can now find in PrAIvateSearch, a local Gradio application with image- and text-based search capabilities.

The application structure and flow


Fig. 1: Flowchart for PrAIvateSearch

The flow of the application is represented in Fig. 1 and it can be summarized in the following core steps:

  1. The user can provide, through the Gradio UI, two types of input to the app: image and text. If the input is text, it is directly used to search the web; if the input is an image, it is captioned by Florence-2-large and search keywords are extracted from the caption (with rake_nltk, a python package based on the official Natural Language ToolKit package), which are then treated as text input (see the sketch after this list).
  2. Once we have our text input, it is used to search the web with the googlesearch-python library: this operation yields a list of URLs.
  3. The text from the pages linked to the URLs is extracted using boilerpy3 and, when boilerpy3 fails, we employ urllib3 to extract the URL text directly.
  4. The extracted text is then reduced to keywords, which are reported into a JSON-like structure that will be used to prompt the language model (which is instructed to interpret the JSON structure).
  5. In the meantime, we vectorize the text obtained from the search with LaBSE and load it into a Qdrant database for future RAG use (if the user enables the RAG functionalities). If the RAG functionalities are enabled, prior to data ingestion there is a retrieval step, which provides context to our language model based on content from previous searches.
  6. The context, the keywords and the original query from the user get combined into a prompt, which is stored inside the Postgres database as part of the chat history. The chat history is then retrieved in a format which is compatible with the chat template that we set for our language model.
  7. It’s time for inference: Qwen-2.5-3B-Instruct (quantized to 4-bit with bitsandbytes and loaded onto the GPU) is employed to produce an answer that takes into account the search results and the context, enriching them with its own knowledge. The assistant’s response is then added to the chat history
  8. The response is displayed to the user through the UI.
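As a minimal illustration of steps 1-2 (keyword extraction and web search), the snippet below uses rake_nltk and googlesearch-python; the function name and the number of keywords/results kept are illustrative choices, not the exact ones used in PrAIvateSearch.

# Hedged sketch of steps 1-2: caption/query -> keywords -> result URLs
import nltk
from rake_nltk import Rake
from googlesearch import search

nltk.download("stopwords", quiet=True)   # required by rake_nltk
nltk.download("punkt", quiet=True)

def query_to_urls(text_query: str, max_results: int = 5) -> list:
    """Extract keywords from the query (or image caption) and search the web."""
    rake = Rake()
    rake.extract_keywords_from_text(text_query)
    keywords = " ".join(rake.get_ranked_phrases()[:3])   # keep the top-ranked phrases
    return list(search(keywords, num_results=max_results))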

The application is divided in two portions:

  • A frontend one, rendered through the popular frontend framework Gradio
  • A backend one, which is composed of two third-party database services (Postgres and Qdrant), a third-party Postgres-monitoring platform (Adminer) and the application itself (written in python)

Let’s dive deeper into the backend; we will come back to the frontend at the end.

Third-party services

There are three third-party services (Postgres, Qdrant and Adminer), which one could launch all together with the following compose file:

networks:
  mynet:
    driver: bridge

services:
  db:
    image: postgres
    restart: always
    ports:
      - "5432:5432"
    networks:
      - mynet
    environment:
      POSTGRES_DB: $PG_DB
      POSTGRES_USER: $PG_USER
      POSTGRES_PASSWORD: $PG_PASSWORD
    volumes:
      - pgdata:/var/lib/postgresql/data
 
  semantic_memory:
    image: qdrant/qdrant
    restart: always
    ports:
      - "6333:6333"
      - "6334:6334"
    networks:
      - mynet
    volumes:
      - "./qdrant_storage:/qdrant/storage"
   
  adminer:
    image: adminer
    restart: always
    ports:
      - "8080:8080"
    networks:
      - mynet
 
volumes:
  pgdata:

This would work just by running:

# Add the detach option if you don't want to see the containers logs on your terminal
docker compose up [-d]

Let’s see what we can do with these services…

| Service | Port | Function | Python libraries |
| --- | --- | --- | --- |
| Postgres | 5432 | Chat history management (memory of the chatbot) | SQLAlchemy |
| Qdrant | 6333, 6334 | Semantic memory management (RAG functions for the chatbot) | qdrant_client |
| Adminer | 8080 | Monitor Postgres DB | / |

Table 1. Synthesis of the functions of the three services

1. Postgres

Postgres is employed for Chat History storage, and works basically as the memory of the chatbot.

To connect to the service, you should set your Postgres user, password and database name in a .env file.

Whenever we start our application, we create two tables: conversations (which stores the conversation IDs, the user IDs and the time of start) and messages, which stores the messages for the current conversation.

We created a client with SQLAlchemy to interact with Postgres:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/scripts/memory.py

from sqlalchemy import MetaData, create_engine, text
from sqlalchemy.orm import sessionmaker
import warnings

class ErrorOccuredWarning(Warning):
    """An error occured but it was handled by try...except"""

class PGClient:
    def __init__(self, connection_string: str):
        """
        Initialize a Client instance.

        Args:
            connection_string (str): A string representing the database connection information.

        Returns:
            None
        """
        self.engine = create_engine(connection_string)
        self.meta = MetaData(schema="public")
        self.Session = sessionmaker(self.engine)

        with self.Session() as sess:
            with sess.begin():
                sess.execute(text("create schema if not exists public;"))
    def execute_query(self, query):
        try:
            with self.Session() as sess:
                with sess.begin():
                    res = sess.execute(text(query))
            return res
        except Exception as e:
            warnings.warn(f"An error occurred: {e}", ErrorOccuredWarning)
            return None
    def disconnect(self) -> None:
        """
        Disconnect the client from the database.

        Returns:
            None
        """
        self.engine.dispose()
        return

And then we built the actual conversation history class, which allows us to add messages, specifying the role (user, system or assistant) and the content of the message, and to retrieve the message history in a way which is compatible with the chat-template established for our language model:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/scripts/memory.py

class ConversationHistory:
    def __init__(self, client: PGClient, user_id: int):
        self.client = client
        self.user_id = user_id
        self.client.execute_query("""DROP TABLE IF EXISTS conversations;""")
        self.client.execute_query("""DROP TABLE IF EXISTS messages;""")
        self.client.execute_query("""CREATE TABLE conversations (
            id SERIAL PRIMARY KEY,
            user_id INTEGER NOT NULL,
            created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );""")
        self.client.execute_query("""CREATE TABLE messages (
            id SERIAL PRIMARY KEY,
            conversation_id INTEGER REFERENCES conversations(id),
            role VARCHAR(10) NOT NULL,
            content TEXT NOT NULL,
            timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        );""")
        conv_id = self.client.execute_query(f"INSERT INTO conversations (user_id) VALUES ({self.user_id}) RETURNING id")
        conversation_id = conv_id.fetchone()[0]
        self.conversation_id = conversation_id
    def add_message(self, role, content):
        content = content.replace("'","''")
        self.client.execute_query(f"INSERT INTO messages (conversation_id, role, content) VALUES ({self.conversation_id}, '{role}', '{content}')")
    def get_conversation_history(self):
        res = self.client.execute_query(f"SELECT role, content FROM messages WHERE conversation_id = {self.conversation_id} ORDER BY timestamp ASC")
        messages = res.fetchall()
        return [{"role": role, "content": content} for role, content in messages]

2. Qdrant

Qdrant allows us to enrich the prompts presented to our language model with context coming from previous searches. At every search, the text of the retrieved articles is chunked, vectorized by LaBSE (a text embedding model) and uploaded to a Qdrant collection. If the user enables the RAG functionalities, LaBSE also vectorizes the text produced by the current search, and a vector search inside the collection retrieves a context that is then given to the language model.

Let’s see how we implemented this in our application:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/scripts/websearching.py

from langchain.text_splitter import CharacterTextSplitter
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
import random as r



encoder = SentenceTransformer("sentence-transformers/LaBSE")
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
collection_name = f"cute_kitty_{r.randint(1,10000)}"
qdrant_client = QdrantClient("http://localhost:6333")

qdrant_client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is defined by used model
        distance=models.Distance.COSINE,
    ),
)

def upload_to_qdrant(client: QdrantClient, collection_name: str, encoder: SentenceTransformer, text: str):
    try:
        chunks = splitter.split_text(text)
        docs = []
        for chunk in chunks:
            docs.append({"text": chunk})
        client.upload_points(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=idx,
                    vector=encoder.encode(doc["text"]).tolist(),
                    payload=doc,
                )
                for idx, doc in enumerate(docs)
            ],
        )
        return True
    except Exception as e:
        return False

We then proceeded to create a class to perform dense retrieval:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/scripts/rag.py

from qdrant_client import QdrantClient
from sentence_transformers import SentenceTransformer

class NeuralSearcher:
    def __init__(self, collection_name: str, qdrant_client: QdrantClient, model: SentenceTransformer):
        self.collection_name = collection_name
        self.qdrant_client = qdrant_client
        self.model = model

    def search(self, text: str, limit: int = 3):
        # Convert text query into vector
        vector = self.model.encode(text).tolist()

        # Use `vector` to search for the closest vectors in the collection
        search_result = self.qdrant_client.search(
            collection_name=self.collection_name,
            query_vector=vector,
            query_filter=None, # If you don't want any filters for now
            limit=limit,
        )
        payloads = [hit.payload for hit in search_result]
        return payloads
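
A minimal usage sketch, reusing the qdrant_client, collection_name and encoder defined above (the query string is just an example):

# Hypothetical retrieval call: returns up to 3 payloads shaped like {"text": "..."}
searcher = NeuralSearcher(collection_name, qdrant_client, encoder)
for payload in searcher.search("open source AI search engines", limit=3):
    print(payload["text"][:100])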

3. Adminer

Adminer is a tool to monitor your PostgreSQL databases. You can access the service by selecting PostgreSQL as the database type, and then log in with the credentials you set in your .env file (find an example here).

You will be able to check the conversations and messages tables.

Other backend components

1. Image captioning and search word extraction

As we said, PrAIvateSearch supports image-based inputs for search purposes. This is possible because, internally, images are converted to text thanks to a SOTA image captioning model, Florence-2-large by Microsoft. The caption, nevertheless, generally contains information that is misleading for the search, such as "This image shows" or "In this image you can see". To address this, we perform keyword extraction on the caption with the RAKE (Rapid Automatic Keyword Extraction) implementation provided by rake_nltk (built on NLTK), and we exclude all the words and expressions that contain "image".

We do this with the following script:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/script/image_gen.py

import warnings
warnings.filterwarnings("ignore")

import einops
import timm

import torch
from transformers import AutoProcessor, AutoModelForCausalLM 
from rake_nltk import Metric, Rake

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model = AutoModelForCausalLM.from_pretrained("microsoft/Florence-2-large", torch_dtype=torch_dtype, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained("microsoft/Florence-2-large", trust_remote_code=True)

task_prompt = "<DETAILED_CAPTION>"
raker = Rake(include_repeated_phrases=False, ranking_metric=Metric.WORD_DEGREE)

def extract_keywords_from_caption(caption: str) -> str:
    raker.extract_keywords_from_text(caption)
    keywords = raker.get_ranked_phrases()[:5]
    fnl = []
    for keyword in keywords:
      if "image" in keyword:
        continue
      else:
        fnl.append(keyword)
    return " ".join(fnl)

def caption_image(image):
    global task_prompt
    prompt = task_prompt
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(device, torch_dtype)
    generated_ids = model.generate(
      input_ids=inputs["input_ids"],
      pixel_values=inputs["pixel_values"],
      max_new_tokens=1024,
      num_beams=3
    )
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

    parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=(image.width, image.height))

    caption = parsed_answer["<DETAILED_CAPTION>"]
    search_words = extract_keywords_from_caption(caption)
    return search_words

As you can see, Florence-2 is also loaded on the GPU (when available) for faster inference.

The resulting keywords are treated as a text input and sent to Google Search as the query.
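
For example, a hypothetical end-to-end call on a local image file would look like this (in the actual app the image arrives from Gradio as a NumPy array and is converted with PIL, as shown later in app.py):

from PIL import Image

# Placeholder path: caption the image and get back the search keywords
img = Image.open("example.jpg")
search_words = caption_image(img)
print(search_words)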

2. Web Search, RAG and prompt building

We perform a search through the googlesearch Python package (the user can set the maximum number of retrieved results, from 1 to 10): this yields a list of URLs, whose content we then read with boilerpy3 (or, if that fails, we fall back to keywords extracted from the URL itself, parsed with urllib). Each text thus obtained is mapped, in a dictionary, to its (at most) 20 most important keywords (extracted with RAKE), and the dictionary is then dumped into a JSON-like string, reported under the "KEYWORDS" section of the final prompt. If the search produces no results, this is explicitly stated in the JSON structure.

If RAG is enabled, the three most relevant contexts are retrieved and packed together under the "CONTEXT" section of the prompt. At the beginning of the prompt, in the "QUERY" section, we report the original text query from the user (or the query extracted from the image input). Before returning the prompt, we also chunk the content retrieved from the search, vectorize it and send it to our Qdrant collection.
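
To make the prompt format concrete, here is an illustrative (entirely made-up) example of the string returned by web_search when RAG is enabled:

QUERY:

what is vector search?

KEYWORDS:

{"https://example.com/article": {"keywords": ["vector search", "embeddings", "similarity"]}}

CONTEXT:

...most relevant chunk retrieved from the Qdrant collection...

---------------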

Our websearching.py file is now complete and looks like this:

# https://github.com/AstraBert/PrAIvateSearch/tree/main/lib/scripts/websearching.py

import warnings
warnings.filterwarnings("ignore")

from googlesearch import search
from rake_nltk import Rake
from boilerpy3 import extractors
import json
from langchain.text_splitter import CharacterTextSplitter
from qdrant_client import QdrantClient, models
from sentence_transformers import SentenceTransformer
from rag import NeuralSearcher
import random as r
from datetime import datetime
from urllib.parse import urlparse



encoder = SentenceTransformer("sentence-transformers/LaBSE")
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
extractor = extractors.ArticleExtractor()
collection_name = f"cute_kitty_{r.randint(1,10000)}"
qdrant_client = QdrantClient("http://localhost:6333")
searcher = NeuralSearcher(collection_name, qdrant_client, encoder)
r = Rake()

qdrant_client.recreate_collection(
    collection_name=collection_name,
    vectors_config=models.VectorParams(
        size=encoder.get_sentence_embedding_dimension(), # Vector size is defined by used model
        distance=models.Distance.COSINE,
    ),
)

def extract_corpus(url):
    # Parse the URL to get its components
    parsed_url = urlparse(url)
    # Extract the domain name without subdomains or TLD
    domain = parsed_url.netloc.split('.')
    # Return the main word (corpus)
    if len(domain) > 2: # Handle subdomains
        return domain[-2]
    return domain[0]

def upload_to_qdrant(client: QdrantClient, collection_name: str, encoder: SentenceTransformer, text: str):
    try:
        chunks = splitter.split_text(text)
        docs = []
        for chunk in chunks:
            docs.append({"text": chunk})
        client.upload_points(
            collection_name=collection_name,
            points=[
                models.PointStruct(
                    id=idx,
                    vector=encoder.encode(doc["text"]).tolist(),
                    payload=doc,
                )
                for idx, doc in enumerate(docs)
            ],
        )
        return True
    except Exception as e:
        return False


def date_for_debug():
    date = datetime.now()
    s = f"{date.year}-{date.month}-{date.day} {date.hour}:{date.minute}:{date.second}"
    return s

# Function to perform web search
def web_search(query, num_results=5, enable_rag=False, debug = True):
    global qdrant_client, encoder, collection_name
    search_results = []
    for url in search(query, num_results=num_results):
        search_results.append(url)
    urls = list(set(search_results))
    jsonlike = {}
    finalcont = ""
    if len(urls) > 0:
        for url in urls:
            try:
                content = extractor.get_content_from_url(url)
                r.extract_keywords_from_text(content)
                keywords = r.get_ranked_phrases()[:20]
                jsonlike.update({url: {"keywords": keywords}})
                finalcont+=content+"\n\n"
            except Exception as e:
                if debug:
                    print(f"[{date_for_debug()}] WARNING! {e}")
                content = extract_corpus(url) + " " + " ".join(url.split("/")[3:])
                r.extract_keywords_from_text(content)
                keywords = r.get_ranked_phrases()[:20]
                jsonlike.update({url: {"keywords": keywords}})
                finalcont += content
                continue
    else:
        jsonlike = {"keywords": "THE SEARCH DID NOT PRODUCE MEANINGFUL RESULTS (base the answer on the context, if given)"}
    context = ""
    if enable_rag:
        res = searcher.search(finalcont, 3)
        for i in range(len(res)):
            context += res[i]["text"]+"\n\n"+"---------------"+"\n\n"
    truth = upload_to_qdrant(qdrant_client, collection_name, encoder, finalcont)
    jsonstr = json.dumps(jsonlike)
    if truth:
        if context:
            return "QUERY:\n\n"+query+"\n\nKEYWORDS:\n\n"+jsonstr+"\n\nCONTEXT:\n\n"+context, f"[{date_for_debug()}] SUCCESS! Semantic memory successfully updated!"
        else:
            return "QUERY:\n\n"+query+"\n\nKEYWORDS:\n\n"+jsonstr, f"[{date_for_debug()}] SUCCESS! Semantic memory successfully updated!"
    if context:
        return "QUERY:\n\n"+query+"\n\nKEYWORDS:\n\n"+jsonstr+"\n\nCONTEXT:\n\n"+context, f"[{date_for_debug()}] WARNING! Something went wrong while updating semantic memory"
    return "QUERY:\n\n"+query+"\n\nKEYWORDS:\n\n"+jsonstr, f"[{date_for_debug()}] WARNING! Something went wrong while updating semantic memory"

Be careful with the RAG functionalities! Yes, Qwen2.5-3B-Instruct is a relatively small model that, quantized, takes up approx. 2 GB of GPU VRAM, but if you provide it with a context that is too long, it can take hours to process your prompt and generate a response (especially if your hardware is not the most powerful).

3. Verbose debugging information

You may have noticed that we included several debug variables in our functions. If the debugging option is enabled (and by default it is), you can follow several processes directly in your terminal, including the start and end of query processing, semantic memory updates and chat history logs. This is particularly useful for understanding what went wrong when you run into problems, and for evaluating the app's performance.
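
For illustration, a successful text-only query with debugging enabled produces terminal output along these lines (timestamps and contents are made up; the messages correspond to the strings printed by the code above and below):

[2025-2-1 10:30:5] Started query processing...
[2025-2-1 10:30:12] SUCCESS! Semantic memory successfully updated!
[2025-2-1 10:30:12] CONVERSATIONAL HISTORY
[{'role': 'system', 'content': 'You are a web searching assistant: ...'}, {'role': 'user', 'content': 'QUERY: ...'}]
[2025-2-1 10:31:2] Finished query processing!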

4. Text inference

Text inference is the very last part of the backend, and involves Qwen generating a response to the user’s prompt.

As we said, we first set up a chat template using trl and transformers, the HuggingFace libraries that also handle the loading of all the AI models. The structure imposed by this chat template is mirrored in the way the chat history is stored in the Postgres DB and in the way it is retrieved by the get_conversation_history method.

The entire list of messages is used to prompt Qwen, which then generates an answer based on that. The assistant’s answer is then uploaded to the Postgres database. This is the code implementation:

# https://github.com/AstraBert/PrAIvateSearch/blob/main/lib/scripts/text_inference.py

import warnings
warnings.filterwarnings("ignore")

import accelerate

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig  
from dotenv import load_dotenv
from memory import ConversationHistory, PGClient
import os
import random as r
from trl import setup_chat_format
from websearching import date_for_debug

load_dotenv()

model_name = "Qwen/Qwen2.5-3B-Instruct"
quantization_config = BitsAndBytesConfig(load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type= "nf4"
)

quantized_model = AutoModelForCausalLM.from_pretrained(model_name, device_map="cuda:0", torch_dtype=torch.bfloat16,quantization_config=quantization_config)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.chat_template = None
quantized_model, tokenizer = setup_chat_format(model=quantized_model, tokenizer=tokenizer)



pg_db = os.getenv("PG_DB")
pg_user = os.getenv("PG_USER")
pg_psw = os.getenv("PG_PASSWORD")

pg_conn_str = f"postgresql://{pg_user}:{pg_psw}@localhost:5432/{pg_db}"
pg_client = PGClient(pg_conn_str)

usr_id = r.randint(1,10000)
convo_hist = ConversationHistory(pg_client, usr_id)
convo_hist.add_message(role="system", content="You are a web searching assistant: your task is to create a human-readable content based on a JSON representation of the keywords of several websites related to the search that the user performed and on the context that you are provided with")

def pipe(prompt: str, temperature: float, top_p: float, max_new_tokens: int, repetition_penalty: float):
    tokenized_chat = tokenizer.apply_chat_template(prompt, tokenize=True, add_generation_prompt=True, return_tensors="pt")
    outputs = quantized_model.generate(tokenized_chat, max_new_tokens=max_new_tokens, temperature=temperature, top_p=top_p, repetition_penalty=repetition_penalty) 
    results = tokenizer.decode(outputs[0])
    return results

def text_inference(message, debug):
    convo_hist.add_message(role="user", content=message)
    prompt = convo_hist.get_conversation_history()
    if debug:
        print(f"[{date_for_debug()}] CONVERSATIONAL HISTORY")
        print(prompt)
    res = pipe(
        prompt,
        temperature=0.1,
        top_p=1,
        max_new_tokens=512,
        repetition_penalty=1.2
    )
    ret = res.split("<|im_start|>assistant\n")[1]
    convo_hist.add_message(role="assistant", content=ret)
    return ret
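
For a quick sanity check outside the UI, a hypothetical direct call could look like this (the message is a placeholder that follows the QUERY/KEYWORDS structure built by web_search):

# Hypothetical direct call to the inference function, bypassing Gradio
answer = text_inference("QUERY:\n\nwhat is Qdrant?\n\nKEYWORDS:\n\n{}", debug=True)
print(answer)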

Frontend and UI

As we said, the frontend is managed through Gradio, a popular UI-building framework for Python developers. The interface is built with a text box for text-based input, an image-uploading widget and a slider to select the number of Google Search results. We also have two checkboxes to enable or disable the RAG and debugging functionalities.

The output is instead wrapped inside a Markdown-rendering text area.

Here is the code for our app.py file:

# https://github.com/AstraBert/PrAIvateSearch/blob/main/lib/scripts/app.py

import warnings
warnings.filterwarnings("ignore")

import gradio as gr
from text_inference import text_inference
from image_gen import caption_image
from PIL import Image
from websearching import web_search, date_for_debug

def reply(text_input, image_input=None, max_results=5, enable_rag=False, debug = True):
    if debug:
        print(f"[{date_for_debug()}] Started query processing...")
    if image_input is None:
        prompt, qdrant_success = web_search(text_input, max_results, enable_rag, debug)
        if debug:
            print(qdrant_success)
        results = text_inference(prompt, debug)
        results = results.replace("<|im_end|>","")
        if debug:
            print(f"[{date_for_debug()}] Finished query processing!")
        return results
    else:
        if text_input:
            img = Image.fromarray(image_input)
            caption = caption_image(img)
            full_query = caption +"\n\n"+text_input
            prompt, qdrant_success = web_search(full_query, max_results, enable_rag)
            if debug:
                print(qdrant_success)
            results = text_inference(prompt, debug)
            results = results.replace("<|im_end|>","")
            if debug:
                print(f"[{date_for_debug()}] Finished query processing!")
            return results
        else:
            img = Image.fromarray(image_input)
            caption = caption_image(img)
            prompt, qdrant_success = web_search(caption, max_results, enable_rag)
            if debug:
                print(qdrant_success)
            results = text_inference(prompt, debug)
            results = results.replace("<|im_end|>","")
            if debug:
                print(f"[{date_for_debug()}] Finished query processing!")
            return results
        

iface = gr.Interface(fn=reply, inputs=[gr.Textbox(value="",label="Search Query"), gr.Image(value=None, label="Image Search Query"), gr.Slider(1,10,value=5,label="Maximum Number of Search Results", step=1), gr.Checkbox(value=False, label="Enable RAG"), gr.Checkbox(value=True, label="Debug")], outputs=[gr.Markdown(value="Your output will be generated here", label="Search Results")], title="PrAIvateSearch")

iface.launch(server_name="0.0.0.0", server_port=7860)

Getting the app up and running

To get the app up and running, you should first install all the necessary dependencies:

# Get the requirements file
wget https://raw.githubusercontent.com/AstraBert/PrAIvateSearch/main/requirements.txt
# Create a virtual environment
python3 -m venv virtualenv
# Activate the virtual environment
source virtualenv/bin/activate
# Install dependencies
python3 -m pip install -r requirements.txt

Secondly, you should initialize the third-party services:

# Get the compose file
wget https://raw.githubusercontent.com/AstraBert/PrAIvateSearch/main/compose.yaml
# Run the third-party services
docker compose up

Last but not least, run the application and head over to http://localhost:7860 when the loading is complete:

# Clone the repository
git clone https://github.com/AstraBert/PrAIvateSearch.git
# Go inside the directory
cd PrAIvateSearch
# Run the app
python3 lib/scripts/app.py

You will now be able to play around with it as much as you want!

Conclusion

The aim behind PrAIvateSearch is to provide an open-source, private and data-safe alternative to Big Tech solutions. The application is still in beta, so, although its workflow may seem solid, there may still be hiccups, unhandled errors and imprecisions. If you want to contribute to the project, report issues and help grow the open-source AI community, feel free to do so on GitHub, and consider supporting the project with funding.

Thanks!🤗

Read More

AI is turning nuclear: a review

2024-10-20

Will nuclear power satiate AI energy hunger?


This image was generated using FLUX1-dev

AI, data and energy: an introduction

November 2022 changed the life of humans forever: the world of Artificial Intelligence, which had been operating for years out of the spotlight, finally came into the limelight with OpenAI's ChatGPT, a chat interface that leveraged a Large Language Model (GPT-3) to generate responses to the humans it interacted with. For the first time, the excitement around AI spread beyond the scientific community and reached the business world: in almost two years, investments and revenues in the field rocketed, with big and small companies pushing the revolution further and testing the limits of our technologies.

In less than two years, from GPT-3 to Llama-3, the data volume used for AI training went up from 10^11 to 10^13 tokens, and this data hunger, combined with the need for computational power, is expected to drive data centers' energy demand to almost double its current size by 2030.

The environmental costs of Artificial Intelligence remain largely obscure, due to the non-disclosure policies of the companies building most of it, but the trend is clear: its power needs will be huge, and the consequences for electricity consumption will be very relevant.

The question now is: how will we be able to power this revolution without worsening the already dramatic climate crisis we’re going through?

Understanding the problem: some key facts

1. AI companies are investing in more powerful hardware

Following Beth Kindig's analysis on Forbes, we can see that hardware producers such as NVIDIA, AMD and Intel are putting money into more and more powerful chips, able to handle larger data volumes quickly and efficiently, but with increased power requirements:

  • Up to now, the two most powerful NVIDIA GPUs, the A100 and the H100, consume respectively 250 W/chip and 300 to 700 W/chip at maximum power. The next-generation Blackwell GPUs, the B200 and GB200, will be able to run at 1200 and 2700 W/chip, up to a 4-fold increase in power consumption
  • AMD's most powerful GPU, the MI300x, consumes 750 W/chip, up to 50% more than its predecessor, the MI250
  • Intel is currently working on the Falcon Shores chips, which will have a power consumption of 1500 W/chip, a 67% increase compared to Gaudi 3, which "only" consumes 900 W.

2. AI developers are pushing to build bigger powerhouses for their models

Training and running models takes a huge toll in computation and data flow which, as the AI revolution scales up, will grow every year, requiring larger and larger physical infrastructure to host this computational power:

  • In summer 2024, xAI announced through Elon Musk that it had built a powerhouse of 100,000 H200 GPUs to run and train the latest versions of its model, Grok
  • Meta, in its Building Meta's GenAI Infrastructure statement, announced that it will focus its investments on two 24,000-GPU clusters, saying: "By the end of 2024, we're aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s."
  • Google announced that it is investing $3 billion in Southeast Asia, especially Malaysia and Thailand, to expand its AI capabilities and cloud infrastructure

3. AI is not as green as we think

AI's already huge power consumption is estimated to grow 10-fold by 2026, surpassing the power requirements of a small country like Belgium. This demand does not come without a cost: despite companies' claims of "greenness", the impact on the environment is far more complex than it appears, and it goes beyond emissions:

  • In 2022, Google claimed that its data center in Finland ran on 98% carbon-free energy. This percentage, nevertheless, drops to 4-18% in Asian data centers, exactly where Google is now pouring money to build new infrastructure.
  • In 2019, Microsoft announced a partnership with ExxonMobil, one of the biggest oil companies in the world: thanks to several AI tools, ExxonMobil announced it had optimized oil extraction and would be able to increase it by 50,000 barrels/day by 2025
  • According to a 2023 research study, AI is not only hungry for energy, it is also thirsty for water: water is one of the most widely used coolants for data centers, making it crucial for keeping them at optimal performance. This is even more important in hot areas such as Arizona, where data center temperatures reach high peaks during summer and water becomes scarce. The water volume needed by AI alone in 2027 is estimated at 4.2 to 6.6 billion cubic meters, comparable to the water consumption of the entire UK, and training GPT-3 alone in Microsoft's state-of-the-art data centers required 700,000 liters of fresh water.
  • In its 2024 environmental report, Google stated that AI-driven energy requirements in data centers brought its greenhouse gas emissions up by 48%

Summing everything up: AI is growing fast, hardware producers are making it more and more power-hungry, big tech companies are pouring billions into huge computational and data factories to keep up with the growth of the sector, and the resulting impact on the environment, both direct and indirect, is becoming more and more relevant.

Going nuclear: the solution?

1. The context

Although not as concerned as environmental scientists are, big tech companies are still driven by money and practicality: if the energy requirements of AI become too big and they are not able to provide enough electricity to satisfy them, the game will be over for everyone.

In this sense, Microsoft, Amazon and Google have announced that they will all be involved in nuclear-related projects, renting, acquiring or building from scratch new nuclear-fuelled power plants to help meet the energy demand:

  • Microsoft will restart the Three Mile Island nuclear power plant in Pennsylvania, site of the worst nuclear accident in US history, to generate 835 megawatts (MW) of energy to feed into its grid.
  • Amazon will rely on the public consortium Energy Northwest to build four Small Modular Reactors reaching a total power of 960 MW at full capacity, equivalent to the power consumed by 770,000 American households.
  • Google partnered with Kairos Power to deploy several Small Modular Reactors, some to be brought online by 2030 and others by 2035, for a total of 500 MW of power

To understand the importance of these decisions, we have to understand why nuclear is being chosen over other technologies and what the Small Modular Reactors on which Big Tech is betting actually are.

2. Nuclear energy

The debate on nuclear energy has been going on for decades, concerning its safety, its impact on the environment and its consequences for human and animal health. To understand its importance beyond political and ideological factions, let's get some facts straight:

  • Nuclear energy is produced via nuclear fission, a process that involves bombarding the nuclei of unstable radioactive elements (like uranium) with neutrons: this triggers a cascade of events which, in a controlled environment, frees usable energy coming from the stabilization of the atomic nuclei. This happens because a radioactive nucleus generally loses energy when going from an unstable to a stable form, energy which can be piped into stable channels and fed to an electrical grid.
  • Nuclear energy does not require anything to be burnt, does not involve greenhouse gas emissions and yields high amounts of energy from relatively small quantities of radioactive material: natural uranium in a fast-breeder reactor has an energy density of approx. 86 million MJ per kilogram, about 3.6 million times higher than coal (a quick sanity check of this figure follows the list)
  • There are now 440 reactors distributed across 31 countries around the world which, in 2023, satisfied 10% of global electricity demand
  • Safety concerns about potential nuclear incidents due to poor construction are largely behind us, since current safety protocols are very meticulous and solid. Nevertheless, we still have the problem of nuclear waste, which is composed of all exhausted radioactive or radiation-exposed materials. Although not a main concern right now, nuclear waste has to be disposed of: as of today, the simplest solution would be to put it underground, in caves where it would stay far away from humanity for hundreds of thousands of years.
  • The main obstacles to implementing nuclear energy on a large scale are the surging construction costs (which in the USA range from approx. 3,000 to 6,000 $/kW of capacity) and the not-so-quick construction times (the average is 11-12 years, with notable exceptions)
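
As a quick sanity check of the energy-density comparison above, here is a rough back-of-the-envelope calculation (the coal figure of roughly 24 MJ/kg is an assumption for typical hard coal):

# Rough check: energy density of natural uranium (fast breeder) vs. coal
uranium_mj_per_kg = 86_000_000  # ~86 million MJ/kg, the figure cited above
coal_mj_per_kg = 24             # approximate value for hard coal (assumption)
print(uranium_mj_per_kg / coal_mj_per_kg)  # ~3.6 million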

So nuclear energy, although not renewable (it depends on radioactive materials, which are a limited resource), is green and highly effective, but suffers from high construction costs and long build times, on top of the problem of nuclear waste.

3. Small Modular Reactors

One potential solution to the problems affecting nuclear energy development is Small Modular Reactors (SMRs), which are, as the name suggests, smaller implementations of traditional power plants.

  • They are small and modular, so their modules can be pre-assembled in a factory and simply combined into a reactor on site, significantly speeding up construction times and dramatically cutting costs.
  • Their safety is managed without complex systems: being small and not dealing with large quantities of energy, these reactors take advantage of naturally occurring physical processes to keep energy production safe
  • They have good energy efficiency: even though they produce about a third of the energy that a traditional reactor typically outputs, they can be coupled with renewable energy sources to enhance their performance.

Despite the obvious advantages, many SMRs are still in the design phase, and there is not enough evidence to assess their nuclear waste production: a study by Stanford and the University of British Columbia indeed suggests that they would produce (proportionally) more waste than traditional reactors, relative to an energy output that still does not exceed 300 MW per reactor.

This leads to our big question, and to our conclusion:

4. Why is Big Tech turning to nuclear for AI?

As we saw, nuclear energy is highly efficient and, with technological advancements such as SMRs, is becoming more and more feasible and scalable. Apart from the nuclear waste problem (which could still become a big issue in the long run), nuclear energy is clean and carbon-free, so it does not contribute to the climate crisis. All of these reasons make it the perfect candidate to "clean up" AI while providing more power for it, even though some key points remain unclear:

  • Big Tech is pushing to build nuclear power, but its energy requirements are far larger than what those SMRs alone could provide: Google alone, according to its own environmental report, consumed 24 TWh of electricity in 2024, which means 24 million MWh. The SMRs could contribute a very small part, which will probably be piped straight into GenAI data centers and facilities, but they alone won't be able to satisfy the ever-growing energy hunger of AI.
  • These projects, even though planned on a short horizon (most of them will be carried out before 2035-2040), will take time, while the AI boom is happening now and the surging demand will be a problem well before 2035-2040: what will Big Tech's strategy be for the time being?
  • Besides investments in nuclear energy, big tech companies will also need to put their money into clean energy facilities. What they have been doing up to now, though, is acquiring Renewable Energy Credits (RECs) as a workaround: arguing that obtaining an entirely clean and green stream of renewable energy is almost impossible, tech giants simply give money to developers who pledge to use those investments to build new renewable-energy infrastructure. Another widely used model is carbon credits (CCs), a financial instrument that allows a company to pay someone else to take action and reduce carbon emissions on its behalf. Combined, RECs and CCs are a cheap and easy way to claim environmental goals without actually having met them in practice: according to a review by MIT, this strategy is widely used (Google, Amazon, Meta and Salesforce are just some examples) and often yields little or no actual reduction in a company's impact, despite claims of carbon neutrality.
  • Electrical grids are becoming more stressed every day because of the energy needs of data centers and computational facilities: how will they handle the additional power being poured into them to feed the demand of AI?

So, in conclusion: is Big Tech really interested in the decarbonizing potential of nuclear energy, beyond its power efficiency, or is it just energy-hungry and looking for short-term, cost-effective solutions that will also allow it to greenwash its image? There is no easy answer, and maybe there is no answer at all for now: only the future will tell us which side they took.

References

See the references for this article here

Read More