AI is turning nuclear: a review

2024-10-20

Will nuclear power satiate AI's energy hunger?


This image was generated using FLUX1-dev

AI, data and energy: an introduction

November 2022 changed our lives forever: the world of Artificial Intelligence, which had been operating for years out of the spotlight, finally came into the limelight with OpenAI's ChatGPT, a chat interface that leveraged a Large Language Model (GPT-3) to generate responses to the humans it interacted with. For the first time, the excitement around AI spread beyond the scientific community and reached the business world: in roughly two years, investments and revenues in the field skyrocketed, with big and small companies pushing the revolution further and testing the limits of our technologies.

In less than two years, from GPT-3 to Llama 3, the data volumes used for AI training went from 10^11 to 10^13 tokens. This data hunger, combined with the need for computational power, is expected to nearly double data centers' energy demand by 2030.

The environmental costs of Artificial Intelligence remain largely opaque, because the companies that build most of it disclose very little. The trend, however, is clear: its power needs will be huge, and the consequences for electricity consumption will be significant.

The question now is: how will we be able to power this revolution without worsening the already dramatic climate crisis we’re going through?

Understanding the problem: some key facts

1. AI companies are investing in more powerful hardware

Following Beth Kindig's analysis on Forbes, we can see that hardware manufacturers such as NVIDIA, AMD and Intel are investing in ever more powerful chips, able to handle larger data volumes quickly and efficiently, but with increased power requirements (a rough back-of-the-envelope estimate follows the list below):

  • Up to now, NVIDIA's two most powerful GPUs, the A100 and the H100, draw respectively 250 W and 300 to 700 W per chip at maximum power. The next-generation Blackwell series, the B200 and GB200, will run at 1,200 and 2,700 W per chip, roughly a 4-fold increase in power consumption
  • AMD's most powerful GPU, the MI300X, draws 750 W per chip, up to 50% more than its predecessor, the MI250
  • Intel is currently working on the Falcon Shores chips, which will have a 1,500 W per chip power consumption, a 67% increase compared to Gaudi 3, which “only” consumes 900 W.
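
To get a feel for what these per-chip figures mean at data-center scale, here is a minimal back-of-the-envelope sketch in Python. The per-chip wattages are the ones quoted above; the cluster size mirrors the 100,000-GPU build-out discussed in the next section, and the 1.5x facility overhead factor is an assumption of mine, not a vendor figure.

```python
# Back-of-the-envelope estimate of cluster power draw from per-chip wattage.
# Per-chip figures are the ones quoted above; the 1.5x overhead factor for
# cooling, networking and power delivery is an assumption, not a vendor figure.

CHIP_POWER_W = {
    "A100": 250,
    "H100": 700,     # upper bound of the 300-700 W range
    "B200": 1200,
    "GB200": 2700,
}

N_GPUS = 100_000          # size of a 100,000-GPU cluster like the one mentioned below
OVERHEAD = 1.5            # assumed facility overhead (PUE-like factor)

for chip, watts in CHIP_POWER_W.items():
    total_mw = N_GPUS * watts * OVERHEAD / 1e6
    print(f"{chip}: ~{total_mw:,.0f} MW for {N_GPUS:,} chips")
```

Even with the older A100s, such a cluster would draw tens of megawatts continuously; with Blackwell-class chips the estimate climbs into the hundreds of megawatts.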

2. AI developers are pushing to build bigger powerhouses for their models

Training and running models takes a huge toll in computation and data flow which, as the AI revolution scales up, will grow every year, requiring ever larger physical infrastructures to host this computational power:

  • In summer 2024, xAI announced through Elon Musk that it had built a supercluster of 100,000 H200 GPUs to train and run the latest versions of its model Grok
  • Meta, in its Building Meta's GenAI infrastructure statement, announced that it will focus its investments on two 24,000-GPU clusters, and said: “By the end of 2024, we’re aiming to continue to grow our infrastructure build-out that will include 350,000 NVIDIA H100 GPUs as part of a portfolio that will feature compute power equivalent to nearly 600,000 H100s.”
  • Google announced that it is investing $3 billion in Southeast Asia, especially Malaysia and Thailand, to expand its AI capabilities and cloud infrastructure

3. AI is not as green as we think

AI's already huge power consumption is estimated to grow 10-fold by 2026, surpassing the power requirements of a small country like Belgium. This demand does not come without a cost: despite companies' claims of “greenness”, the impact on the environment is far more complex than it appears, and it goes beyond emissions:

  • In 2022, Google claimed that its data center in Finland ran on 98% carbon-free energy. That percentage, however, drops to 4-18% for its Asian data centers, exactly where Google is now pouring money to build new infrastructure.
  • In 2019, Microsoft announced a partnership with ExxonMobil, one of the biggest oil companies in the world: thanks to several AI tools, ExxonMobil said it had optimized oil extraction and expected to increase output by 50,000 barrels per day by 2025
  • According to a 2023 research study, AI is not only hungry for energy, it is also thirsty for water: water is one of the most common coolants for data centers, and it is crucial for keeping them at optimal performance. This matters even more in hot regions like Arizona, where temperatures peak during summer and water becomes scarce. The estimated water withdrawal attributable to AI alone in 2027 is 4.2 to 6.6 billion cubic meters, comparable to half of the UK's annual water withdrawal, and training GPT-3 alone in Microsoft's state-of-the-art data centers required 700,000 liters of fresh water.
  • In its 2024 environmental report, Google stated that AI-driven energy requirements in data centers pushed its greenhouse gas emissions up by 48% compared to 2019

To sum up: AI is growing fast, hardware producers are making it more and more power-hungry, big tech companies are pouring billions into huge computational and data factories to keep up with the sector's growth, and the resulting impact on the environment, both direct and indirect, is becoming more and more relevant.

Going nuclear: the solution?

1. The context

Although not as concerned as environmental scientists, big tech companies are still driven by money and practicality: if AI's energy requirements grow too large and they cannot secure enough electricity to satisfy them, the game will be over for everyone.

With this in mind, Microsoft, Amazon and Google have all announced nuclear-related projects, renting, acquiring or building from scratch nuclear-fuelled power plants to help meet the energy demand:

  • Microsoft will restart the Three Mile Island nuclear power plant in Pennsylvania, site of the worst nuclear accident in US history, to feed 835 megawatts (MW) of power into the grid.
  • Amazon will rely on the public consortium Energy Northwest to build four Small Modular Reactors reaching a total of 960 MW at full capacity, equivalent to the power consumed by 770,000 American households.
  • Google partnered with Kairos Power to deploy several Small Modular Reactors, the first to come online by 2030 and others by 2035, for a total of 500 MW of power

To understand the importance of these decisions, we have to understand why nuclear is being chosen over other technologies and what the Small Modular Reactors that big tech is betting on actually are.

2. Nuclear energy

The debate on nuclear energy has been going on for decades, covering its safety, its impact on the environment and its consequences for human and animal health. To understand its importance beyond political and ideological factions, let's get some facts straight:

  • Nuclear energy is produced via nuclear fission, a process in which the nuclei of fissile radioactive elements (like uranium) are bombarded with neutrons: the nucleus splits into lighter, more stable fragments, and the difference in binding energy is released. In a controlled chain reaction this energy is released as heat, which can be converted into electricity and fed to the grid.
  • Nuclear energy does not require anything to be burnt, does not involve greenhouse gas emissions and yields large amounts of energy from relatively small quantities of radioactive material: natural uranium in a fast-breeder reactor has an energy density of approx. 86 million MJ per kilogram, about 3.6 million times that of coal (a quick check of this ratio follows the list below)
  • There are now about 440 reactors distributed across 31 countries which, in 2023, satisfied roughly 10% of global electricity demand
  • Safety concerns about nuclear accidents caused by poor construction are largely behind us, since current safety protocols are meticulous and solid. Nevertheless, we still have the problem of nuclear waste, i.e. all spent radioactive or radiation-exposed materials. Although not a pressing concern today, nuclear waste has to be disposed of: as of now, the simplest solution is to store it underground, in repositories where it would stay far away from humanity for hundreds of thousands of years.
  • The main obstacles to implementing nuclear energy on a large scale are the surging construction costs (in the USA roughly $3,000 to $6,000 per kW of installed capacity) and the long construction times (on average 11-12 years, with notable exceptions)
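
As a quick sanity check of the energy-density comparison above, here is a two-line calculation. The uranium figure is the one quoted in the list; the coal value of roughly 24 MJ/kg is a common reference number and an assumption on my part.

```python
# Sanity check of the uranium-vs-coal energy density ratio quoted above.
# Coal at ~24 MJ/kg is a commonly used reference value (an assumption here).

uranium_mj_per_kg = 86e6   # natural uranium in a fast-breeder reactor
coal_mj_per_kg = 24

ratio = uranium_mj_per_kg / coal_mj_per_kg
print(f"Uranium packs ~{ratio / 1e6:.1f} million times more energy per kg than coal")
# -> ~3.6 million times, matching the figure in the text
```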

So nuclear energy, while not renewable (it depends on radioactive materials, which are a finite resource), is clean and highly effective, but suffers from high construction costs and long build times, on top of the nuclear waste problem.

3. Small Modular Reactors

One potential solution to the problems holding back nuclear energy is Small Modular Reactors (SMRs), which are, as the name suggests, smaller implementations of traditional power plants.

  • They are small and modular, so their modules can be pre-assembled in a factory and simply combined into a reactor on site, significantly speeding up construction and dramatically cutting costs.
  • Their safety does not rely on complex engineered systems: being small and handling lower amounts of energy, these reactors can exploit naturally occurring physical processes to keep energy production safe
  • They have good energy efficiency: even though each produces roughly a third of the power of a traditional reactor, they can be coupled with renewable energy sources to enhance their performance.

Despite the obvious advantages, many SMRs are still in the design phase, and there is not enough evidence to assess their nuclear waste production: a study by Stanford University and the University of British Columbia suggests that they would produce proportionally more waste than traditional reactors, relative to an output that still does not exceed 300 MW per reactor.

This leads to our big question, and to our conclusion:

4. Why is Big Tech turning nuclear for AI?

As we saw, nuclear energy is highly efficient and, with technological advances such as SMRs, is becoming more and more feasible and scalable. Apart from the nuclear waste problem (which can still become a big issue in the long run), nuclear energy is clean and carbon-free, so it does not contribute to the climate crisis. All of this makes it the perfect candidate to “clean up” AI while providing more power for it, even though some key points remain unclear:

  • Big tech is pushing to build nuclear power, but its energy requirements are far larger than what these SMRs alone could provide: Google alone, according to its own 2024 environmental report, consumed about 24 TWh of electricity, i.e. 24 million MWh (a back-of-the-envelope comparison follows this list). The SMRs could cover a very small share, which will probably be piped straight into GenAI data centers and facilities, but on their own they won't be able to satisfy the ever-growing energy hunger of AI.
  • These projects, even though planned on a relatively short horizon (most should be completed before 2035-2040), will take time, whereas the AI boom is happening now and the surging demand will become a problem well before 2035-2040: what will big tech's strategy be in the meantime?
  • Besides investing in nuclear energy, big tech will also need to fund clean energy facilities. What they have been doing so far, though, is buying Renewable Energy Credits (RECs) as a workaround: arguing that securing an entirely clean stream of renewable energy is almost impossible, tech giants pay developers who commit to using those investments to build new renewable energy infrastructure. Another widely used model is carbon credits (CCs), a financial instrument that lets a company pay someone else to take action and reduce carbon emissions on its behalf. RECs and CCs combined are a cheap and easy way to claim environmental goals without actually meeting them in practice: according to a review by MIT, this strategy is widely used (Google, Amazon, Meta and Salesforce are just a few examples) and often yields little or no actual reduction of a company's impact, despite claims of carbon neutrality.
  • Electrical grids are becoming more stressed every day because of the energy needs of data centers and computational facilities: how will they handle the additional load required to feed AI's demand?
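
To see how the announced reactor capacities compare with a hyperscaler's annual consumption, here is a minimal sketch. The capacities and the 24 TWh figure are the ones quoted above; the 90% capacity factor is my own assumption.

```python
# How far would the announced SMR capacity go toward a hyperscaler's demand?
# Capacities and the 24 TWh figure are quoted above; the 90% capacity factor
# is an assumption about reactor availability, not an announced number.

HOURS_PER_YEAR = 8760
CAPACITY_FACTOR = 0.90

smr_projects_mw = {"Google / Kairos": 500, "Amazon / Energy Northwest": 960}
google_demand_twh = 24          # Google's reported annual electricity use

for name, mw in smr_projects_mw.items():
    twh_per_year = mw * HOURS_PER_YEAR * CAPACITY_FACTOR / 1e6
    share = twh_per_year / google_demand_twh
    print(f"{name}: ~{twh_per_year:.1f} TWh/year, ~{share:.0%} of a 24 TWh demand")
```

Under these assumptions, even the largest announced project covers only a fraction of a single company's current consumption, which is exactly why these reactors look like a complement rather than a complete answer.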

So, in conclusion: is big tech really interested in the decarbonizing potential of nuclear energy, beyond its power efficiency, or is it simply energy-hungry and looking for short-term, cost-effective solutions that also let it greenwash its image? There is no easy answer, and maybe no answer at all for now: only the future will tell us which side they took.

References

See the references for this article here


Is AI's carbon footprint worrisome?

2024-07-13

“AI-powered robots are deployed by the farming industry to grow plants using less resources (Sheikh, 2020). AI is applied to tackle the protein structure prediction challenge, which can lead to revolutionary advances for biological sciences (Jumper et al., 2021). AI is also used to discover new electrocatalysts for efficient and scalable ways to store and use renewable energy (Zitnick et al., 2020) while also being applied to predict renewable energy availability in advance to improve energy utilization (Elkin & Witherspoon, 2019)” - From Wu et al., 2022


The juxtaposition (and contraposition) of the two sets of statements at the beginning of this article is not unintentional: it is meant to underline one of the biggest contrasts of AI, a paradox-like loop in which a tool that can help us through the climate crisis may, in the future, become an active part of that same crisis.

Is AI really that environmentally-threatening? Is there anything we could do to improve this situation? Let’s break this down, one step at a time.

0. Before we start: a little bit of terminology

We need to introduce three main terms that we'll be using throughout the article and that will provide a useful common ground:

  • Carbon footprint: according to Britannica, it is the “amount of carbon dioxide (CO2) emissions associated with all the activities of a person or other entity (e.g., building, corporation, country, etc.)”. This does not only mean how much fossil fuel one directly consumes (gasoline, plastics…), but also all the emissions from transportation, heating and electricity involved in producing goods and providing services.
  • CO2e (equivalent CO2): the European Commission writes that it is “a metric measure used to compare the emissions from various greenhouse gases on the basis of their global-warming potential (GWP), by converting amounts of other gases to the equivalent amount of carbon dioxide with the same global warming potential”. This simply means that there are many other greenhouse gases (methane, chlorofluorocarbons, nitrous oxide…) which all have global warming potential: although our emissions are mainly made up of CO2, they also include these other gases, and it is easier to express everything in terms of CO2. For example: methane has a global warming potential 25 times that of CO2, which means that emitting 1 kg of methane can be counted as emitting 25 kg of CO2e (a minimal conversion example follows the list below).
  • Life cycle assessment (LCA): following European Environmental Agency glossary, LCA “is a process of evaluating the effects that a product has on the environment over the entire period of its life thereby increasing resource-use efficiency and decreasing liabilities”. We can use this technique to trace the impact of an object (or sometimes a service) from start to end, understanding the energetic consumptions associated with its production, use and disposal.
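
As a small illustration of how CO2e works in practice, here is a minimal conversion helper. The methane GWP of 25 is the figure quoted above; the nitrous oxide value and the example quantities are illustrative assumptions, not an official inventory.

```python
# Minimal CO2e conversion using global warming potentials (GWP).
# The methane GWP of 25 matches the figure quoted in the text; the nitrous
# oxide value and the example amounts are illustrative assumptions.

GWP = {"co2": 1, "ch4": 25, "n2o": 298}   # 100-year GWPs (n2o value assumed here)

def to_co2e(emissions_kg: dict) -> float:
    """Convert a dict of per-gas emissions (kg) into kg of CO2-equivalent."""
    return sum(GWP[gas] * kg for gas, kg in emissions_kg.items())

print(to_co2e({"co2": 100, "ch4": 1}))   # -> 125.0 kg CO2e
```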

These three definitions come with a disclaimer (especially the first and last one): not everybody in the scientific community agrees with them, and there are several other ways to define these concepts. What matters for this article is to build an operative understanding that allows us to read facts and figures about AI's impact on the environment: we won't, therefore, dive into terminological disputes.

1. AI impact on the environment: a troubled story

There is a big problem with AI's carbon footprint: we know very little about it, and most AI companies are not very transparent about these data.

Let's, nevertheless, try to look at some estimates, following a paper (Sustainable AI: Environmental Implications, Challenges And Opportunities) from the 5th MLSys Conference, held in Santa Clara in 2022. The main idea behind the proposed analysis is to follow AI's consumption end-to-end, from hardware production to usage and deployment, in what the authors define as a “holistic approach”:

  • Hardware production, usage, maintenance and recycling: this portion is based on a thorough LCA for processors and other hardware facilities: the conclusion seems to point to a 30/70% split between hardware (or embodied) and computational (or operational) carbon footprint.
  • Researching, experimenting and training: although researching and experimenting can take a long time and relevant computational effort, these two phases are not nearly as heavy as training in terms of carbon footprint. A model like GPT-3, which we now consider surpassed, required >600,000 kg of CO2e: considering that the average world carbon footprint per person is about 4,000 kg/year, we can say that training GPT-3 had as much impact as 150 people in one year (see the quick arithmetic after this list). Moreover, there is not only “offline” training (done on historical data): there is also “online” training, which keeps models up to date with recently published content, and this portion is particularly relevant for recommendation models such as Meta's RM1-5.
  • Inference: inference may be the most relevant portion in terms of carbon costs: as Philip Lewer (Untether AI) says, “models are built expressly for the purpose of inference, and thus run substantially more frequently in inference mode than training mode — in essence train once, run everywhere” (from this article). According to researchers from MIT and Northeastern University, “different estimates from NVIDIA and Amazon suggest that inference tasks account for 80% or more of AI computational demand” (McDonald et al., 2022). For a model like Meta's RM1, inference alone nearly doubles the carbon cost already incurred by offline and online training.
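
The person-year comparison above boils down to a one-line division; here it is spelled out, using only the figures quoted in the text.

```python
# Quick arithmetic behind the "GPT-3 ~ 150 person-years" claim above.
# Both inputs are the figures quoted in the text.

gpt3_training_co2e_kg = 600_000      # >600,000 kg CO2e for training GPT-3
avg_person_co2e_kg_per_year = 4_000  # average world footprint per person

person_years = gpt3_training_co2e_kg / avg_person_co2e_kg_per_year
print(f"GPT-3 training ~ {person_years:.0f} person-years of emissions")  # -> 150
```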

2. Data craving: an energy problem

If all these aspects account for a relevant share of AI's carbon footprint, there is another giant elephant in the room that we have ignored up to this point: data. While not directly linked to AI's “hardware” lifecycle, data are a crucial ingredient for building models: data volumes in the LLM field went from an order of 10^11 tokens for GPT-3 (2020-21) to more than 10^13 tokens for Llama 3 (2024). Epoch AI's estimates tell us that we are going to run out of human-generated data to train AI between 2026 and 2032.

Where do we put all these data, and how do we maintain them? The answer is data centers, which consumed 460 TWh of electricity in 2022, accounting for 2% of the world's demand: according to the International Energy Agency, data centers could double their consumption by 2026, with AI and cryptocurrencies leading the increase.

But why do data centers require so much energy? It is not only to keep their supercomputers running 24/7: a large share goes into avoiding overheating, with cooling systems absorbing much of the energy (and this may not only be an electricity problem, but also a water one). As McDonald et al. underline in their paper, energy expenses are highly temperature-sensitive, which means that, with global warming, cooling may require even more effort.

3. Can we do something? An outlook

Researchers have been exploring numerous solutions to the problem of AI's carbon footprint: Google, for example, proposed the 4Ms in 2022 to reduce the carbon footprint of Machine Learning and Deep Learning (a combined-effect sketch follows the list):

  • Model: optimize model choice, preferring sparse over dense models, as they require less computational energy (3x to 10x reduction)
  • Machine: use specifically tailored hardware (like TPUv4) to reduce losses and increase efficiency (2x to 5x optimization).
  • Mechanization: computing in the cloud and using cloud data centers instead of on-premise ones can reduce energy consumption by 1.4x to 2x
  • Map optimization: choosing the right location for your cloud workloads can cut the carbon footprint by another 5x to 10x.
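
Taking the quoted ranges at face value and, purely as a simplification, assuming the four savings are independent, the combined effect can be multiplied out as follows; this is my own illustration, not a claim from Google's paper.

```python
# Combined effect of the 4Ms, taking the ranges quoted above at face value.
# Multiplying the factors assumes the savings are independent, which is an
# optimistic simplification rather than a claim from the original paper.

fourms = {
    "model (sparse vs dense)": (3, 10),
    "machine (tailored hardware)": (2, 5),
    "mechanization (cloud DCs)": (1.4, 2),
    "map (location choice)": (5, 10),
}

low = high = 1.0
for lo_factor, hi_factor in fourms.values():
    low *= lo_factor
    high *= hi_factor

print(f"Combined reduction: ~{low:.0f}x to ~{high:.0f}x")  # -> ~42x to ~1000x
```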

The MLSys 2022 paper also highlighted a combination of techniques used to reach an overall 810x reduction in energy consumption relative to Meta's CPU carbon-cost baseline:

  • Platform-level caching: frequently accessed data and embeddings are precomputed and cached in DRAM, which makes them much cheaper to access.
  • GPU usage: employing GPU acceleration can decrease energy costs up to 10x
  • Low-precision data formats: using FP16 instead of FP32 on GPUs proved more efficient
  • Algorithm optimization: choosing the right training and inference algorithms can decrease energy costs up to 5x

Still, questions remain: will all these measures really help us decrease AI's impact on the environment? Will AI itself prove more beneficial to the climate than detrimental?

Beyond these questions and all the others that may be asked, what stands out clearly from these observations is that, along with questioning, we need to start taking action: requesting transparency and green policies from AI companies and building climate awareness around our own AI use. And then, at the right time, all the answers we need will come.

References

  • Wu, C. J., Raghavendra, R., Gupta, U., Acun, B., Ardalani, N., Maeng, K., … & Hazelwood, K. (2022). Sustainable ai: Environmental implications, challenges and opportunities. Proceedings of Machine Learning and Systems, 4, 795-813.
  • McDonald, J., Li, B., Frey, N., Tiwari, D., Gadepally, V., & Samsi, S. (2022). Great power, great responsibility: Recommendations for reducing energy for training language models. arXiv preprint arXiv:2205.09646.
  • Cho R. (2023) AI's Growing Carbon Footprint, https://news.climate.columbia.edu/2023/06/09/ais-growing-carbon-footprint/
  • De Bolle M. (2024) AI’s carbon footprint appears likely to be alarming, https://www.piie.com/blogs/realtime-economics/2024/ais-carbon-footprint-appears-likely-be-alarming
  • Bailey B. (2022) AI Power Consumption Exploding, https://semiengineering.com/ai-power-consumption-exploding/
  • Heikkilä M. (2023) AI’s carbon footprint is bigger than you think https://www.technologyreview.com/2023/12/05/1084417/ais-carbon-footprint-is-bigger-than-you-think/
  • Patterson D. (2022) Good News About the Carbon Footprint of Machine Learning Training, https://research.google/blog/good-news-about-the-carbon-footprint-of-machine-learning-training/
  • Buckley S. (2024) IEA Study Sees AI, Cryptocurrency Doubling Data Center Energy Consumption by 2026, https://www.datacenterfrontier.com/energy/article/33038469/iea-study-sees-ai-cryptocurrency-doubling-data-center-energy-consumption-by-2026

Repetita iuvant: how to improve AI code generation

2024-07-07

Introduction: Codium-AI experiment


This image, taken from Codium-AI’s January paper (Ridnik et al., 2024) in which they introduced AlphaCodium, displays what most likely is the next-future of AI-centered code generation.

Understanding this kind of workflow is critical not only for developers, but also for non-technical people who occasionally need to do some coding: let's break it down, as usual in a plain and simple way, so that (almost) everyone can understand!

0. The starting point

0a. The dataset

AlphaCodium (that's the name of the workflow in the image) was conceived as a way to tackle the complex programming problems contained in CodeContest, a competitive coding dataset encompassing a large number of problems that pose all sorts of reasoning challenges for LLMs.

The two great advantages of using the CodeContest dataset are:

  1. The presence of public tests (sets of input values and results that developers can also access during the competition to see how their code performs) and numerous private tests (accessible only to the evaluators). This is really important because private tests avoid “overfitting” issues: they prevent LLMs from producing code perfectly tailored to pass the public tests while not actually working in a generalized way. In short, private tests avoid false positives
  2. CodeContest problems are not just difficult to solve: they contain small details and subtleties that LLMs, caught up in their drive to generalize the question they are presented with, do not usually notice.

0b. Competitor models

Other models and flows have addressed the challenge of supporting complex reasoning in code generation; the two explicitly mentioned in Codium-AI's paper are:

  • AlphaCode by Google Deepmind was finetuned specifically on CodeContest: it produces millions of solutions, of which progressively smaller portions get selected based on how well they fit the problem representation. In the end, only 1-10 solutions are retained. Even though it had impressive results at the time, the computational burden makes this an unsuitable solution for everyday users.
  • CodeChain by Le et al. (2023) had the aim to enhance modular code generation capacity, to make the outputs more similar to the ones skilled developers would produce. This is achieved through a chain of self-revisions, guided by previously produced snippets.

Spoiler: neither of them proves as good as AlphaCodium on the reported benchmarks in the paper.

1. The flow

1a. Natural language reasoning

As you can see in the image at the beginning of this article, AlphaCodium's workflow is divided into two portions. The first encompasses thought processes that mostly involve natural language, hence we could call it the Natural Language Reasoning (NLR) phase (a minimal sketch of this phase follows the list below).

  1. We start with a prompt that contains both the problem and the public tests
  2. We proceed to ask the LLM to “reason out loud” on the problem
  3. The same reasoning procedure goes for the public tests
  4. After having produced some thoughts on the problem, the model outputs a first batch of potential solutions
  5. The LLM is then asked to rank these solutions according to their suitability for problem and public tests
  6. To further test the model's understanding of the starting problem, we ask it to produce additional tests, which we will use to evaluate the code solutions' performance.
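
To make the sequence above concrete, here is a highly simplified sketch of the NLR phase. The helper ask_llm, the prompt wording and the number of candidates are hypothetical placeholders of mine, not the actual AlphaCodium implementation or API.

```python
# Highly simplified sketch of AlphaCodium's NLR phase. The helper `ask_llm`
# and the prompt wording are hypothetical placeholders, not the real API.

def ask_llm(prompt: str) -> str:
    """Placeholder for a call to whatever LLM backs the flow."""
    raise NotImplementedError

def nlr_phase(problem: str, public_tests: list[str], n_candidates: int = 5):
    # Steps 1-3: reason "out loud" about the problem and the public tests
    problem_reflection = ask_llm(f"Reason step by step about this problem:\n{problem}")
    tests_reflection = ask_llm(f"Reason about these public tests:\n{public_tests}")
    # Step 4: generate a first batch of candidate solutions
    candidates = [
        ask_llm(f"Propose a solution given:\n{problem_reflection}\n{tests_reflection}")
        for _ in range(n_candidates)
    ]
    # Step 5: rank candidates by fit to problem and public tests
    ranked = ask_llm(f"Rank these solutions by fit to problem and tests:\n{candidates}")
    # Step 6: generate additional AI tests for the next phase
    ai_tests = ask_llm(f"Generate additional input/output tests for:\n{problem}")
    return ranked, ai_tests
```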

1b. Coding test iterations

The second portion includes actual code execution and evaluation with public and AI-generated tests (a simplified iteration loop is sketched after this list):

  1. We make sure that the initial code solution works without bugs: if not, we regenerate it until we either reach a maximum iteration limit or produce an apparently zero-bug solution
  2. The model's code is then run against the public tests: over several iteration rounds we search for the solution that maximizes passes over failures; this solution is then passed on to the AI-generated tests
  3. The last step is to test the code against AI-generated input/outputs: the solution that best fits them is returned as the final one, and will be evaluated with private tests.
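
Here is a minimal sketch of that test-driven iteration loop. The helpers run_tests and repair_with_llm are hypothetical stand-ins for the flow's real components, and the loop structure is a simplification of what the paper describes.

```python
# Sketch of the code-iteration phase: fix bugs, then iterate on public tests,
# then on AI-generated tests. `run_tests` and `repair_with_llm` are hypothetical
# helpers standing in for the real flow's components.

def run_tests(code: str, tests: list) -> int:
    """Placeholder: execute `code` against `tests`, return number passed."""
    raise NotImplementedError

def repair_with_llm(code: str, tests: list) -> str:
    """Placeholder: ask the LLM to fix `code` for the failing `tests`."""
    raise NotImplementedError

def iterate_on_tests(code: str, tests: list, max_iters: int = 5) -> str:
    best_code, best_passed = code, run_tests(code, tests)
    for _ in range(max_iters):
        if best_passed == len(tests):
            break                                   # everything passes, stop early
        code = repair_with_llm(best_code, tests)    # ask the LLM to fix failures
        passed = run_tests(code, tests)
        if passed > best_passed:                    # keep only improving solutions
            best_code, best_passed = code, passed
    return best_code

def coding_phase(initial_code: str, public_tests: list, ai_tests: list) -> str:
    code = iterate_on_tests(initial_code, public_tests)   # public-test rounds
    return iterate_on_tests(code, ai_tests)               # AI-generated-test rounds
```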

This second portion may leave us with some questions, such as: what if the model did not understand the problem and produced wrong tests? How do we prevent the degeneration of code if there are corrupted AI-generated tests?

These questions will be addressed in the next section.

2. Performance-enhancing solutions

2a. Generation-oriented workarounds

The first target that Codium-AI scientists worked on was the generation of natural language reasoning and the production of coding solutions:

  • They made the model reason concisely but effectively, explicitly asking it to structure its thoughts in bullet points: this strategy proved to improve output quality when the LLM was asked to reason about issues
  • The model was asked to generate outputs in YAML format, which is easier to generate and parse than JSON, removing much of the prompt-engineering hassle and making it easier to handle advanced problems
  • Direct questions and one-block solutions are postponed in favor of reasoning and exploration: putting “pressure” on the model to find the best solution straight away often leads to hallucinations and makes the LLM go down a rabbit hole it cannot come back from.

2b. Code-oriented workarounds

The questions at the end of section 1 represent important issues for AlphaCodium, which can significantly deteriorate its performance - but the authors of the paper found solutions to them:

  • Soft decisions and self-validation to tackle wrong AI-generated tests: instead of asking the model to evaluate its tests with a trenchant “Yes”/“No” answer, we make it reason about the correctness of its tests, code and outputs together. This leads to “soft decisions”, which let the model adjust its tests.
  • Anchor tests to avoid code degeneration: imagine that the AI tests are wrong even after revision; the code solution might be right and still fail the LLM-generated tests. The model would then keep modifying its code, inevitably making it unfit for the real solution. To avoid this deterioration, AlphaCodium identifies “anchor tests”, i.e. public tests that the code already passed and that it must keep passing after the AI-test iterations for the solution to be retained.

3. Results

When LLMs were directly asked to generate code from the problem (the direct prompt approach), AlphaCodium-enhanced open-source (DeepSeek-33B) and closed-source (GPT-3.5 and GPT-4) models outperformed their base counterparts, with a 2.3x improvement in GPT-4 performance (from 19% to 44%) as a highlight.

The comparison with AlphaCode and CodeChain was instead made with the pass@k metric (the percentage of problems solved when k generated solutions are allowed per problem): AlphaCodium's pass@5 with both GPT-3.5 and GPT-4 was higher than AlphaCode's pass@1k@10 (1,000 starting solutions and 10 selected final ones) and pass@10k@10, especially in the validation phase. CodeChain's pass@5 with GPT-3.5 was also lower than AlphaCodium's results.
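
For reference, this is the commonly used unbiased pass@k estimator introduced with Codex (Chen et al., 2021); it is shown here only to clarify what the metric measures, and I am not claiming it is the exact evaluation script used in the AlphaCodium paper.

```python
# The standard unbiased pass@k estimator (Chen et al., 2021, "Evaluating Large
# Language Models Trained on Code"), shown only to clarify the metric; it is
# not claimed to be the exact evaluation script used in the paper.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = generated samples, c = correct samples, k = budget of attempts."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(n=20, c=3, k=5))  # probability that at least one of 5 picks is correct
```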

In general, this self-corrective and self-reasoning approach seems to yield better performances than the models by themselves or other complex workflows.

Conclusion: what are we gonna do with all this future?

AlphaCodium's workflow represents a reliable and solid way to enhance model performance in code generation, exploiting a powerful combination of NLR and corrective iterations.

This flow is simple to understand, involves four orders of magnitude fewer LLM calls than AlphaCode and can provide a fast and trustworthy solution even to non-professional coders.

The question that remains is: what are we gonna do with all this future? Are we going to invest in more and more data and training to build better coding models? Will we rely on fine-tuning or on the monosemanticity properties of LLMs to enhance their performance on certain downstream tasks? Or are we going to develop better and better workflows to improve base, non-finetuned models?

There’s no simple answer: we’ll see what the future will bring to us (or, maybe, what we will bring to the future).

References

  • Ridnik T., Kredo D., Friedman I. (Codium AI). Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering. arXiv (2024). https://doi.org/10.48550/arXiv.2401.08500
  • GitHub repository

BrAIn: next generation neurons?

2024-06-04

A game-changing publication?


11th December 2023 may seem like a normal day to most people, and if you are one of them, prepare to be surprised: a breakthrough publication was issued that day in Nature Electronics, laying the foundations for a field that will be crucial in the near future. The title? “Brain organoid reservoir computing for artificial intelligence”.

I bet that everyone can somewhat grasp the idea behind the title, but it is worth introducing some key concepts for those who may be unfamiliar with the biological notions behind the brain.

Biological concepts

  • Neuron: Neurons are the building blocks of the brain. They are small cells with a round central body called the soma, where most of the biological activity is carried out, branch-like input structures (the dendrites), which receive signals from neighboring neurons, and a wire-like output axon, along which the bioelectric signal is conducted.
  • Synapse: As the Greek etymology suggests, synapses are “points of contact”: the end of the axon is enlarged into a “synaptic button”, from which neurotransmitters are released following a bioelectrical signal. On the other side there is a dendrite which, despite not touching the synaptic button, is really close to it, separated only by what is called the synaptic cleft: it receives the neurotransmitter, evoking a bioelectrical response
  • Action potential: Neurons transmit their bioelectric signals through an all-or-nothing event known as the action potential. The action potential originates where the axon emerges from the soma, a point known as the axon hillock, through a series of ionic exchanges across the neural membrane. When the axon hillock reaches a voltage threshold, the neuron fires, and the shape of the electrical signal is always the same: what distinguishes sensations, emotions and memories is the frequency of firing. The action potential is transmitted along the axon, and the regions it has already passed become refractory to further stimulation for a short time, ensuring one-way transmission (from the axon hillock toward the axon terminals).
  • Synaptic plasticity: this phenomenon encompasses modifications of the quantity of released neurotransmitters, the number of receptors and so on, which are applied to a synapse to reinforce or weaken it, based on how much and how well it is used.
  • Neuroplasticity: Neuroplasticity is the phenomenon through which brain neurons rearrange to optimize the way they respond to external stimuli.
  • Organoid: An organoid is an arrangement of living cells into complex structures which mimic the functioning of an organ. They are used for simulations and experiments.

Explaining the breakthrough

Now that we have all the concepts we need, let’s dive into understanding what happened in the paper we mentioned in the first paragraph, and in general what is going on in the field.

1. The core: organoid intelligence

Organoid Intelligence (OI) is a dynamic and growing field in bio-computing, whose core idea is to harness the power of human neurons, arranged into brain organoids, to speed up computing, ease training and provide a cheap and reliable alternative to artificial neural networks for running AI algorithms and performing tasks. The ultimate aim of the field is to build a “wet-ware” (as opposed to the already existing hardware), a concept the paper describes as brainoware, with which we will be able to implement brain-machine interfaces and dramatically increase our computing power.

2. The findings: speech recognition and comparison with ANNs

In “Brain organoid reservoir computing for artificial intelligence”, the team behind the paper built a small brain organoid, loaded onto a multielectrode array (MEA) chip.

The organoid was trained to recognize the speech of 8 people from 240 recordings, showing different neural activation patterns when different people were speaking and achieving 78% accuracy in recognizing them. This may sound rather unsurprising, until you consider the size of the training set: 240 recordings are a very small dataset, considering that AI algorithms would need thousands of examples to achieve similar accuracy scores.

After that, some other tests were performed, and one was particularly important because it compared the ONN (organoid neural network), ANNs with a Long Short-Term Memory (LSTM) unit and ANNs without one. Brain cells were trained with impulses for four days (four ‘epochs’) to solve a 200-data-point map. The ONN outperformed ANNs without LSTMs, while ANNs with LSTM proved only slightly more accurate than the organoid, and only after 50 epochs of training, which means that ONNs yield similar results to their artificial counterparts with >90% less training.

3. Advantages and limitations

There are big advantages linked to OI:

  • Train for less time and with less data, obtain high accuracy
  • We can use it to explain how the brain works and to take a look into neurodegenerative diseases like Alzheimer's disease
  • High volumes of managed and output data

Despite the promising perspectives, there are still some obstacles we need to overcome:

  • Current organoids are not very long-lived; we need something more durable and reliable
  • We need to adapt machine-brain interfaces to smoother, more biology-friendly structures, in order to seamlessly connect and tune the brain's inputs/outputs with external machines.
  • We have to scale up our algorithms and models to handle the huge volume of data that organoids will be able to manage

Conclusion

Organoid Intelligence is undoubtedly at the forefront of biocomputing, and it could revolutionize the way we understand and (probably) even think about our brain, unlocking novel and unexpected discoveries about how we learn and shape our memory. On the other hand, it could provide powerful hardware, packing huge conceptual, computational and representational power into small brain-like engines and reducing learning times and expenses for our new AI models. All of this, obviously, is subject to the condition that we invest resources and time in building new organoids, algorithms and data facilities: the future of brAIn is close, we just need to put in some effort to reach it.

References

  • Cai, H., Ao, Z., Tian, C. et al. Brain organoid reservoir computing for artificial intelligence. Nat Electron 6, 1032–1039 (2023). https://doi.org/10.1038/s41928-023-01069-w
  • Tozer L, ‘Biocomputer’ combines lab-grown brain tissue with electronic hardware, https://www.nature.com/articles/d41586-023-03975-7
  • Smirnova L, Caffo BS, Gracias DH, Huang Q, Morales Pantoja IE, Tang B, Zack DJ, Berlinicke CA, Boyd JL, Harris TD, Johnson EC, Kagan BJ, Kahn J, Muotri AR, Paulhamus BL, Schwamborn JC, Plotkin J, Szalay AS, Vogelstein JT, Worley PF and Hartung T. Organoid intelligence (OI): the new frontier in biocomputing and intelligence-in-a-dish. Front Sci (2023) 1:1017235. doi: 10.3389/fsci.2023.1017235

What is going on with AlphaFold3?

2024-05-18

A revolution in the field of Protein Science?


On 8th May 2024, Google Deepmind and Isomorphic Labs introduced the world to their new tool for protein structure prediction, AlphaFold3, a more powerful version of the already existing AlphaFold2, with which Google Deepmind had already reconstructed more than 200 million protein structures (almost every known protein) and cracked the a priori protein structure prediction challenge that bioinformaticians had been chasing for decades (I talked about it in more detail here).

Are we on the verge of another revolution? Is AlphaFold3 really a game changer as its predecessor was? In this blog post, we’ll explore the potential breakthroughs and new applications, as well as some limitations that the authors themselves recognized.

What’s new?

If you read the abstract of the paper accepted by Nature and published, open-access, on their website, you will see some interesting news:

The introduction of AlphaFold 2 has spurred a revolution in modelling the structure of proteins and their interactions, enabling a huge range of applications in protein modelling and design. In this paper, we describe our AlphaFold 3 model with a substantially updated diffusion-based architecture, which is capable of joint structure prediction of complexes including proteins, nucleic acids, small molecules, ions, and modified residues. The new AlphaFold model demonstrates significantly improved accuracy over many previous specialised tools: far greater accuracy on protein-ligand interactions than state of the art docking tools, much higher accuracy on protein-nucleic acid interactions than nucleic-acid-specific predictors, and significantly higher antibody-antigen prediction accuracy than AlphaFold-Multimer v2.3. Together these results show that high accuracy modelling across biomolecular space is possible within a single unified deep learning framework.

Let’s break this down, so that Biologists can understand AI concepts and AI Scientists can understand Biology ones:

0. Let’s introduce some terminology

0a. For the Biologists

  • Machine Learning: Machine Learning is the process by which computers learn to abstract from the data they are given, based not on human-written instructions but on advanced statistical and mathematical models
  • Deep Learning: Deep Learning is a Machine Learning framework that is primarily built on Neural Networks and uses a brain-like architecture to learn.
  • Neural Network: A Neural Network is somewhat like a network of neurons in the brain, even though much simpler: there are several checkpoints (the neurons), connected with one another, that receive and pass on information if they reach an activation threshold, just as happens with the action potential of a real neural cell (a toy example follows right after this list).
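
For the biologists, here is a toy version of that “fire above a threshold” idea: a single artificial neuron with a step activation. The weights, inputs and threshold are arbitrary example values of mine, not taken from any real model.

```python
# Toy illustration of the "checkpoint that fires above a threshold" idea:
# a single artificial neuron with a step activation. Weights, inputs and the
# threshold are arbitrary example values, not taken from any real model.

def neuron(inputs, weights, threshold=1.0):
    activation = sum(i * w for i, w in zip(inputs, weights))
    return 1 if activation >= threshold else 0   # fire / don't fire

print(neuron([0.5, 0.8], [1.0, 0.9]))  # 0.5 + 0.72 = 1.22 >= 1.0 -> fires (1)
```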

0b. For the AI Scientists

  • Protein: Proteins are biomolecules of varying size, made up of small building blocks known as amino acids. They are the factotum of a cell: if you imagine a cell as a city, proteins represent the transportation system, the communication web, the police, the factory workers… A protein has a primary (flat chain), secondary (local 3D motifs) and tertiary (3D and ordered) structure.
  • Ligand: A ligand is something that binds something else: in the context of proteins, it can be a neuro-hormonal signal (like adrenaline) that binds its receptor.
  • Nucleic Acids: Nucleic acids (DNA and RNA) are the biomolecules that contain the information about the living system: they are written in a universal language, defined by their building blocks (the nucleotides), and they can be translated into proteins. Thinking of the city example we made before, they could be represented as its Administration Service. Nucleic acids often interact with proteins.

1. The diffusion architecture

By diffusion we mean the application of generative AI that can create images starting from noise, typically guided by a text prompt. The idea behind diffusion is well suited to the problem of protein structure prediction, as it too starts from a sequence: even though the 3D structure of a protein may seem completely unrelated to its 1D amino acid chain, the link is actually much stronger than one might think. At the end of the day, all the 3D interactions among amino acids are already defined by their order in the primary chain.

The diffusion architecture in AlphaFold3 works on raw atom coordinates: after the first prediction steps coming from a set of neural network blocks (similar, but not identical, to those of AlphaFold2), the model turns a “fuzzy” picture, full of positional and stereochemical noise, into a well-defined, sharp structure. The big advantage of the diffusion model is that it can predict the local structure even when the upstream network is not sure about the correct amino acid coordinates: the generative process produces a distribution of answers that captures most of the possible variability in the protein structure.
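
To convey the “fuzzy cloud to sharp structure” intuition, here is a drastically simplified denoising loop over 3D atom coordinates. This is not AlphaFold3's actual diffusion module: the predict_denoised function is a hypothetical stand-in for the trained network, and the update rule is a generic sketch.

```python
# Drastically simplified denoising loop over 3D atom coordinates, only meant to
# convey the "fuzzy cloud -> sharp structure" idea; this is NOT AlphaFold3's
# actual diffusion module, and `predict_denoised` is a hypothetical stand-in
# for the trained network.
import numpy as np

def predict_denoised(noisy_coords: np.ndarray, noise_level: float) -> np.ndarray:
    """Placeholder for the network that estimates clean coordinates."""
    raise NotImplementedError

def sample_structure(n_atoms: int, steps: int = 50) -> np.ndarray:
    coords = np.random.randn(n_atoms, 3) * 10.0       # start from pure noise
    for t in range(steps, 0, -1):
        noise_level = t / steps
        estimate = predict_denoised(coords, noise_level)
        # move part of the way toward the current estimate of the clean structure
        coords = coords + (estimate - coords) * (1.0 / t)
    return coords
```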

Like every generative model, AlphaFold3's diffusion module is prone to hallucination: this is particularly true for unstructured regions of a protein (those lacking a defined, stable tertiary structure). The AlphaFold3 diffusion blocks are trained so that, in those regions, they produce randomly coiled chains of amino acids, as done by AlphaFold-Multimer v2.3 (which generated the images used for the hallucination-correction training).

2. New tasks and better accuracy

As reported in the abstract, AlphaFold3 now outperforms task-specific software for:

  • Protein-Nucleic Acid interaction
  • Protein-Ligand interaction
  • Antigen-Antibody interaction

Why are these three tasks so important to us?

  • Proteins commonly interact with DNA and RNA: as reported by Cozzolino et al. (2021), these interactions “affect fundamental processes such as replication, transcription, and repair in the case of DNA, as well as transport, translation, splicing, and silencing in the case of RNA”. All of these are key cellular functions that, if disrupted, can cause serious diseases. Moreover, understanding how proteins bind DNA and RNA can be really useful in genome editing (CRISPR-Cas9 is actually an RNA-protein-DNA system) and in the fight against bacteria and antimicrobial resistance (much of which depends on protein-DNA interactions that activate specific genes making the bacterium resistant to the antibiotic).
  • Protein-ligand interaction is key in drug design: up to now we have used “docking”, i.e. simulating the interactions between certain molecule types and proteins by re-iterating those interactions with slightly different chemical structures and positions. Needless to say, this is time-consuming and computationally intensive, and AlphaFold3 can definitely improve on these aspects while also retaining higher accuracy.
  • Antigen-antibody interaction is the process by which proteins produced by our immune system (antibodies) bind foreign or mutated, potentially harmful molecules: it is one of the ways pathogens are found and eliminated. Predicting these interactions is key to understanding the immune system's responses to pathogens, as well as to anything we want to introduce into our body in order to cure it. It also plays an incredibly important role in tumor cell recognition: tumor cells may carry slight modifications of their cell-specific antigens that are not recognized as a threat by our immune system, but can be identified (and thus potentially targeted) by computational means.

What are the limitations?

As the authors of the paper report, they are aware of several limitations:

  1. Difficulties in predicting chirality: chirality is an intrinsic property of a molecule related to its handedness, commonly measured by how the molecule rotates polarized light. Two molecules that differ in nothing but chirality are like your hands: they look alike, but you can't superimpose them palm to back. Even though a chirality penalty has been introduced, the model still produces about 4% chirality-violating structures.
  2. Clashing atoms: there is a tendency, especially when nucleic acids of more than 100 nucleotides interact with proteins of more than 2,000 amino acids, to place atoms in the same region of space (which is physically impossible).
  3. Hallucinations, as discussed before, can still happen, so an intrinsic ranking system has been introduced to help the model discard hallucinated structures.
  4. There are still tasks, such as antigen-antibody prediction, where AlphaFold3 can improve. The authors observed improvements when the diffusion model is given more seeds (up to 1,000), i.e. the series of numbers that “instruct” the model on how to generate a structure, whereas using more diffusion samples per seed brought no substantial advancement.
  5. As for all protein-prediction models, proteins are predicted in their “static” form, and not “in action”, when they are dynamically inserted into a living cell.

Conclusion and open questions

AlphaFold3 definitely represents a breakthrough in Protein Sciences: still, we are not at an arrival point.

This model marks the kick-off of the new generative AI approach to complex biological problems, which we also saw with OpenCRISPR: on one hand this holds incredible potential but, on the other, the risk is that we will decrease the explainability of our models, leaving scientists with auto-generated accuracy metrics that are not necessarily able to tell them why a protein has a certain structure.

Another really important point is that AlphaFold3 is not completely open-source: there is an online server provided by Google, but the code, as stated in the paper, is not released (except for some mock code that simulates the architecture). This poses a big ethical question: are we sure we want a world where access to advanced scientific tools is protected by strict licenses, and not everyone can see what is actually going on in the software by accessing its code?

And, more importantly now than ever, we must ask ourselves: are we really going to rely on non fully open-source AI to design our drugs, deliver targeted genome editing and cure diseases?

References

  • Abramson, J., Adler, J., Dunger, J. et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature (2024). https://doi.org/10.1038/s41586-024-07487-w
  • Cozzolino F, Iacobucci I, Monaco V, Monti M. Protein-DNA/RNA Interactions: An Overview of Investigation Methods in the -Omics Era. J Proteome Res. 2021;20(6):3018-3030. doi:10.1021/acs.jproteome.1c00074