Scientists have successfully integrated the latest AI tools (e.g. GPT-4) with laboratory automation to develop ‘systems’ capable of autonomously designing, planning and performing complex chemical experiments. ‘Coscientist’ and ‘ChemCrow’ are two such recently developed AI-based systems that display emergent capabilities. Driven by GPT-4 (OpenAI’s most advanced generative AI model at the time), Coscientist demonstrated advanced reasoning and experimental design capabilities. ChemCrow automated a diverse set of chemical tasks, from planning and executing syntheses to guiding the discovery of a novel compound. ‘Coscientist’ and ‘ChemCrow’ offer a new way of conducting research in partnership with machines and could come in handy for executing experimental tasks in automated robotic laboratories.
Generative AI is about the creation, or generation, of new content by a computer programme. Google Translate, launched in 2006, is an early example of generative artificial intelligence (AI): it generates translations (output) from text in a given language (input). OpenAI’s ChatGPT, Microsoft’s Copilot, Google Bard, Meta’s (formerly Facebook’s) Llama and Elon Musk’s Grok are some of the important generative AI tools currently available.
ChatGPT, launched on 30 November 2022, has become very popular; it is reported to have gained 1 million users within 5 days and 100 million monthly users within two months. ChatGPT is based on a large language model (LLM). The key principle is language modelling, i.e. pre-training a model on large amounts of text so that, when prompted, it predicts what comes next in a sentence. A language model (LM) thus makes a probabilistic prediction of the next word in a natural language given the preceding word(s). When an LM is based on a neural network, it is called a ‘neural network language model’; such networks process data in a way loosely inspired by the human brain. A large language model (LLM) is a large-scale language model that can perform a variety of natural language processing tasks for general-purpose language understanding and generation. The transformer is the neural network architecture used to build ChatGPT: the name ‘GPT’ is an acronym for ‘Generative Pre-trained Transformer’, and OpenAI’s large language models are transformer-based.
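To make next-word prediction concrete, here is a minimal sketch in Python of a count-based bigram language model, which estimates the probability of the next word from the single word before it. It is a deliberately simplified illustration, not how ChatGPT works: GPT models use transformer neural networks trained on vast amounts of text and condition on long contexts, whereas this toy counts word pairs in a made-up three-sentence corpus. The function names and the corpus are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # "<s>" marks the start of a sentence so the first word also has a predecessor.
        words = ["<s>"] + sentence.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Estimate P(next word | previous word) from relative frequencies."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    if total == 0:
        return {}  # the word was never seen, so the toy model cannot predict
    return {word: n / total for word, n in following.items()}


def predict_next(counts, prev):
    """Return the most probable next word, or None if nothing is known."""
    probs = next_word_probabilities(counts, prev)
    return max(probs, key=probs.get) if probs else None


corpus = [
    "the robot mixes the reagents",
    "the robot heats the flask",
    "the robot cools the flask",
]
model = train_bigram_model(corpus)
print(next_word_probabilities(model, "the"))
print(predict_next(model, "the"))  # → robot
```

The output is the kind of object any language model produces, a probability distribution over what comes next (here ‘robot’ with probability 0.5, ‘flask’ 1/3 and ‘reagents’ 1/6). A neural network language model replaces the raw counts with learned parameters, which lets it generalise to word sequences it has never seen.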
GPT-4, the fourth model in OpenAI’s GPT series, was released on 14 March 2023. Unlike earlier versions, which accept only text input, GPT-4 accepts both image and text input, making it a large multimodal model. ChatGPT is the chat application built on these models; it was initially based on GPT-3.5, so ‘GPT-4’ names the underlying model rather than a new version of ChatGPT. GPT-4 Turbo, launched on 6 November 2023, is an improved and more powerful version of GPT-4.
Coscientist is made up of five interacting modules: planner, web searcher, code execution, documentation and automation. The modules exchange messages with one another to search the web and documentation, execute code and perform experiments. The interaction takes place through four commands: ‘GOOGLE’, ‘PYTHON’, ‘DOCUMENTATION’ and ‘EXPERIMENT’.
The planner module is the main module. Driven by GPT-4, it is responsible for planning. Based on a simple plain-text prompt from the user, the planner issues the necessary commands to the other modules to gather knowledge. The web searcher module, which is also an LLM, is invoked by the GOOGLE command to search the internet and perform related sub-actions that support effective planning. The code execution module runs code in response to the PYTHON command; this module does not use an LLM. The documentation module responds to the DOCUMENTATION command by retrieving and summarising the necessary documentation. Using this information, the planner issues the EXPERIMENT command to the automation module, which performs the experiments.
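The message passing between Coscientist’s modules can be pictured as a planner routing each command to the module that handles it. The Python sketch below is hypothetical and is not Coscientist’s code: only the four command names come from the description above, the class and function names are illustrative, and the modules are stubs standing in for web search, code execution, documentation retrieval and laboratory hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Command:
    """A message from the planner: a command name plus its text argument."""
    name: str      # "GOOGLE", "PYTHON", "DOCUMENTATION" or "EXPERIMENT"
    argument: str  # a search query, Python source, a topic, or a protocol


# Hypothetical stub modules. In Coscientist these roles are played by an
# LLM-based web searcher, a code runner, a documentation retriever and
# laboratory automation; here each one just returns a string.
def web_searcher(query: str) -> str:
    return f"search results for: {query}"


def code_execution(source: str) -> str:
    # Runs the code in a fresh namespace and reports the variable `result`.
    # No LLM is involved. This is NOT a secure sandbox; illustration only.
    namespace: dict = {"__builtins__": {}}
    exec(source, namespace)
    return str(namespace.get("result"))


def documentation(topic: str) -> str:
    return f"summary of documentation on: {topic}"


def automation(protocol: str) -> str:
    return f"experiment performed: {protocol}"


# Each of the four commands is routed to the module that handles it.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "GOOGLE": web_searcher,
    "PYTHON": code_execution,
    "DOCUMENTATION": documentation,
    "EXPERIMENT": automation,
}


def dispatch(command: Command) -> str:
    """Send one planner command to its module and return the module's reply."""
    handler = HANDLERS.get(command.name)
    if handler is None:
        raise ValueError(f"unknown command: {command.name}")
    return handler(command.argument)


def run_plan(plan: List[Command]) -> List[str]:
    """Dispatch a fixed list of commands in order and collect the replies.

    A real planner would read each reply before choosing its next command;
    the plan is fixed here to keep the sketch short.
    """
    return [dispatch(command) for command in plan]


plan = [
    Command("GOOGLE", "synthesis route for aspirin"),
    Command("DOCUMENTATION", "liquid-handling robot API"),
    Command("PYTHON", "result = 2 + 3"),
    Command("EXPERIMENT", "run the planned synthesis"),
]
for reply in run_plan(plan):
    print(reply)
```

Keeping the routing in a single table means that adding a new tool amounts to registering another handler, which is one plausible reason a modular design of this kind suits LLM agents. In the real system the planner reads each module’s reply before choosing its next command, rather than following a fixed list.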
Given appropriate prompts, Coscientist correctly planned syntheses of the painkillers paracetamol and aspirin, the organic molecules nitroaniline and phenolphthalein, and other known molecules. The planner module was also able to optimise reaction conditions to improve reaction yields.
In another study, the LLM chemistry agent ChemCrow autonomously planned and executed the syntheses of an insect repellent and three organocatalysts, and guided the discovery of a novel chromophore. ChemCrow proved effective at automating diverse chemical tasks.
The two artificial intelligence systems, Coscientist and ChemCrow, display emergent capabilities for autonomously planning and executing chemical tasks, both for synthesising known molecules and for discovering novel ones. Their advanced reasoning, problem-solving and experimental design capabilities could come in handy in chemical research.
Such AI agent systems could be used by non-experts to carry out routine chemistry tasks, reducing cost and effort. They also have the potential to speed up the discovery of new molecules.
***
References:
- Boiko, D.A., et al. 2023. Autonomous chemical research with large language models. Nature 624, 570–578. Published: 20 December 2023. DOI: https://doi.org/10.1038/s41586-023-06792-0
- Carnegie Mellon University 2023. News: CMU-Designed Artificially Intelligent Coscientist Automates Scientific Discovery. Posted: 20 December 2023. Available at: https://www.cmu.edu/news/stories/archives/2023/december/cmu-designed-artificially-intelligent-coscientist-automates-scientific-discovery
- Bran, A.M., et al. 2023. ChemCrow: Augmenting large-language models with chemistry tools. arXiv:2304.05376v5. DOI: https://doi.org/10.48550/arXiv.2304.05376
***