REINVENT is a molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design, molecule optimization, and other small molecule design tasks. REINVENT uses a Reinforcement Learning (RL) algorithm to generate optimized molecules that comply with a user-defined property profile expressed as a multi-component score. Transfer Learning (TL) can be used to create or pre-train a model that generates molecules closer to a set of input molecules.
A paper describing the software has been published as Open Access in the Journal of Cheminformatics: Reinvent 4: Modern AI–driven generative molecule design. See AUTHORS.md for references to previous papers.
REINVENT is being developed on Linux and supports both GPU and CPU. The Linux version is fully validated. REINVENT on Windows and macOS also supports GPU and CPU, but Windows is less well tested and support is therefore limited.
The code is written in Python 3 (>= 3.11). The list of dependencies can be found in the repository (see also Installation below).
A GPU is not strictly necessary but is strongly recommended for performance reasons, especially for transfer learning and model training. For reinforcement learning (RL), a GPU is less important because most scoring components run on the CPU.
Note that if no GPU is installed in your computer, the code will run on the CPU automatically. REINVENT supports, as of this writing, NVIDIA GPUs, some AMD GPUs, Intel ARC, and newer Apple GPUs. For many design tasks, about 8 GiB of both CPU main memory and GPU memory is sufficient.
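To see which accelerator PyTorch can reach on a given machine, a check along the following lines may help. This is a minimal sketch and not REINVENT's own device selection logic: the function name `detect_device` is hypothetical, and it simply reports "cpu" when PyTorch is not installed.

```python
def detect_device() -> str:
    """Return the name of a torch device that is usable on this machine."""
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"

    # NVIDIA CUDA; ROCm builds of PyTorch report through the same interface
    if torch.cuda.is_available():
        return "cuda"
    # Intel GPUs; torch.xpu only exists in newer PyTorch builds
    xpu = getattr(torch, "xpu", None)
    if xpu is not None and xpu.is_available():
        return "xpu"
    # Apple GPUs via the MPS backend
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


print(detect_device())
```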
Using conda
- Clone this Git repository. Add `--depth 1` for only the newest version as the repository has grown quite large over time.

      git clone git@github.com:MolecularAI/REINVENT4.git # --depth 1

- Create a Python environment and install a compatible version of Python, for example with Conda or other virtual environments.

      conda create --name reinvent4 python=3.11
      conda activate reinvent4

- Change directory to the repository to install. You will need to set the right processor type, see PyTorch versions. Linux supports CUDA (e.g. "cu126"), AMD ROCm (e.g. "rocm6.4"), Intel XPU ("xpu") and CPU. Windows supports CUDA, XPU and CPU. Newer Apple chips e.g. M5 are supported by PyTorch's MPS backend (use "mac" as processor type). Optionally, you can select dependencies "openeye" (for ROCS; you need to obtain your own license), "chemprop1" for Chemprop v1, "isim" for similarity tracking in TensorBoard or "none" to skip all. The default is installation of "all" dependencies. See the help text from the install script for details.

      python install.py --help
      # install all packages including chemprop2 for CUDA 12.8
      python install.py cu128  # or rocm6.4, xpu, mac, cpu, etc.
      # if you still want Chemprop v1 (but check https://chemprop.readthedocs.io/en/main/convert_v1_to_v2.html)
      # python install.py -e cu128 -d all chemprop1  # install all packages with Chemprop v1

- Test the tool. The installer has added a script `reinvent` to your PATH.

      reinvent --help

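If `reinvent --help` fails, the usual cause is that the environment holding the script is not active. The following sketch checks whether an executable called `reinvent` is visible on PATH before calling it; the helper name `reinvent_on_path` is hypothetical and only the standard library is used.

```python
import shutil
import subprocess


def reinvent_on_path() -> bool:
    """Return True if an executable called 'reinvent' is found on PATH."""
    return shutil.which("reinvent") is not None


if reinvent_on_path():
    # Equivalent to running `reinvent --help` by hand
    result = subprocess.run(["reinvent", "--help"], capture_output=True, text=True)
    print("exit code:", result.returncode)
else:
    print("reinvent not found on PATH; is the right environment activated?")
```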
Using uv (experimental)
uv is a fast Python package manager that handles virtual environments and dependencies in one step.
- Clone this Git repository. Add `--depth 1` for only the newest version as the repository has grown quite large over time.

      git clone git@github.com:MolecularAI/REINVENT4.git # --depth 1

- Change directory to the repository and run `uv sync`. The PyTorch index for CUDA 12.8 is pre-configured in `pyproject.toml`.

      cd REINVENT4
      uv sync               # core dependencies
      uv sync --extra isim  # + iSIM similarity tracking in TensorBoard
      uv sync --extra all   # + OpenEye ROCS (requires a license)

- Test the tool.

      uv run reinvent --help

All public prior models can be found on Zenodo.
REINVENT is a command line tool and works principally as follows:

    reinvent -l sampling.log sampling.toml

This writes logging information to the file `sampling.log`. If you wish to write
this to the screen, leave out the `-l sampling.log` part. `sampling.toml` is the
configuration file. The main format is TOML as it tends to be more user friendly. JSON and YAML are supported too.
Sample TOML configuration files for all run modes are located in configs/ in
the repository. File paths in these files need to be adjusted to your local
installation. You will need to choose a model and the appropriate run mode
depending on the research problem you are trying to address. There is
additional documentation in configs/ in several *.md files with
instructions on how to configure the TOML file. Internal priors can be
referenced with a dot notation (see reinvent/prior_registry.py).
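Since the file paths in the sample configurations must be adapted to the local installation, it can be handy to check a configuration for paths that do not exist before starting a run. The sketch below does this for a JSON configuration with the standard library only. Everything named here is an assumption made for illustration: the helper `find_missing_files`, the example keys (`run_type`, `parameters`, `model_file`, `num_smiles`) and the example path. Which keys hold input files depends on the run mode, so they are passed in explicitly, and a prior referenced in dot notation would be reported as missing because it is not a file path.

```python
import json
from pathlib import Path


def find_missing_files(config, input_keys, prefix=""):
    """Collect (dotted key, path) pairs for the given keys whose path does not exist."""
    missing = []
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            missing.extend(find_missing_files(value, input_keys, f"{dotted}."))
        elif isinstance(value, list):
            # Descend into lists of sub-sections as well
            for i, item in enumerate(value):
                if isinstance(item, dict):
                    missing.extend(find_missing_files(item, input_keys, f"{dotted}[{i}]."))
        elif isinstance(value, str) and key in input_keys and not Path(value).exists():
            missing.append((dotted, value))
    return missing


# Hypothetical configuration; the key names are illustrative only
example = json.loads("""
{
  "run_type": "sampling",
  "parameters": {
    "model_file": "/no/such/dir/example.prior",
    "num_smiles": 100
  }
}
""")

for key, path in find_missing_files(example, {"model_file"}):
    print(f"{key}: {path} does not exist")
```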
Run reinvent --help for a full list of options.
| Flag | Description | Default |
|---|---|---|
| `FILE` | Input configuration file (positional) | - |
| `-f`, `--config-format` | Force config file format: toml, json, yaml | toml |
| `-d`, `--device` | Torch device: cuda, cpu. Overwrites config file setting | - |
| `-l`, `--log-filename` | Write log to file instead of stderr | stderr |
| `--log-level` | Log verbosity level (see below) | info |
| `-s`, `--seed` | Random seed for reproducibility | - |
| `--dotenv-filename` | Dotenv file for scoring component environment setup | - |
| `--enable-rdkit-log-levels` | Enable RDKit log levels: all, error, warning, info, debug | - |
| `-V`, `--version` | Print version and exit | - |

Log levels (from most to least verbose): verbose, debug, info, warning, error, critical.

- `verbose`: highest detail, includes per-SMILES/state output after sampling and full JSON payloads. Use for deep debugging.
- `debug`: standard diagnostic output such as configuration details and internal state summaries.
- `info`: normal operation with progress, milestones and results (default).

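When REINVENT is started from a script, the options in the table above can be assembled programmatically. The following is a minimal sketch: the helper `build_command` and its keyword names are hypothetical, the seed value is arbitrary, and only flags listed in the table are used. The resulting list can be passed to `subprocess.run`.

```python
import shlex

LOG_LEVELS = ("verbose", "debug", "info", "warning", "error", "critical")


def build_command(config, log_file=None, log_level=None, device=None, seed=None):
    """Assemble a reinvent argument list from a subset of the documented options."""
    if log_level is not None and log_level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {log_level}")
    cmd = ["reinvent"]
    if log_file is not None:
        cmd += ["-l", log_file]
    if log_level is not None:
        cmd += ["--log-level", log_level]
    if device is not None:
        cmd += ["-d", device]
    if seed is not None:
        cmd += ["-s", str(seed)]
    cmd.append(config)  # the configuration file is the positional argument
    return cmd


cmd = build_command("sampling.toml", log_file="sampling.log", log_level="debug", seed=42)
print(shlex.join(cmd))
# prints: reinvent -l sampling.log --log-level debug -s 42 sampling.toml
```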
Basic instructions can be found in the comments in the config examples in
configs/.
Notebooks are provided in the notebooks/ directory and contributed notebooks
and tutorials in contrib/. Please note that we provide the notebooks in
jupytext "light script" format. To work with the light scripts you will need
to install jupytext. A few other packages will come in handy too.

    pip install jupytext mols2grid seaborn

The Python files in notebooks/ can then be converted to a notebook e.g.

    jupytext -o Reinvent_demo.ipynb Reinvent_demo.py

The scoring subsystem uses a simple plugin mechanism (Python native namespace packages). If you wish to write your own plugin, follow the instructions below. There is no need to touch any of the REINVENT code. The public repository contains a contrib directory with some useful examples.

- Create `/top/dir/somewhere/reinvent_plugins/components` where `/top/dir/somewhere` is a convenient location for you.
- Do not place a `__init__.py` in either `reinvent_plugins` or `components` as this would break the mechanism. It is fine to create normal packages within `components` as long as you import those correctly.
- Place a file whose name starts with `comp_*` into `reinvent_plugins/components` or subdirectories. Files with different names will be ignored i.e. not imported. The directory will be searched recursively so structure your code as needed but directory/package names must be unique.
- Tag the scoring component class(es) in that file with the `@add_tag` decorator. More than one component class can be added to the same `comp_` file. See existing code.
- Tag at most one dataclass for parameters in the same file, see existing code. This is optional.
- Set or add `/top/dir/somewhere` to the `PYTHONPATH` environment variable or use any other mechanism to extend `sys.path`.
- The scoring component should now automatically be picked up by REINVENT.

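The directory rules above can be tried out without REINVENT installed. The sketch below builds the layout in a temporary directory, checks it, and imports a placeholder module through the namespace package mechanism. The helper `check_plugin_layout` and the module content are hypothetical; a real `comp_*.py` file would define a scoring component class tagged with `@add_tag` as described above, which this placeholder does not.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def check_plugin_layout(top_dir):
    """Return a list of problems found in <top_dir>/reinvent_plugins/components."""
    plugins = Path(top_dir) / "reinvent_plugins"
    components = plugins / "components"
    if not components.is_dir():
        return [f"{components} does not exist"]
    problems = []
    for directory in (plugins, components):
        init_file = directory / "__init__.py"
        if init_file.exists():
            problems.append(f"{init_file} breaks the namespace package")
    # Only files named comp_*.py are considered, searched recursively
    if not any(components.rglob("comp_*.py")):
        problems.append("no comp_*.py file found")
    return problems


with tempfile.TemporaryDirectory() as top_dir:
    # Build the layout with a placeholder module instead of a real component
    components = Path(top_dir) / "reinvent_plugins" / "components"
    components.mkdir(parents=True)
    (components / "comp_myscorer.py").write_text("PLACEHOLDER = True\n")

    print(check_plugin_layout(top_dir))  # an empty list means no problems found

    # Extending sys.path has the same effect as adding top_dir to PYTHONPATH
    sys.path.insert(0, top_dir)
    importlib.invalidate_caches()
    module = importlib.import_module("reinvent_plugins.components.comp_myscorer")
    print(module.PLACEHOLDER)
    sys.path.remove(top_dir)
```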
Ensure that the component can be imported. The log file will contain an error message if not. Check directly if import is possible:

    from reinvent_plugins.components import comp_myscorer

This is primarily for developers and admins/users who wish to ensure that the
installation works. The information here is not relevant to the practical use
of REINVENT. Please refer to Basic Usage for instructions on how to use the
reinvent command.
The REINVENT project uses the pytest framework for its tests. Before you run
them you first have to create a configuration file for the tests.
In the project directory, create a config.json file in the configs/ directory.
You can use the example config example.config.json as a base. Make sure that
you set MAIN_TEST_PATH to a non-existent directory. That is where temporary
files will be written during the tests. If it is set to an existing directory,
that directory will be removed once the tests have finished.
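Because an existing directory given as MAIN_TEST_PATH is removed after the tests, it may be worth checking the setting before running them. The following sketch assumes that MAIN_TEST_PATH is a top-level key of the JSON file; the helper `check_test_config` is hypothetical, and the demonstration config contains only this one key whereas a real one holds more.

```python
import json
import tempfile
from pathlib import Path


def check_test_config(config_path):
    """Return MAIN_TEST_PATH from the config, refusing a path that already exists."""
    with open(config_path) as fh:
        config = json.load(fh)
    main_test_path = Path(config["MAIN_TEST_PATH"])
    if main_test_path.exists():
        raise FileExistsError(
            f"{main_test_path} exists and would be removed after the tests"
        )
    return main_test_path


with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "config.json"
    # Point MAIN_TEST_PATH at a directory that does not exist yet
    config_file.write_text(json.dumps({"MAIN_TEST_PATH": str(Path(tmp) / "reinvent_tests")}))
    print(check_test_config(config_file))
```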
Some tests require a proprietary OpenEye license. You have to set up a few
things to make the tests read your license. The simple way is to just set the
OE_LICENSE environment variable to the path of the file containing the
license.
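A quick way to confirm that the variable is visible to Python is sketched below. The helper name `openeye_license_status` is hypothetical. It only checks that OE_LICENSE is set and points to an existing file; it does not validate the license itself.

```python
import os
from pathlib import Path


def openeye_license_status(environ=os.environ):
    """Describe whether OE_LICENSE is set and points to an existing file."""
    value = environ.get("OE_LICENSE")
    if not value:
        return "OE_LICENSE is not set"
    if not Path(value).is_file():
        return f"OE_LICENSE points to a missing file: {value}"
    return "OE_LICENSE is set"


print(openeye_license_status())
```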
Once you have a configuration and your license can be read, you can run the tests.

    pytest tests --json /path/to/config.json --device cuda