Yoneda Predict

Never run a reaction that fails again.

Why Coupling Reactions

Coupling reactions, specifically Suzuki-Miyaura cross-coupling, Buchwald-Hartwig cross-coupling, and amide coupling, are versatile tools for C-C and C-N bond formation, popular in drug production, medicinal chemistry, and many other fields. However, most couplings are notoriously finicky, which makes them the most common reaction types to be optimized using high-throughput experimentation (HTE). [1]

But what if you knew good reaction conditions right away without any wet lab experimentation?

Our Solution

We are curating a database of coupling reactions to train a powerful AI model that will be able to predict reaction conditions for any of the selected couplings.

No guessing - just paste in the structures of your substrates, and the model suggests conditions whose yield is guaranteed to be within 20 % of the best possible yield.

However, it is not possible to train this model on published data - the reactions are biased by chemists' favorite reagents, results vary across different labs, and some yields are not reported accurately.

That's why we will use HTE to run thousands of reactions, efficiently covering the entire reaction space. During training, we will generate more data for the regions where the model struggles, so that it makes accurate predictions even for difficult substrates.
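One common way to decide where to generate more data is to look at where an ensemble of models disagrees the most. The sketch below illustrates that idea only; the ensemble approach, the function name `pick_next_experiments`, and the candidate labels and yields are all hypothetical and are not taken from our actual pipeline.

```python
from statistics import pstdev

def pick_next_experiments(ensemble_predictions, batch_size):
    """Return the candidates whose predicted yields disagree the most.

    ensemble_predictions maps a candidate reaction to the yields (in %)
    predicted by each model in a hypothetical ensemble.
    """
    ranked = sorted(
        ensemble_predictions,
        key=lambda candidate: pstdev(ensemble_predictions[candidate]),
        reverse=True,  # largest disagreement first
    )
    return ranked[:batch_size]

# Made-up predictions from three models for three candidate reactions.
ensemble_predictions = {
    "cand_1": [80.0, 82.0, 81.0],  # models agree: low priority
    "cand_2": [10.0, 90.0, 50.0],  # models disagree strongly: run this first
    "cand_3": [40.0, 60.0, 50.0],
}

next_batch = pick_next_experiments(ensemble_predictions, batch_size=2)
```

The selected candidates would then be run in the lab and added to the training set before the model is retrained.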

Reaction Space Size

It is impossible to test every possible substrate. But by focusing on different chemical features, we can quantify the practical size of the chemical space.

By investigating over 450k reactions, we found motifs that cover most of the published Suzuki-Miyaura (S-M), Buchwald-Hartwig (B-H), and amide coupling reactions. The analysis for Suzuki-Miyaura reactions is shown below.

Training Dataset Size

The dataset size needed to train this model was estimated using full-factorial datasets published in the literature.

The graph on the right shows how many training points are needed for the model to predict good reaction conditions for each combination of substrates. We define good reaction conditions as those that give a yield within 20 % of the best possible yield.
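The "within 20 %" criterion can be checked directly on a full-factorial dataset, where the yield of every condition is known for every substrate pair. The sketch below assumes the 20 % is an absolute difference in percentage points; the function names, substrate labels, condition labels, and yields are made up for illustration.

```python
def is_good_condition(yields, chosen, tolerance=20.0):
    """True if the chosen condition is within `tolerance` points of the best yield.

    yields maps each condition to its measured yield (in %) for one substrate pair.
    """
    best = max(yields.values())
    return best - yields[chosen] <= tolerance

def success_rate(yield_table, predictions, tolerance=20.0):
    """Fraction of substrate pairs for which the predicted condition is good."""
    hits = sum(
        is_good_condition(yield_table[pair], predictions[pair], tolerance)
        for pair in yield_table
    )
    return hits / len(yield_table)

# Hypothetical full-factorial yields for two substrate pairs.
yield_table = {
    ("ArBr-1", "B-1"): {"cond_A": 85.0, "cond_B": 70.0, "cond_C": 10.0},
    ("ArBr-2", "B-1"): {"cond_A": 30.0, "cond_B": 95.0, "cond_C": 60.0},
}
# Hypothetical model suggestions.
predictions = {("ArBr-1", "B-1"): "cond_B", ("ArBr-2", "B-1"): "cond_A"}

rate = success_rate(yield_table, predictions)
```

Here the first suggestion is 15 points below the best yield and counts as good, while the second is 65 points below and does not.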

Using a linear relationship between the logarithm of the reaction space size and the number of required training points, we estimate that the design spaces for the selected couplings require 17.9k ± 4.3k training points.
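This kind of extrapolation can be sketched as a least-squares line fitted to (log of space size, required points) pairs and then evaluated for a larger space. The sketch assumes a base-10 logarithm and uses invented reference points, so it does not reproduce the 17.9k estimate; `fit_line` and `required_points` are illustrative names.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def required_points(space_size, slope, intercept):
    """Estimated number of training points for a reaction space of the given size."""
    return slope * math.log10(space_size) + intercept

# Hypothetical (reaction space size, required training points) pairs,
# standing in for values read off published full-factorial datasets.
reference = [(1e2, 200.0), (1e3, 300.0), (1e4, 400.0)]

slope, intercept = fit_line(
    [math.log10(size) for size, _ in reference],
    [points for _, points in reference],
)
estimate = required_points(1e6, slope, intercept)
```

An uncertainty such as the ± value quoted above would additionally require propagating the fit error, which this sketch omits.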

Interested in our Model?
