Yoneda Predict
Never run a reaction that fails again.
Why Coupling Reactions
Coupling reactions, specifically Suzuki-Miyaura cross-coupling, Buchwald-Hartwig cross-coupling and amide coupling, are versatile tools for C-C and C-N bond formation and are widely used in drug production, medicinal chemistry, and many other fields. However, most couplings are notoriously finicky, which makes them the reaction types most commonly optimized using high-throughput experimentation (HTE). [1]
But what if you knew good reaction conditions right away, without any wet-lab experimentation?
Our Solution
We are curating a database of coupling reactions to train a powerful AI model that will be able to predict reaction conditions for any substrate combination within the selected coupling types.
No guessing: just paste in the structures of your substrates, and the model suggests conditions that are guaranteed to be within 20 % of the best possible yield.
However, this model cannot be trained on published data: reported reactions are biased toward chemists' favorite reagents, results vary across labs, and some yields are reported inaccurately.
That's why we will use HTE to run thousands of reactions that efficiently cover the entire reaction space. During training, we will generate additional data in the regions where the model's predictions are least accurate, so that it also gives accurate predictions for difficult substrates.
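The text does not say how "regions where the model struggles" are identified. One common approach is query-by-committee: train several models, then run next the candidate reactions on which they disagree most. The sketch below assumes exactly that. The `select_next_batch` function, the ensemble of yield predictors, and the toy reaction IDs are all hypothetical and are not the project's actual method.

```python
import statistics
from typing import Callable, List, Sequence

# Hypothetical model type: maps a candidate reaction ID to a predicted yield in %.
Model = Callable[[str], float]


def select_next_batch(candidates: Sequence[str],
                      ensemble: Sequence[Model],
                      batch_size: int) -> List[str]:
    """Return the `batch_size` candidates on which the ensemble disagrees most.

    Disagreement (population standard deviation of the predicted yields) is used
    here as an assumed proxy for 'regions where the model struggles'.
    """
    def disagreement(candidate: str) -> float:
        predictions = [model(candidate) for model in ensemble]
        return statistics.pstdev(predictions)

    # Most uncertain candidates first; these would be run next on the HTE platform.
    ranked = sorted(candidates, key=disagreement, reverse=True)
    return ranked[:batch_size]


# Toy ensemble of three "models" backed by fixed, made-up predictions.
toy_predictions = {
    "rxn_A": [80.0, 82.0, 81.0],  # models agree
    "rxn_B": [10.0, 60.0, 35.0],  # models disagree strongly
    "rxn_C": [50.0, 55.0, 45.0],  # mild disagreement
}
ensemble = [lambda r, i=i: toy_predictions[r][i] for i in range(3)]

print(select_next_batch(list(toy_predictions), ensemble, batch_size=2))
# → ['rxn_B', 'rxn_C']
```

In a real loop, the selected reactions would be run, added to the training set, and the ensemble retrained before the next selection round.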
Reaction Space Size
It is impossible to test every possible substrate. But by focusing on different chemical features, we can quantify the practical size of the chemical space.
By analyzing over 450k reactions, we identified motifs that cover most of the published Suzuki-Miyaura (S-M), Buchwald-Hartwig (B-H), and amide coupling reactions. The analysis for Suzuki-Miyaura reactions is shown below.
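The coverage idea can be made concrete with a short count: rank motifs by how many published reactions contain them and keep adding motifs until a target fraction of reactions is covered. This is a minimal sketch assuming each reaction carries exactly one motif label; the function name, the labels, and the 80 % target are hypothetical illustrations, not the analysis actually performed on the 450k reactions.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def motifs_for_coverage(reaction_motifs: Iterable[Hashable],
                        target_fraction: float) -> List[Hashable]:
    """Return the most frequent motifs, in order, until they jointly cover
    at least `target_fraction` of all reactions.

    Assumes every reaction is annotated with exactly one motif label.
    """
    counts = Counter(reaction_motifs)
    total = sum(counts.values())
    if total == 0:
        return []
    selected: List[Hashable] = []
    covered = 0
    for motif, count in counts.most_common():
        if covered / total >= target_fraction:
            break  # target coverage already reached
        selected.append(motif)
        covered += count
    return selected


# Made-up motif labels for ten hypothetical Suzuki-Miyaura reactions.
toy_labels = ["aryl_Br"] * 5 + ["aryl_Cl"] * 3 + ["heteroaryl_Br"] * 2
print(motifs_for_coverage(toy_labels, target_fraction=0.8))
# → ['aryl_Br', 'aryl_Cl']
```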
Training Dataset Size
The dataset size needed to train this model was estimated from full-factorial datasets published in the literature.
The graph on the right shows how many training points are needed for the model to predict good reaction conditions for each combination of substrates. We define good reaction conditions as those giving a yield within 20 % of the best possible yield.
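In a full-factorial dataset every condition is measured for every substrate combination, so the "good conditions" criterion can be checked directly: take the condition the model ranks highest and compare its measured yield with the best measured yield. The sketch below assumes "within 20 %" means 20 percentage points of absolute yield (a relative reading would compare against 0.8 × best instead). `fraction_good` and the toy data are hypothetical.

```python
from typing import Mapping


def fraction_good(measured: Mapping[str, Mapping[str, float]],
                  predicted: Mapping[str, Mapping[str, float]],
                  tolerance: float = 20.0) -> float:
    """Fraction of substrate combinations for which the model's top-ranked condition
    gives a measured yield within `tolerance` percentage points of the best measured yield.

    `measured[combo][condition]` and `predicted[combo][condition]` are yields in %.
    """
    good = 0
    for combo, yields in measured.items():
        best_yield = max(yields.values())
        # Condition the model ranks highest for this substrate combination.
        chosen = max(predicted[combo], key=lambda c: predicted[combo][c])
        if yields[chosen] >= best_yield - tolerance:
            good += 1
    return good / len(measured)


# Made-up full-factorial data: 2 substrate combinations x 3 conditions.
measured = {
    "combo1": {"c1": 90.0, "c2": 75.0, "c3": 20.0},
    "combo2": {"c1": 30.0, "c2": 85.0, "c3": 60.0},
}
predicted = {
    "combo1": {"c1": 50.0, "c2": 70.0, "c3": 10.0},  # picks c2: 75 >= 90 - 20, good
    "combo2": {"c1": 40.0, "c2": 20.0, "c3": 80.0},  # picks c3: 60 < 85 - 20, not good
}
print(fraction_good(measured, predicted))
# → 0.5
```

Repeating this evaluation for models trained on increasingly large subsets of a dataset gives a learning curve from which the required number of training points can be read off.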
Using a linear relationship between the logarithm of the reaction-space size and the number of required training points, we estimate that the design spaces of the selected couplings require 17.9k ± 4.3k training points.
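The extrapolation step amounts to fitting N = a + b · log10(S), where S is the reaction-space size and N the number of required training points, and then evaluating the fit at the size of the target design space. Below is a minimal ordinary-least-squares sketch. The data points and option counts are made up for illustration and are not the literature values behind the 17.9k estimate; the sketch also does not reproduce how the ± 4.3k uncertainty was derived.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(space_sizes: Sequence[float],
                   required_points: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of required_points = a + b * log10(space_size); returns (a, b)."""
    xs = [math.log10(s) for s in space_sizes]
    ys = list(required_points)
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def predict_required_points(a: float, b: float, space_size: float) -> float:
    """Evaluate the fitted log-linear relationship at a new reaction-space size."""
    return a + b * math.log10(space_size)


# Made-up (space size, required training points) pairs, for illustration only.
a, b = fit_log_linear([1e2, 1e3, 1e4], [100.0, 300.0, 500.0])
print(a, b)  # → -300.0 200.0

# Hypothetical design space: the product of the number of options per dimension
# (e.g. electrophile motifs, nucleophile motifs, catalysts, ligands, bases, solvents).
space_size = math.prod([50, 40, 12, 8, 6, 5])
print(predict_required_points(a, b, space_size))
```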