Artificial intelligence (AI) can play a key role in drug discovery, with artificial neural networks such as deep neural networks and recurrent neural networks driving much of the field's development. Many applications in property and activity prediction support this, including the prediction of physicochemical and ADMET properties and the modeling of quantitative structure-property relationships (QSPR) and quantitative structure-activity relationships (QSAR). AI helps steer biologically active molecules toward desired characteristics, and when combined with synthesis planning and an assessment of synthetic feasibility, it makes the automated computational discovery of drugs increasingly plausible.
High-quality input data are essential for the decisions made by predictive models. Without appropriate data and an understanding of the domain to which the data apply, even the best modeling method will struggle to produce useful results. When judging whether data are suitable for modeling, the most important question is whether they actually measure the relevant endpoint; if not, the model may be wrong. There are different levels of uncertainty both within and between datasets, and some data points are correlated because they originate from the same underlying problem. Handling this requires careful human annotation, but this lengthy process is often neglected. Automatic labeling can substitute for it to some extent, but AI offers only limited help here, mainly by inferring context and automatically detecting inconsistent labels. Another labeling challenge is that our understanding of biological problems keeps changing and is often inconsistent; in other words, a coherent knowledge system is lacking.
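As a minimal sketch of the simplest form of automatic inconsistency detection, the snippet below groups measurements by compound and flags compounds whose replicate labels disagree. The record format, the hypothetical helper `find_conflicting_labels`, and the assumption that compound identifiers are already canonicalized (for example as canonical SMILES produced by a cheminformatics toolkit) are all illustrative choices, not a method described in the text.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_conflicting_labels(records: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Return compounds whose replicate records carry more than one distinct label.

    records: (compound_id, label) pairs. compound_id is ASSUMED to be canonical,
    so that duplicate measurements of the same compound share the same key.
    """
    labels_by_compound = defaultdict(set)
    for compound_id, label in records:
        labels_by_compound[compound_id].add(label)
    # Sort the labels so the output is deterministic regardless of set ordering.
    return {
        cid: sorted(labels)
        for cid, labels in labels_by_compound.items()
        if len(labels) > 1
    }


# Hypothetical toy records; the labels are invented for illustration.
records = [
    ("CCO", "active"),
    ("CCO", "inactive"),       # conflicting replicate for the same compound
    ("c1ccccc1", "inactive"),
    ("c1ccccc1", "inactive"),  # consistent replicate
]
conflicts = find_conflicting_labels(records)
print(conflicts)
```

This only catches exact contradictions between duplicates; it cannot infer context or resolve which label is correct, which is why human curation remains necessary.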
Despite continuous advances in high-throughput technology and synthetic chemistry, we have searched only a small fraction of chemical space. Even cutting-edge methods such as DNA-encoded libraries can test only about 10^7 to 10^10 molecules, whereas, depending on the constraints applied, the chemical space of drug-like compounds is estimated at about 10^18 to 10^200 molecules. Since we cannot exhaustively enumerate all molecules, the core question of drug design becomes "what should we do next?" Medicinal chemists design molecules based on their experience and on synthetic feasibility, but given the complexity of human diseases and the many difficulties faced in medicinal chemistry, a method that can put forward new hypotheses would be valuable for drug design.
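A back-of-the-envelope ratio using only the figures quoted above makes the gap concrete. Here N_tested (molecules screened in one DNA-encoded library) and N_drug-like (size of drug-like chemical space) are symbols introduced for illustration; the ratio compares the largest library size against the smallest and largest space estimates.

```latex
\frac{N_{\text{tested}}}{N_{\text{drug-like}}}
\;\le\; \frac{10^{10}}{10^{18}} = 10^{-8},
\qquad
\frac{10^{10}}{10^{200}} = 10^{-190}
```

Even under the most favorable estimate, a single large library covers at most about one molecule in a hundred million.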
During design, drug discovery must balance multiple criteria, including on-target potency, selectivity, clearance, and permeability. However, improving one property may degrade others. Such conflicts can be addressed within the computational framework of multi-objective optimization (MOO). To carry out MOO, we first train a predictive model for each chemical property and then solve the optimization problem with an MOO algorithm, that is, find one or more molecules with a better overall balance of properties. Because these properties usually conflict, the goal is to generate a set of potential lead molecules, each representing a different trade-off between properties, such that none can be improved in one property without becoming worse in another. Such a set of solutions forms the optimal (Pareto) boundary, and moving along this boundary yields multiple optimal solutions. The current challenge for MOO is to work, to some extent, in reverse: to find the chemical structures that correspond to the optimal property profile (inverse QSAR). As with the de novo molecular design mentioned above, generative AI models may be well suited to such problems.
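The "optimal boundary" described above is the Pareto front: the set of candidates not dominated by any other candidate. The sketch below shows only that filtering step, assuming per-property predictions already exist from trained models. The molecule names and scores are invented, and every objective is assumed to be rescaled so that higher is better (a property such as clearance, where lower is better, would be negated first).

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one.

    All objectives are ASSUMED to be maximized.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of non-dominated candidates (the Pareto-optimal set)."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(other, s) for j, other in enumerate(scores) if j != i)
    ]


# Hypothetical predicted scores per molecule: (potency, selectivity, permeability).
candidates = {
    "mol_A": (0.9, 0.4, 0.5),  # best potency, weaker selectivity
    "mol_B": (0.6, 0.8, 0.6),  # best selectivity
    "mol_C": (0.5, 0.3, 0.4),  # dominated, e.g. by mol_A
    "mol_D": (0.7, 0.7, 0.9),  # best permeability
}
names = list(candidates)
front = pareto_front([candidates[n] for n in names])
print([names[i] for i in front])
```

Each molecule on the front represents a different trade-off; choosing among them still requires project-specific judgment. A generative MOO approach would propose new structures and use a filter like this to keep the non-dominated ones.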
In the drug discovery phase, finding and optimizing a molecule requires substantial time and investment, and the risk is extremely high. R&D teams therefore continue to invest in improving their capacity to assay compounds. This generates large amounts of data but also brings many challenges. In drug discovery, the main process for improving a lead molecule until it has the properties required of a drug candidate is the Design-Make-Test-Analyse (DMTA) cycle. This classical hypothesis-driven method first uses the available data to generate hypotheses and design molecules, then synthesizes the designed compounds and tests them with appropriate assays to check whether the hypothesis holds and to deepen understanding of the problem. Finally, this knowledge is analyzed and turned into hypotheses and designs for the next cycle.
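To make the structure of the loop explicit, here is a deliberately toy sketch of the DMTA cycle. "Molecules" are stand-in integers, the `assay` function stands in for an experimental measurement, and "design" simply proposes neighbors of the current best compound (a hill-climbing stand-in for a chemist's or model's hypothesis). All of these are illustrative assumptions; real design, synthesis, and testing are far richer.

```python
from typing import Callable, Dict, List


def dmta_cycle(start: int, assay: Callable[[int], float], n_cycles: int = 5) -> Dict[int, float]:
    """Toy Design-Make-Test-Analyse loop over stand-in integer 'compounds'."""
    results: Dict[int, float] = {}
    hypothesis = start  # current best guess of a promising compound
    for _ in range(n_cycles):
        # Design: propose analogues around the current hypothesis.
        designs: List[int] = [hypothesis - 1, hypothesis, hypothesis + 1]
        # Make: stands in for synthesis; compounds already made are not remade.
        made = [d for d in designs if d not in results]
        # Test: measure each newly made compound with the assay.
        for compound in made:
            results[compound] = assay(compound)
        # Analyse: the best compound observed so far becomes the next hypothesis.
        hypothesis = max(results, key=lambda c: results[c])
    return results


# Hypothetical assay whose optimum lies at compound 7.
measurements = dmta_cycle(start=3, assay=lambda x: -(x - 7) ** 2, n_cycles=5)
best = max(measurements, key=lambda c: measurements[c])
print(best, measurements[best])  # prints "7 0"
```

Each pass adds only the untested designs to the accumulated results, mirroring how every real cycle builds on the data from previous ones.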
Pharmaceutical companies have begun to apply AI-related technologies to research and development projects through collaborations, but it is not yet time to bet everything on AI-based drug design. Given the high complexity and uncertainty of drug development, we should still approach it with both curiosity and caution, focusing on "zero-to-one" solutions to key problems in current drug development, such as new targets and new binding sites, so that AI can achieve its maximum effect. Using AI in drug design requires a long-term perspective: it can improve the efficiency of each stage of research and development and lower the barriers between different research cultures, helping to build a healthy ecosystem for innovative drug research and development.