Defining desirable natural product derived anticancer drug space: optimization of molecular physicochemical properties and ADMET attributes

As part of our endeavor to improve the survival of natural product derived drug candidates and to guide medicinal chemists toward a property space with a higher probability of success in anticancer drug development, we undertook a detailed study of the property space of a collection of natural product derived anticancer molecules. We carried out a comprehensive analysis of properties for 24 natural product derived anticancer drugs, including clinical development candidates, and a set of 27 natural product derived anticancer lead compounds. In particular, we focused on understanding the interplay among eight physicochemical properties crucial for drug design, namely partition coefficient (log P), distribution coefficient at pH 7.4 (log D), topological polar surface area (TPSA), molecular weight (MW), aqueous solubility (log S), number of hydrogen bond acceptors (HBA), number of hydrogen bond donors (HBD), and number of rotatable bonds (nRot), and on the relationships between physicochemical properties, ADME (absorption, distribution, metabolism, and elimination) attributes, and the in silico toxicity profile for these two sets of compounds. This analysis provides guidance for the chemist to modify existing natural product scaffolds or to design new anticancer molecules in a property space with an increased probability of success, and may lead to the identification of druglike candidates with favorable safety profiles that can successfully test hypotheses in the clinic.


Introduction
Cancer is one of the major causes of mortality worldwide, and the number of cancer cases is increasing gradually [1]. Cancer is a major public health burden in both developed and developing countries and affects the lives of millions of people. Cancer is an abnormal growth of cells in the body, resulting from a collection of multiple genetic abnormalities acquired through a multistep, mutagenic process. Cancer cells usually invade and destroy normal cells in the body. Factors responsible for cancer include genetic predisposition, smoking, poor diet, infectious diseases, and environmental factors. The American Cancer Society has predicted ~27 million newly diagnosed individuals and ~17 million cancer related deaths globally by 2050 [2]. The key problems in cancer treatment are tumor recurrence and the side effects of chemotherapy drugs. Hence, there is a demand for new and efficient anticancer drugs [3]. Natural products have received increasing attention in the past 30 years for the discovery of novel cancer preventive and therapeutic agents [4]. Natural products have been used for centuries for the treatment of several ailments. There are many basic ancient medicinal systems derived from dietary sources. Nature has provided plenty of natural products with potential anticancer activity in the last few decades. Since 1940, approximately 175 small molecules have been approved as anticancer agents; of these, 48.6 % were natural products or derivatives [5].
Currently, the pharmaceutical industry faces high attrition rates of preclinical and clinical candidates due to toxicity or lack of optimal pharmacokinetic properties, resulting in high costs and longer timelines for the drug discovery process [6]. Lead structures are compounds that typically exhibit suboptimal target binding affinity. Pharmacological studies have shown that a difference exists between leads and drugs [7]. The present study is an approach to establish the difference between some selected potent anticancer natural compounds (leads) and FDA approved natural product derived anticancer drugs, considering the distribution of physicochemical properties, ADME (absorption, distribution, metabolism, and elimination) attributes, and in silico toxicity endpoints. These data were examined with the goal of identifying trends and defining a set of property values that would best define the anticancer drug space associated with a higher probability of clinical success. Several critical physicochemical properties of compounds, such as log P, log D, TPSA, MW, log S, HBA, HBD, and nRot, proposed by various research groups, should be considered for compounds where oral drug delivery is a concern [8].
The information obtained from this analysis could, in turn, be used to design anticancer drug molecules with optimum bioavailability and little or no toxicity based on the alignment of a set of key properties. The multi-parameter optimization (MPO) approach is widely used to guide the design of preferred molecules, reducing the attrition rate and increasing the probability of prospectively designing molecules that survive preclinical safety studies and possess optimal pharmacokinetic and pharmacodynamic properties to test hypotheses in the clinic [9]. Tremendous progress has been made in recent years in enabling the development of robust pharmacokinetic-pharmacodynamic (PK/PD) relationships for anticancer agents, as well as in understanding how these relationships are influenced by molecular physicochemical properties. Key physicochemical properties related to druglike molecules have been described previously by various research groups [10].
The most important and well-known rule, the rule of five (RO5), was given by Lipinski et al. in 1997 based on a database of clinical candidates that had reached phase II trials or further [11]. RO5 provided the end points for four crucial physicochemical properties that described 90 % of orally active drugs: (a) molecular weight, MW < 500 Da; (b) calculated 1-octanol/water partition coefficient, ClogP < 5; (c) number of hydrogen-bond donors (OH plus NH count) < 5; and (d) number of hydrogen-bond acceptors (O plus N atoms) < 10. These four physicochemical properties and their end points are associated with acceptable aqueous solubility and intestinal permeability, the important first steps of oral bioavailability [11]. After Lipinski's RO5, various other ways of predicting the druglike space for rational design purposes were introduced by other groups. Veber et al. (2002) showed that a molecular weight cutoff at 500 Da does not by itself significantly separate compounds with poor oral bioavailability from those with acceptable values, based on oral bioavailability measurements in rats for a GlaxoSmithKline database of almost 1100 drug candidates [12]. They suggested that compounds meeting two criteria, (1) number of rotatable bonds (nRot) ≤ 10 and (2) TPSA ≤ 140 Å², will have a high probability of good oral bioavailability. Another effective range of physicochemical properties, provided by Ghose et al. (1999) and based on the Comprehensive Medicinal Chemistry (CMC) database, can be used in the design of druglike combinatorial libraries [13].
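As an illustration only, the RO5 and Veber criteria quoted above can be written as simple checks. The sketch below is not the DruLiTo implementation used in this study; the class name `Props`, both function names, and the example molecule are hypothetical, and the thresholds are taken exactly as written in the text.

```python
from dataclasses import dataclass


@dataclass
class Props:
    """Computed properties of one molecule (hypothetical container)."""
    mw: float     # molecular weight, Da
    clogp: float  # calculated 1-octanol/water partition coefficient
    hbd: int      # hydrogen bond donors (OH plus NH count)
    hba: int      # hydrogen bond acceptors (O plus N atoms)
    tpsa: float   # topological polar surface area, square angstroms
    nrot: int     # number of rotatable bonds


def lipinski_violations(p: Props) -> int:
    """Count RO5 violations, using the thresholds as stated in the text."""
    checks = [p.mw < 500, p.clogp < 5, p.hbd < 5, p.hba < 10]
    return sum(not ok for ok in checks)


def passes_veber(p: Props) -> bool:
    """Veber criteria: nRot <= 10 and TPSA <= 140 square angstroms."""
    return p.nrot <= 10 and p.tpsa <= 140


# Hypothetical molecule, for illustration only (not a compound from the study).
example = Props(mw=514.0, clogp=2.6, hbd=2, hba=10, tpsa=118.25, nrot=5)
print(lipinski_violations(example), passes_veber(example))
```

Counting violations rather than returning a single pass/fail flag keeps the check compatible with the soft, multi-parameter view argued for in this work.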
To go beyond the properties associated with the RO5 and other druglike filters, we became interested in developing a holistic understanding of physicochemical property space for anticancer molecules by carrying out a thorough analysis of properties for natural product derived anticancer drugs and a set of natural product lead anticancer molecules, as most anticancer drugs have been derived from natural products [14]. Herein, we present our efforts to develop a prospective MPO design tool for anticancer molecules that does not focus on hard cutoffs or single end points but uses the eight essential physicochemical properties to prospectively align druglike attributes such as high permeability, low P-gp efflux liability, low metabolic clearance, and high safety in one molecule. To increase flexibility in design and the probability of identifying candidates with optimal pharmacokinetic and safety profiles, we should not use hard cutoffs or focus on a single property, as doing so may restrict design space and may not align multiple attributes at once.

ADMET property calculations
Poor pharmacokinetic properties are one of the main reasons for terminating the development of drug candidates. Good oral bioavailability, little or no toxicity, and optimum values of computed physicochemical properties are key parameters for anticancer drug discovery, and we need compounds with good pharmacokinetic properties [15,16]. The drug set used in this study includes 24 natural product derived anticancer drugs, whose structures are shown in Figure 1. To the best of our knowledge, all compounds in the drug set could be used as oral agents [17]. The lead candidates included in our analysis consisted of 27 natural product derived anticancer compounds that were collected from the literature and belong to several chemical classes, as shown in Figure 2 [18]. A complete list of the drugs and lead candidates used in the analysis appears in Table 1.
ADMET related physicochemical properties for the 24 natural product derived anticancer drugs, including clinical development candidates, and the 27 natural product lead anticancer compounds were predicted using OSIRIS DataWarrior version 4.2.2 on a Windows XP operating system [19]. DataWarrior can calculate physicochemical properties, lead- or drug-likeness related parameters, ligand efficiencies, various atom and ring counts, molecular shape, flexibility, and complexity, as well as indications of potential toxicity.
After calculation, the properties are automatically added as new columns to the data table. Chemical structures for the 24 natural product derived anticancer drugs, including clinical development candidates, and the 27 natural product lead compounds were downloaded and saved individually in 3D SDF format from PubChem (www.pubchem.org). DataWarrior is unable to optimize structures; therefore, geometry optimization of the molecules was performed in Avogadro prior to the prediction of physicochemical properties [20]. DataWarrior calculates descriptors that serve as inputs to independent mathematical models to estimate a range of ADMET values at the relevant pH of 7.4. Physicochemical properties of interest included predicted lipophilicity (log P), predicted aqueous solubility (log S), topological polar surface area (TPSA), molecular weight (MW), hydrogen bond donors (HBD), hydrogen bond acceptors (HBA), and number of rotatable bonds (nRot). Specific ADME properties of interest included the predicted distribution coefficient at pH 7.4 (log D; value predicted by ACD/Labs, www.chemspider.com), predicted aqueous solubility (log S), quantitatively predicted apparent permeability (Papp, Caco-2 cells), predicted effective permeability (Peff), and predicted human intestinal absorption (HIA). To evaluate the distribution of drugs and leads, we considered two important parameters, the fraction unbound to plasma proteins (Fu) and the volume of distribution (VDss), a requirement for all clinical candidates, using the recently developed online ADMET calculation tool pkCSM (http://bleoberis.bioc.cam.ac.uk/pkcsm/) [21]. To determine the excretion routes of the natural product anticancer drugs and leads, we quantitatively predicted total clearance and qualitatively predicted renal OCT2 substrate status.
The safety profile of compounds is one of the most common factors in drug attrition (1). As part of our analysis of properties for natural product anticancer drugs and leads, we assessed some of the major toxicity endpoints. We generated in silico data to assess the potential for the following safety risks: drug-drug interactions (CYP inhibition) involving CYP3A4, CYP2C9, and CYP2D6; hERG liability (inhibition of dofetilide binding); predicted LD50, hepatotoxicity, skin sensitization, and cellular toxicity through the pkCSM tool; and mutagenicity, tumorigenicity, and irritant effects through DataWarrior. To assess the likelihood of binding to the transporter permeability-glycoprotein (P-gp), we used the Pgp_Substrate model. We also calculated the three most crucial drug-likeness filters, the Lipinski, Ghose, and Veber rules, as well as the quantitative estimate of drug-likeness (QED), with the Drug Likeness Tool (DruLiTo) software (http://www.niper.gov.in/pi_dev_tools/DruLiToWeb/DruLiTo_index.html).

Optimum physicochemical property space for anticancer molecules
The 24 natural product derived anticancer drugs, including clinical development candidates, and the 27 natural product lead anticancer compounds were evaluated against a set of eight calculated fundamental physicochemical properties that have gained wide acceptance as key parameters for drug design and development: (a) lipophilicity, as the calculated partition coefficient (log P); (b) distribution coefficient at pH 7.4 (log D); (c) molecular weight (MW); (d) topological polar surface area (TPSA); (e) number of hydrogen bond donors (HBD); (f) number of hydrogen bond acceptors (HBA); (g) number of rotatable bonds (nRot); and (h) predicted aqueous solubility (log S) [11,22]. The calculated physicochemical property values for the drugs and leads are listed in Table 1.
The physicochemical property space captured by these eight parameters was quite broad (Figure 3). The MW values for the drugs varied from 246 to 853 with a median of 514, while the MW range for leads was much broader, varying from 114 to 975 with a median of 336. The log P values of the drugs varied from 0.46 to 4.67 with a median of 2.6. Molecules in the lead set had log P values from -2 to 16, a quite broad range, with a median of 2.68. There was no significant difference in the median log P values between the two sets, although the drug set had a narrower span of log P values. Low hydrophilicity (i.e., high log P) can cause poor absorption or permeation. Our analysis suggests that for anticancer drugs, there may be a need to design compounds with further reduced log P or MW to better match the corresponding properties in the drug set. Polarity, as described by topological polar surface area (TPSA), ranged from about 29 Å² to 224 Å² with a median of 118.25 Å² for the drug set, while TPSA for lead molecules spanned from 0 Å² to 266 Å². There was a significant difference in TPSA values between lead candidates and drugs: ~60 % of the lead candidates had TPSA < 80 Å², whereas 75 % of the drugs had TPSA ≥ 80 Å², which clearly suggests a need to optimize the TPSA of lead candidates. The drugs and the lead candidates had a minimal number of hydrogen bond donors (HBD), with a median of 2 for both sets. About 78 % of lead candidates and 83 % of drugs had HBD ≤ 3. Lipinski's RO5 identified HBD as a critical component of drug property analysis and targets an HBD count (OH plus NH count) of < 5.
Based on the number of HBD associated with anticancer drugs and lead candidates, optimization of HBD to ≤ 3 may increase the probability of identifying better anticancer molecules. The number of hydrogen bond acceptors is another valuable physicochemical property in the RO5; the drugs and the lead candidates had median values of 10 and 4, respectively. Only 11 out of 24 drugs in this study followed Lipinski's HBA < 10 rule, whereas most of the lead candidates (~89 %) had HBA < 10, so the anticancer lead candidates follow this RO5 criterion better than the anticancer drugs. Aqueous solubility is another very important parameter for oral bioavailability. The recommended range for a molecule to have good oral bioavailability is -6 ≤ log S ≤ 0.5; here, all drugs followed this rule, and ~95 % of lead candidates also fell within the recommended range. The majority of this increased TPSA should originate from an increased number of HBA, as HBD must be strictly controlled at ≤ 6 to avoid reducing oral bioavailability.
Log D provides a better measure of lipophilicity for ionizable compounds; hydrophilic molecules have higher solubility but are less able to readily cross the cell membrane. A compound is considered hydrophilic if log D < 0, lipophilic if log D > 0, and excessively lipophilic if log D ≥ 3.5. Nearly 85 % of the lead candidates had a log D value from 1 to 8 (1 ≤ log D ≤ 8), with a median of 3.15, which suggests that most lead candidates are hydrophobic in nature. As expected for anticancer drugs, a similar but narrower range existed for log D, which varied from -0.86 to 4.88 with a median value of 2.57.
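The log D classes described above amount to a three-way threshold rule, sketched below for illustration. The function name `classify_log_d` is hypothetical, and because the text does not say how log D = 0 should be treated, this sketch assumes it falls in the lipophilic class.

```python
def classify_log_d(log_d: float) -> str:
    """Classify a compound by log D at pH 7.4 using the cutoffs in the text.

    Assumption: log D exactly equal to 0 is treated as lipophilic, since the
    text only defines log D < 0 and log D > 0.
    """
    if log_d < 0:
        return "hydrophilic"
    if log_d >= 3.5:
        return "excessively lipophilic"
    return "lipophilic"


# Lower bound, median, and upper bound of log D reported for the drug set.
for value in (-0.86, 2.57, 4.88):
    print(value, classify_log_d(value))
```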
To accurately assess the differences between anticancer drugs and leads, four druglike indices were used for comparison: Lipinski's RO5, the Ghose filter, Veber's selective criteria for orally bioavailable drugs, and QED by DruLiTo. The percentages (%) of lead and drug molecules following and violating these oral bioavailability rules are shown in Figure 4(a) and 4(b), respectively. Inspection of the bar graph in Figure 4(a) reveals that a greater percentage of leads than drugs follow each of the bioavailability rules, except for the Ghose filter. Drugs and leads show an almost equal percentage of molecules (~38 %) following all selected filters, obtained by considering all four bioavailability filters (Lipinski's RO5, Ghose filter, Veber's rule, and QED) together. Similarly, a greater percentage of drugs violate the bioavailability rules, except for the Ghose filter. The natural product derived anticancer drug space as defined by MW, log P, log S, HBA, HBD, TPSA, nRot, and log D is broad. Hence, our emphasis has been on defining physicochemical property rules for compounds to reduce attrition and increase the likelihood of success of candidates at various stages of anticancer drug development, based on our analysis as well as on oral bioavailability rules published earlier by different research groups.
Our analysis shows that the optimal property ranges (covering ~80 % or more of the anticancer drugs) for selecting druglike anticancer molecules are 200 < MW ≤ 800 Da, 1 < log P ≤ 5, -6 ≤ clog S ≤ -1, 5 ≤ HBA ≤ 13, 1 ≤ HBD ≤ 5, 50 ≤ TPSA ≤ 180 Å², 0 ≤ nRot ≤ 10, and log D = 2.8. These ranges may be very helpful in the prospective design of anticancer molecules from natural products, or in the identification of lead candidates from natural products that can successfully progress to the clinic and become better anticancer drugs. For some physicochemical properties, specifically MW, HBA, and nRot, the lead molecules show a better range than the drug candidates according to the RO5, which is significant for shaping the ADMET properties of potential anticancer drug candidates.
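One way to apply these ranges without turning them into a hard filter is to report the fraction of ranges a molecule satisfies, together with the properties that fall outside. The sketch below is only an illustration of that idea and not the authors' MPO tool; the names `RANGES`, `in_range`, `range_score`, and `out_of_range` are hypothetical, the example molecule is invented, and log D is left out because the text gives a single value (2.8) rather than an interval.

```python
# (low, high, low_inclusive) for each property, copied from the ranges above.
RANGES = {
    "mw":   (200.0, 800.0, False),  # 200 < MW <= 800 Da
    "logp": (1.0, 5.0, False),      # 1 < log P <= 5
    "logs": (-6.0, -1.0, True),     # -6 <= clog S <= -1
    "hba":  (5, 13, True),          # 5 <= HBA <= 13
    "hbd":  (1, 5, True),           # 1 <= HBD <= 5
    "tpsa": (50.0, 180.0, True),    # 50 <= TPSA <= 180 square angstroms
    "nrot": (0, 10, True),          # 0 <= nRot <= 10
}


def in_range(value, low, high, low_inclusive):
    """Return True if value lies in the interval; the upper bound is inclusive."""
    above = value >= low if low_inclusive else value > low
    return above and value <= high


def out_of_range(props: dict) -> list:
    """List the property names whose values fall outside the stated ranges."""
    return [k for k in RANGES if not in_range(props[k], *RANGES[k])]


def range_score(props: dict) -> float:
    """Fraction of the seven ranges satisfied (1.0 means all are met)."""
    return 1.0 - len(out_of_range(props)) / len(RANGES)


# Hypothetical molecule, for illustration only.
example = {"mw": 514.0, "logp": 2.6, "logs": -4.0, "hba": 10,
           "hbd": 2, "tpsa": 118.25, "nrot": 12}
print(round(range_score(example), 2), out_of_range(example))
```

A score of this kind flags which property to adjust next instead of rejecting a scaffold outright, which matches the stated preference for avoiding hard cutoffs.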

Profiling ADME space of anticancer molecules
Potential therapeutic compounds are of little use without a good ADMET profile, and thus it is essential to find the source of any such shortcoming when developing a drug. Significant advances in the development of high-throughput (HT) in vitro ADME assays have enabled computational scientists to build robust computational models for the earlier assessment of potential liabilities (low permeability, susceptibility to efflux transporters, etc.) associated with new potential lead compounds. To gain a better perspective on the ADME properties of drugs and lead candidates, we carried out in silico profiling of these compounds to assess Caco2 cell permeability, human intestinal absorption, and P-gp efflux liability [23]. Based on the predictive model, the permeability of a molecule can be classified as low or high from its log Papp value (Papp in 10⁻⁶ cm/s): log Papp > 0.9 is considered high permeability, while log Papp < 0.9 is considered low permeability. The Caco2 cell permeability values for lead candidates and drugs are listed in Table 2.
In the Caco2 cell permeability prediction, 70 % of the lead candidates showed high log Papp values; surprisingly, a lower percentage of the drugs (30 %) showed high log Papp values. A similar discrepancy was observed when we assessed P-gp efflux liabilities for drugs and lead candidates. The P-gp efflux liability was assessed using the Pgp_Substrate model of preADMET (https://preadmet.bmdrc.kr/). Prediction of the likelihood of P-gp efflux shows that all drugs and 60 % of the lead candidates are considered to be P-gp efflux substrates; the predicted values from the Pgp_Substrate model for both the drug and lead candidate datasets are listed in Table 2. An optimal clinical candidate would possess both high log Papp and low P-gp efflux liability.
The intestine is the primary site of absorption for orally administered drugs; hence, we predicted the percentage (%) of human intestinal absorption of the drugs and lead candidates. Systemic oral dosage requires compound properties that allow for dissolution and stability in the gastrointestinal (GI) tract, including the acidic environment of the stomach (pH 1-2 in the fasted state, 3-7 in the fed state) and the close to neutral environment (pH 4.4-6.6) of the small intestine [28]. The % human intestinal absorption of the drugs and lead candidates was assessed using the pkCSM web server model [21]. All the drugs showed good predicted human intestinal absorption (> 60 %), while 93 % of the lead candidates had predicted human intestinal absorption > 70 %. The predicted % human intestinal absorption values for both the drug and lead candidate datasets are listed in Table 2.
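The permeability cutoff and the combined permeability and efflux criterion can be sketched as follows. This is an illustration under stated assumptions, not the pkCSM or preADMET models themselves: the function names are hypothetical, the inputs are assumed to be predictions obtained from those tools, and a log Papp of exactly 0.9 (which the text leaves undefined) is treated here as low.

```python
def permeability_class(log_papp: float) -> str:
    """Classify predicted Caco2 permeability (log Papp, Papp in 1e-6 cm/s).

    Assumption: a value of exactly 0.9 is treated as low permeability.
    """
    return "high" if log_papp > 0.9 else "low"


def favorable_absorption_profile(log_papp: float, pgp_substrate: bool) -> bool:
    """True when a compound is both highly permeable and not a P-gp substrate."""
    return permeability_class(log_papp) == "high" and not pgp_substrate


# Hypothetical predictions, for illustration only.
print(favorable_absorption_profile(1.2, pgp_substrate=False))
print(favorable_absorption_profile(1.2, pgp_substrate=True))
```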
Many drugs in plasma exist in equilibrium between an unbound state and a state bound, with various affinities, to serum or whole blood proteins. It is commonly accepted that only unbound drug may interact with its intended molecular targets [24]; hence, the efficacy of a drug may be affected by the degree to which it binds whole blood proteins. We predicted the fraction unbound (Fu) of both drugs and lead candidates with the pkCSM predictive model, which was built using the measured free proportion of 552 compounds in human blood. We also evaluated the steady-state volume of distribution (VDss) of drugs and lead candidates, another important parameter, which indicates the volume in which the total dose of a drug would need to be uniformly distributed to give the same concentration as in blood plasma. The predicted Fu and VDss values for both the drug and lead candidate datasets are listed in Table 2. Evaluation of individual ADME properties (Papp, P-gp, Fu) suggested that, to increase the probability of success, design should focus on optimizing all properties of a molecule.
To understand the effect of physicochemical properties on the ADME attributes of the molecules, we analyzed ADME attributes for both drugs and leads against all eight fundamental physicochemical properties. Interestingly, TPSA, HBD, and HBA showed good correlation with Caco2 cell permeability, with correlation coefficients of 0.83, 0.8, and 0.7, respectively, for the drug molecules. The anticancer leads showed slightly better correlation of TPSA, HBD, and HBA with Caco2 cell permeability, with correlation coefficients of 0.83, 0.84, and 0.78, respectively. All three physicochemical properties (TPSA, HBD, HBA) are inversely correlated with Caco2 cell permeability, suggesting that lipophilicity is important for a molecule to have good Caco2 cell permeability and that cell permeability can be enhanced by optimizing TPSA, HBD, and HBA. Human intestinal absorption also showed good correlation with HBD, TPSA, and HBA for both anticancer drug and lead molecules. The correlation coefficients of % human intestinal absorption with HBD, TPSA, and HBA were 0.88, 0.74, and 0.67, respectively, for drugs and 0.91, 0.81, and 0.8, respectively, for leads. These results clearly reveal the influence of physicochemical properties on the ADME attributes of a molecule for oral bioavailability, and the key physicochemical properties, especially HBA, HBD, and TPSA, need to be considered for further improvement of the ADME profile of natural product derived anticancer leads.
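A correlation analysis of this kind can be reproduced from the tabulated values with a few lines of code. The text does not state which correlation coefficient was used, so the sketch below assumes Pearson's r; the function name `pearson_r` is hypothetical and the data are toy numbers chosen only to show an inverse relationship, not values from Table 1 or Table 2.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy data (hypothetical): permeability falls as TPSA rises.
tpsa = [40.0, 80.0, 120.0, 160.0]
log_papp = [1.4, 1.0, 0.6, 0.2]
print(pearson_r(tpsa, log_papp))
```

With Pearson's r, an inverse relationship gives a negative coefficient, so the sign as well as the magnitude is worth recording when such results are tabulated.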

Determining potential safety end points for anticancer molecules
Early prediction of safety end points through in silico screening techniques has become regular practice within the pharmaceutical industry, both for designing new molecules and for screening large chemical databases [25]. As part of our analysis of properties for anticancer drugs and lead candidates, we determined potential toxicity end points through pkCSM. The most frequently measured end points for evaluating potential safety issues include inhibition of cytochrome P450 (CYP) monooxygenase enzymes to determine the potential for drug-drug interactions [26], inhibition of the hERG potassium ion channel [27], acute lethal toxicity in rats (LD50), and other crucial toxicities (AMES toxicity, skin sensitization, and hepatotoxicity). All toxicity predictions for both drugs and lead candidates are presented in Table 3.
We qualitatively predicted the inhibition of CYP2D6 and CYP3A4 through pkCSM, which indicates the potential for drugs and lead candidates to mediate drug-drug interactions (DDI) through perturbation of clearance mechanisms for other drug substances. Inhibition of the hERG potassium channel may cause prolongation of the QT interval of the cardiac rhythm, which has resulted in the withdrawal of many clinical candidates from the market [28]. Therefore, we qualitatively predicted the hERG potassium channel inhibition potential of drugs and lead candidates. The data obtained suggest that all of these drugs and lead candidates are non-inhibitors of the hERG channel, as shown in Table 3. Analysis of the inhibition data for CYP2D6 and CYP3A4 revealed that all the drugs are non-inhibitors of both CYPs and all lead candidates are non-inhibitors of CYP2D6, while 89 % of lead candidates are non-inhibitors of CYP3A4. These data suggest that most of these drugs and lead candidates occupy desirable, low-risk space for DDI. Drug metabolism and drug excretion also have a significant role in the drug design process. Issues related to metabolism have commonly been associated with compound failure in the clinic. Understanding the metabolic pathways of drugs would be very helpful in predicting drug-drug interactions (DDI), toxicities, and pharmacokinetics [29]. Many relationships between CYP family enzymes and in silico molecular properties are available in the literature; the primary concern is inhibition of CYP3A4, which is correlated with increasing MW and log P [30]. This may lead to issues with clearance as well as drug-drug interactions. We predicted the total clearance for drugs and lead candidates, which is measured by the proportionality constant and occurs primarily as a combination of hepatic and renal clearance; the values are given in Table 3.
Hughes et al. have done the most considerable work on the impact of molecular properties on in vivo toxicity, which led to the "3/75 rule", derived from an analysis of exploratory or dose-finding toxicology studies of 245 compounds at Pfizer [31]. The key finding of this analysis was that compounds with clog P < 3 and TPSA > 75 Å² were 2.5 times more likely to be non-toxic at the same total exposure. Conversely, compounds with high lipophilicity (clog P > 3) and low polar surface area (TPSA < 75 Å²) had an increased risk of widespread toxicities in short-term animal studies. One crucial interpretation of these results is that lipophilic compounds with little polar functionality are likely to have an increased chance of toxicity. A similar study by AstraZeneca [32] on their compound failures showed a different profile, with the majority of failures occurring at TPSA > 75 Å² and clog P < 3. However, attrition in the high-log P, low-TPSA space can readily be rationalized by considering promiscuity and interactions across a range of systems. A further Eli Lilly study of > 400 compounds supported the influence of compound lipophilicity on toxicity in rat toxicological studies [33]. In this analysis, there was a three-fold enrichment in toxic compounds when log P > 3, but TPSA had little or no influence. Clearly, establishing a link between important clinically relevant end points and simple descriptors such as log P and PSA (which can be easily calculated before synthesis) is highly attractive.
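The 3/75 rule can be expressed as a quadrant check on clog P and TPSA, as sketched below. The function name `three_75_flag` and the label strings are hypothetical, and the "intermediate" label for the two mixed quadrants (and for values exactly on a threshold) is an assumption of this sketch, since the text characterizes only the two extreme quadrants.

```python
def three_75_flag(clogp: float, tpsa: float) -> str:
    """Place a compound in a 3/75 quadrant from clog P and TPSA.

    Only the two extreme quadrants are described in the text; everything
    else is labeled "intermediate" here as an assumption.
    """
    if clogp > 3 and tpsa < 75:
        return "higher risk"   # high lipophilicity, low polar surface area
    if clogp < 3 and tpsa > 75:
        return "lower risk"    # low lipophilicity, high polar surface area
    return "intermediate"


# Median log P and TPSA reported for the drug set in this study.
print(three_75_flag(2.6, 118.25))
```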
We also analyzed the predicted toxicity end points of our natural product derived drugs and lead compounds to establish meaningful correlations between the physicochemical properties and toxicity profiles of the compounds. Predicted toxicities of drugs and leads were categorized as "yes" or "no". Most anticancer drugs in our dataset with low lipophilicity (clog P < 3.5) showed hepatotoxicity (e.g., camptothecin, rohitukine, carfilzomib, docetaxel), and some drugs with both clog P < 3.5 and MW > 700 also showed hepatotoxicity (e.g., vinblastine, vincristine, vindesine, carfilzomib, and docetaxel). Similarly, the four drugs that showed AMES toxicity also had low lipophilicity (clog P < 2). This link between the low lipophilicity of compounds and toxicity is in line with the results of Hughes et al. [31] and other research groups. Furthermore, some drugs showed toxicity for which no specific correlation with physicochemical properties was found; possibly the toxicity was a consequence of the primary drug target mechanism or of a specific off-target pharmacology. On examining the relationship between physicochemical properties and other predicted toxicity end points, we found very good correlation for the drug molecules between the physicochemical properties and oral rat chronic toxicity (LOAEL). The correlation coefficients of LOAEL with MW, HBA, HBD, TPSA, and nRot were 0.85, 0.84, 0.68, 0.84, and 0.77, respectively. All five physicochemical properties are positively correlated with the LOAEL, suggesting the need to optimize these physicochemical parameters to avoid LOAEL toxicity. On the other hand, no correlation was observed between the LOAEL and physicochemical properties for lead molecules; it is evident from Table 3 that drug molecules showed more toxicity end points than lead molecules. Hence, establishing meaningful correlations between the physicochemical properties and toxicity of natural product derived oral anticancer drugs and leads might be useful for future anticancer drug discovery.

Conclusions
Improving the survival rate of clinical candidates and reducing drug attrition is governed by multiple factors and thus requires a holistic strategy that addresses key attrition factors (safety, ADME, and efficacy). Chemical space defined by physicochemical properties is vast, yet there are several design parameters that medicinal chemists can follow when designing druglike compounds (e.g., Lipinski's Rule of Five), and defining the parameters that increase the likelihood of identifying best in class molecules is of critical importance. Understanding the fundamental relationships between physicochemical properties and in vitro and in vivo results is a primary need for prospectively designing compounds with an overall desired profile. As part of our efforts to further build this understanding in the anticancer drug development space, we undertook a thorough analysis of the physicochemical properties, ADME attributes, and safety end points for 24 natural product derived anticancer drugs and 27 natural product lead candidates. We compared eight fundamental physicochemical properties associated with these two sets of compounds: log P, log D, MW, TPSA, HBD, HBA, log S, and nRot. The anticancer drug space defined by these physicochemical properties is quite broad, but our analysis identified the optimum ranges for each of these properties. The optimal property ranges (covering ~80 % or more of the anticancer drugs) were found to be 200 < MW ≤ 800 Da, 1 < log P ≤ 5, -6 ≤ clog S ≤ -1, 5 ≤ HBA ≤ 13, 1 ≤ HBD ≤ 5, 50 ≤ TPSA ≤ 180 Å², 0 ≤ nRot ≤ 10, and log D = 2.8. Analysis of the in silico generated ADME data reinforced that the majority of anticancer drugs (70 %) have low permeability (Caco2 log Papp < 0.9, Papp in 10⁻⁶ cm/s), that all drugs are considered to be P-gp efflux substrates, and that the drugs have low to moderate clearance rates.

On the other hand, our analysis showed that for anticancer drugs, there may be a need to optimize new compounds with further reduced MW, HBA, and nRot to better match the corresponding properties in the marketed drug set. In addition, we have established meaningful correlations between the physicochemical properties, especially HBA, HBD, and TPSA, and the ADME attributes of the molecules that might be generally applicable to future anticancer drug development and to the optimization of natural product derived anticancer leads/clinical candidates. Our study showed meaningful correlations between physicochemical properties and the toxicity profile of compounds. Log P and MW are the most critical physicochemical parameters and robust predictors of the toxicity profile of anticancer leads/clinical candidates. Our analysis showed that early prediction of physicochemical properties, ADME attributes, and safety attributes through in silico tools is important for enabling better lead candidate selection, saving considerable time and effort in anticancer drug development.

Figure 1. Chemical structures of 24 natural product derived anticancer drugs.

Figure 2. Chemical structures of 27 natural product derived anticancer lead molecules.

Figure 3. Physicochemical property distribution and statistics of drugs and lead candidates for MW, log P, log S, HBA, HBD, TPSA, nRot, and log D.

Figure 4. Bar graph of the percentage (%) of lead and drug molecules (a) following and (b) violating Lipinski's RO5, the Ghose filter, Veber's rule, QED, and all selected filters.

Table 1. Important computed physicochemical properties for anticancer lead candidates and drugs.

Table 2. Computed ADME properties for anticancer lead candidates and drugs.