Evaluation of log P o/w values of drugs from some molecular structure calculation software

Predictive software packages that estimate the lipophilicity of molecules have become key tools in new drug design. Six well-known computational programs, including the classical BioByte-clogP and the GALAS algorithm offered by ACD/Labs, were evaluated on a set of 103 drugs with different structures and functionalities. To evaluate the accuracy of the predictions, reliable experimental log P o/w values were carefully selected for the whole test set. The best estimates are given by GALAS/logP, a fragmental method whose result is corrected according to the similarity with compounds included in the software training set.


Introduction
Lipophilicity, expressed by the logarithm of the octanol-water partition coefficient (log P), or of the distribution coefficient (log D) if ionised molecular species are present, is a physicochemical property of paramount importance in medicinal chemistry and in drug discovery overall. Optimal lipophilicity should be targeted in the early phases of drug research, since it contributes to the individual ADMET (absorption, distribution, metabolism, elimination and toxicology) properties [1,2], including blood-brain barrier penetration and clearance [3].
Predictive software to estimate lipophilicity has also become a key element in the structure design of new chemical entities [4]. Different methods for log P prediction have been developed, and they can be divided into two main categories: substructure-based and property-based methods.
Substructure-based methods work by decomposing the 2D structure of the compound into fragments (fragmental approaches) [5] or into single atoms (atom-based approaches) [6,7]. The resulting log P value is obtained as a summation of terms. Fragmental methods differ from one another in the set of fragments to be identified in the target molecule, the contribution constant of each identified fragment, and the correction factors applied depending on the fragment's environment. Atom-based methods commonly have no correction factors in the summation, and log P values are obtained by adding the contributions of the atom types present in the compound; the methods differ in their atom type definitions and contribution values. In all cases the fragment references, atom type definitions and values are the result of the different training sets and analysis techniques used to develop each method.
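The additive scheme described above can be illustrated with a minimal Python sketch. It is not the implementation of any of the programs discussed here: the fragment names, contribution constants and correction factor are hypothetical values chosen only to show how the summation works.

```python
from collections import Counter

# Hypothetical contribution constants, one per fragment or atom type.
# These numbers are illustrative only and come from no real program.
CONTRIBUTIONS = {"CH3": 0.5, "CH2": 0.5, "OH": -1.0}

# Hypothetical correction factors, keyed by a label for the structural
# environment that would trigger them in a fragmental method.
CORRECTIONS = {"intramolecular_h_bond": 0.6}


def additive_logp(fragments, corrections=()):
    """Sum fragment contributions, then add any correction factors."""
    counts = Counter(fragments)
    total = sum(CONTRIBUTIONS[name] * n for name, n in counts.items())
    total += sum(CORRECTIONS[name] for name in corrections)
    return total


# Toy fragment list; an atom-based method would pass atom types instead
# and normally no corrections.
print(additive_logp(["CH3", "CH2", "CH2", "OH"]))
```

In this picture, two fragmental methods differ only in the contents of the two tables and in the rules that decide which fragments and corrections are assigned to a given structure.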
Property-based approaches [8] estimate log P values using calculated descriptors that account for the entire molecule. Many of these techniques need the 3D structure of the compounds to calculate the required descriptors, and the variability of results due to conformational uncertainty has to be taken into account. Some methods also involve semi-empirical quantum mechanical calculations and others molecular dynamics, which makes them resource-demanding and slow. To overcome these difficulties, some linear models based on 2D descriptors have also been developed; however, substructure-based methods remain the most widespread in the pharmaceutical world thanks to their speed and easy implementation in MedChem Desktop tools and molecular modelling programs.
In this work, a set of 103 drugs, selected to represent a broad collection of pharmaceutical compounds with different structures and functionalities and a range of log P o/w values from 0 to 7, has been used to evaluate the accuracy of the log P o/w values obtained through both classical, well-accepted programs such as BioByte-ClogP and more recently developed ones such as the GALAS algorithm by ACD.

Calculation methods
A list of 103 pharmaceutical compounds with a wide variety of functionalities was selected, and their log P o/w values were predicted through six different substructure-based methods: AlogP, an atom-based method published by Ghose and Crippen [9] and implemented in PipelinePilot (Accelrys) [10], in which carbon, hydrogen, oxygen and nitrogen are classified into 120 atom types; clogP, a fragmental method with correction factors that take structural and interaction factors into account, developed at the Pomona College Medchem Project [11], licensed from BioByte Corp and incorporated in ChemFinder; ChemProp/logP, developed by CambridgeSoft and implemented in ChemFinder [12], which uses three fragmentation methods that can handle molecules containing different atoms; Classic ACD/logP [13], which is based on contributions of separate atoms, structural fragments and intramolecular interactions derived from the ACD/Labs internal database, which comprises information from reference books, articles and public sources; GALAS/logP [14], recently developed by ACD, whose name stands for Global, Adjusted Locally According to Similarity: the global method is also fragmental, and its baseline result is adjusted depending on the performance obtained for similar compounds from the training set; and Consensus/LogP [14], a consensus result of the Classic and GALAS algorithms also provided by ACD, in which dynamic adaptive coefficients are assigned to each model according to the corresponding indications of prediction quality.

Results and Discussion
The log P o/w values obtained for the 103 compounds are gathered in Table 1. The reference log P o/w values are previously determined experimental values [15], complemented by the values recommended in the BioLoom online database (Bioloom) and the "Gold standard" list selected by Avdeef [16]. To assess the accuracy of the software used, the deviations between the experimental reference values and the predicted values are listed. Colored values represent outliers, for which the deviation is greater than 0.6 units. This limit was selected because of the experimental variability in log P o/w measurements, which led the AOAC to admit differences of 0.3 units between log P o/w values measured from replicates using the shake-flask reference method, and because of the variety of values shown for most compounds in the Leo et al. [17] and Bioloom [18] databases. According to the differences in Table 1, clogP, Classic ACD/logP, Consensus/logP and Galas/logP are the most accurate methods, with mean deviations very close to 0. The precision of clogP, Consensus/logP and Galas/logP is also very good, with standard deviations between 0.4 and 0.5 log P units. The standard deviation of Classic ACD/logP is somewhat larger (0.65). AlogP is less accurate and precise (mean deviation of -0.07 and standard deviation of 0.55), and ChemProp has the poorest accuracy and precision (mean deviation of 0.22 and standard deviation of 0.71).
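The three statistics used in this comparison (mean deviation, standard deviation of the deviations, and number of outliers beyond 0.6 units) can be computed as in the following Python sketch. The sign convention (predicted minus reference) and the use of the sample standard deviation are assumptions, since the text does not state them, and the input values are made up for illustration.

```python
from statistics import mean, stdev

OUTLIER_LIMIT = 0.6  # limit in log P units, as stated in the text


def deviation_summary(reference, predicted, limit=OUTLIER_LIMIT):
    """Summarise the deviations of one method over a set of compounds."""
    # Assumed sign convention: predicted minus reference.
    deviations = [p - r for r, p in zip(reference, predicted)]
    return {
        "mean": mean(deviations),   # accuracy: ideally close to 0
        "sd": stdev(deviations),    # precision: sample SD (assumption)
        "outliers": sum(abs(d) > limit for d in deviations),
    }


# Made-up values for four compounds, not taken from Table 1.
reference = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.0, 4.9]
print(deviation_summary(reference, predicted))
```

A mean close to 0 with a large standard deviation would indicate a method that is unbiased on average but unreliable for individual compounds, which is why both figures are reported together.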
Figure 1 shows the correlation between the values calculated by the six methods and the reference log P o/w values, and Table 2 gives the regression coefficients (R²) and standard deviations obtained for those correlations. All methods give good linear correlations, with R² values between 0.79 and 0.93. The results presented in Table 2 are consistent with those of Table 1. Consensus/logP and Galas/logP give the best results: they are accurate, because their slopes and intercepts are not significantly different from 1 and 0, respectively, at the 95 % confidence level, and they are also the most precise (with the lowest overall standard deviations, 0.43 and 0.41). AlogP, Classic ACD/logP and ChemProp/logP also have slopes and intercepts not significantly different from 1 and 0, and thus give accurate but less precise results (SD of 0.55, 0.66 and 0.70, respectively). The number of outliers for these last methods is higher too (Table 3). clogP gives quite precise results (SD = 0.46), but it shows a slope slightly higher than one and an intercept lower than zero. Thus, clogP may give slightly negative deviations for hydrophilic compounds but positive deviations for hydrophobic ones. In fact, all methods have lower predictive power in the extreme ranges of log P o/w values, and accuracy tends to be poorest for compounds with log P higher than 5. This can be seen in Table 3, which reports the number of outliers in the respective log P ranges.
Although Galas/logP and Consensus/logP seem to be the most accurate methods for this set of compounds, it has to be taken into account that the result of the fragmental method in those programs is just a baseline value, which is corrected by adding a similarity-weighted average of the differences between the baseline predictions and the experimental data for the training set compounds most similar to the submitted compound. This strategy should achieve the best fit to the training set data, but the quality of the results for other compounds will depend strongly on their similarity to the training set. This similarity may be quite high in our case, since all 103 compounds are known pharmaceutical compounds with public experimental values. In fact, we have observed that for at least 87 of the 103 compounds the same compound could be found in the training set, so that its initial prediction is corrected to the experimental value with the highest weight. This limitation does not apply to clogP, whose results follow exclusively from the general formula, fragments, correction factors and coefficients developed for the method. In fact, this method shows almost the same predictive ability as Galas/logP and the smallest number of outliers or unacceptable predictions (>0.6) among all methods, and it gives better results than Classic ACD/logP and AlogP for this set of compounds. Finally, ChemProp shows the poorest performance in terms of accuracy, precision and number of outliers.
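The effect of the similarity-based correction can be illustrated with a short Python sketch. This is only a reading of the description given above, not ACD's actual algorithm: the function name, the use of the raw similarity as the weight, and the numbers are all assumptions made for illustration.

```python
def similarity_corrected_logp(baseline, neighbours):
    """Adjust a baseline prediction using similar training set compounds.

    neighbours: list of (similarity, experimental_logp, baseline_logp)
    tuples for the most similar training set compounds. Using the raw
    similarity as the weight is an assumption of this sketch.
    """
    total_weight = sum(sim for sim, _, _ in neighbours)
    if total_weight == 0:
        return baseline  # no similar compounds: keep the global result
    # Similarity-weighted average of (experimental - baseline) differences.
    delta = sum(sim * (exp - base) for sim, exp, base in neighbours)
    return baseline + delta / total_weight


# The submitted compound is itself in the training set (similarity 1.0):
# the corrected value coincides with its experimental value.
print(similarity_corrected_logp(2.0, [(1.0, 2.5, 2.0)]))

# No similar compounds available: the baseline is returned unchanged.
print(similarity_corrected_logp(2.0, []))
```

The first call shows why a test set that overlaps with the training set flatters this kind of method: the correction simply recovers the stored experimental value.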
In order to observe the performance of Galas/logP and Consensus/logP for compounds that are not part of the training set, we have evaluated the predictions for the remaining 16 compounds; the results are shown in Figure 2. We show the comparison for the remaining methods as well, although in those cases we do not know whether those 16 compounds are part of the training set or not. Table 4 shows that the accuracy of the Galas/logP predictions has diminished, while the clogP, Classic ACD/logP and AlogP predictions retain their level of predictive ability. In any case, to better estimate the performance of the methods, it would be desirable to compare calculated and experimental log P o/w values for a larger set of highly diverse compounds with low structural similarity to the compounds in the training set. It is worth mentioning that, to address this, Galas/logP offers the possibility of enlarging the training set with new compounds and their experimentally determined data, which might be useful when predicting lipophilicity for analogs in ongoing chemistry within a discovery program.
At this point we would like to stress that all predictive software is built from experimental values obtained from the literature, and literature values from different sources do not always coincide. To study those differences in some depth, we took the 87 compounds mentioned above, which are also part of ACD's training set, and compared our reference experimental log P o/w values, as summarized in Table 1, with the ACD reference experimental log P o/w values. We have centered our comparison on the ACD values because the experimental log P o/w values used to train the ACD models can be obtained directly within the software, whereas the training sets used to derive clogP or AlogP are not so easily available.
We observe that the values do not show a perfect correlation (Figure 3). A slope of 0.92 and a y-intercept of 0.2 are obtained. Although the variability of experimental determinations is well known, it is important to keep this correlation between experimental measurements in mind when evaluating prediction methods: in light of it, we cannot expect to obtain a correlation between experimental and predicted values higher than 0.95.
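The slope, intercept and R² quoted for these comparisons correspond to an ordinary least-squares fit of one set of values against the other. A minimal Python sketch is given below; which set is placed on each axis is an assumption here, and the data are synthetic points generated to lie exactly on a line with the reported slope and intercept, not the values behind Figure 3.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with R²."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic data lying exactly on y = 0.92 x + 0.2 (illustrative only).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.92 * xi + 0.2 for xi in x]
slope, intercept, r_squared = linear_fit(x, y)
print(round(slope, 2), round(intercept, 2), round(r_squared, 2))
```

With real data the points scatter around the line, so R² falls below 1; the scatter between two sets of experimental values sets a practical ceiling on the agreement that can be expected between predicted and experimental values.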

Conclusions
All tested substructure-based methods have acceptable accuracy and precision for estimating the lipophilicity of pharmaceutical compounds, although predictions tend to lose accuracy in the extreme ranges.
For our dataset, Consensus/logP and Galas/logP show the best results. However, since their globally predicted values are corrected depending on the performance of the method for the most similar compounds, and since our 103 compounds are known pharmaceutical compounds, an analysis of a set of highly diverse compounds with low structural similarity to the compounds in the training set would be needed to verify their precision in evaluating log P values for a general dataset of compounds. Among the other methods, which do not apply any structural analog approach, clogP performs best, whereas ChemProp/logP shows the lowest predictive ability.

Figure 1. Comparative linear relationship between the reference log P and the different log P prediction software packages (n = 103).

Figure 2. Comparative linear relationship between the reference log P and the different log P prediction software packages (n = 16: compounds not in the ACD training set).

Figure 3. Linear relationship between the log P o/w used in the ACD database and the log P o/w reference used in this study (n = 87).

Table 1. Predicted and experimental (reference) log P values.

Table 2. Regression coefficients and standard deviations for the different log P prediction software packages.

Table 3. Number of outliers with deviations higher than 0.6 units for the different log P prediction software packages and the different ranges of reference log P o/w values.

Table 4. Regression coefficients and standard deviations for the different log P prediction software packages (n = 16: compounds not in the ACD training set).