Multi-lab intrinsic solubility measurement reproducibility in CheqSol and shake-flask methods

This commentary compares 233 CheqSol intrinsic solubility values (log S0) reported in the Wiki-pS0 database for 145 different druglike molecules to the 838 log S0 values determined mostly by the saturation shake-flask (SSF) method for 124 of the molecules from the CheqSol set. The log S0 values range from -1.0 to -10.6 (log molar units), with an average of -3.8. The correlation plot between the two methods indicates r2 = 0.96, RMSE = 0.34 log unit, and a slight bias of -0.07 log unit. The average interlaboratory standard deviation (SDi) is slightly lower for the CheqSol set than for the SSF set: SDiCS = 0.15 and SDiSSF = 0.24. The intralaboratory errors reported for the CheqSol method (0.05 log unit) need to be multiplied by a factor of 3 to match the expected interlaboratory errors for the method. The scale factor relates, in part, to the hidden systematic errors in the single-lab values. It is expected that improved standardization of the ‘gold standard’ SSF method, as suggested in the recent ‘white paper’ on solubility measurement methodology, should bring the SDi of both methods to about 0.15 log unit. The multi-lab averaged log S0 (and the corresponding SDi) values could be helpful additions to existing training-set molecules used to predict the intrinsic solubility of drugs and druglike molecules.


Introduction
This commentary considers the interlaboratory reproducibility of published aqueous intrinsic solubility data (log S0) for 124 drugs, determined both by the potentiometric CheqSol (CS) method (233 reported CheqSol log S0 values) and mostly by the 'gold standard' saturation shake-flask (SSF) method (838 SSF log S0 values). For each drug, its method-dependent interlaboratory measurement standard deviations (SDiCS and SDiSSF) are estimated by comparing solubility values for a given drug determined in different laboratories. The multilab averaged log S0 (and the corresponding SDi values) could be helpful additions to existing training-set molecules used to predict the intrinsic solubility of drugs and druglike molecules.
The present contribution is the third in a series of papers aiming to address contemporary issues of solubility measurement, interpretation and prediction [1,2]. These are intended to serve as prologue/accompaniment to an upcoming session on solubility at the IAPC-8 meeting in Split, Croatia, 9-11 September 2019. Since 2009, the International Association of Physical Chemists (IAPC, www.iapchem.org) series of symposia has maintained extensive coverage of the topic of solubility measurement, from both solid state and solution perspectives. At the 2015 IAPC-4 meeting, the special session on solubility measurement resulted in a widely-circulated 'white paper' drawing on expert consensus thoughts of scientists from six countries (Hungary, Russia, Serbia, Spain, Sweden, United States) [3]. It is expected that future sessions will continue to cover solubility methods and strategies, to critically address the different needs in pharmaceutical research, spanning from drug discovery to drug development.

Background
Most of the small-molecule research compounds in today's drug discovery projects are ionizable and poorly soluble in water, and are thus prone to show low and/or erratic in vivo intestinal absorption [4]. In discovery, high-throughput microtitre plate methods are used to estimate solubility, where small volumes of 10 mM DMSO solutions of library compounds are added to buffer solutions to induce formation of solid suspensions in the wells. Such estimates of solubility are needed, in part, to anticipate whether compounds would precipitate in bioassays (and thus indicate false positives). In parallel, methods to predict solubility of molecules play an important role in drug discovery, since virtual screening of compound libraries could prioritize molecules for testing in early in vitro screens [2,5,6].

Saturation shake-flask (SSF) and potentiometric (CheqSol) methods
In more advanced stages of drug research, solubility measurement necessarily becomes more rigorous, where 'solubility' refers to the concentration of a solute in a saturated solution, where the dissolved molecule is in a thermodynamic equilibrium with its crystalline form suspended in the solution (of a known pH, composition, and temperature). Accurate solubility measurement of druglike molecules can be very difficult to do well, although to an untrained eye it ought to be as easy as measuring the concentration of a molecule in water. In development projects, thermodynamic solubility measurements are usually done using some variant of the SSF method [3]. As an alternative, a potentiometric procedure, called the dissolution template titration (DTT) method, was introduced in 1998 [7] and validated two years later [8]. The methodology is capable of producing highly precise (intralaboratory SD ~0.07 log unit) measurements of aqueous intrinsic solubility, log S0, i.e., the solubility of the uncharged form of an ionizable molecule. A much faster variant of the pH-titration method, called CheqSol, was described in 2005 [9]. Instruments implementing the potentiometric methods have been used in several universities and pharmaceutical companies.
All of the potentiometric methods require that the molecule be ionizable and that an accurately measured pKa be provided. The molecule cannot be too soluble, since the method depends on the pH difference between a saturated and an unsaturated solution in the titration where the molecule is half ionized. The method is therefore ideally suited for low-soluble molecules, since these molecules display large pH differences. Furthermore, to calculate the CheqSol log S0, it is assumed that solubility as a function of pH follows the curve predicted by the Henderson-Hasselbalch equation. The molecule needs to be stable to hydrolysis when repeatedly exposed to pH conditions far from neutral. Means to recognize hydrolytic decomposition are important to incorporate into the measurement. Sometimes multiple polymorphs may form in the CheqSol method, and solid state characterization is required to identify them. In the traditional SSF method, most of the time, the thermodynamically most-stable form of the solid is the one associated with the measured solubility [3]. Equilibration times are selected to be long (24-168 h) to ensure this expectation.
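The Henderson-Hasselbalch assumption mentioned above can be illustrated with a minimal sketch. It assumes a monoprotic acid or base, no salt precipitation, and no aggregation or complexation in solution; the function names and the numerical values are hypothetical and are not taken from the CheqSol or pDISOL-X software.

```python
import math


def log_s_at_ph(log_s0, pka, ph, kind):
    """Henderson-Hasselbalch log S at a given pH for a monoprotic molecule.

    Assumption: only the uncharged form precipitates, so the total
    solubility is S0 multiplied by (1 + ratio of charged to uncharged form).
    """
    if kind == "acid":
        exponent = ph - pka   # acids ionize above their pKa
    elif kind == "base":
        exponent = pka - ph   # bases ionize below their pKa
    else:
        raise ValueError("kind must be 'acid' or 'base'")
    return log_s0 + math.log10(1.0 + 10.0 ** exponent)


def log_s0_from_log_s(log_s_ph, pka, ph, kind):
    """Invert the relation: recover log S0 from a log S measured at one pH."""
    ionization_term = log_s_at_ph(0.0, pka, ph, kind)
    return log_s_ph - ionization_term


# Hypothetical weak acid: log S0 = -5.0, pKa = 4.5, measured at pH 6.5.
log_s = log_s_at_ph(-5.0, 4.5, 6.5, "acid")
recovered = log_s0_from_log_s(log_s, 4.5, 6.5, "acid")
```

At pH = pKa the molecule is half ionized and the sketch gives log S = log S0 + log 2, i.e. about 0.30 log unit above the intrinsic value. The inverse function shows why an inaccurate pKa or pH propagates directly into the calculated log S0.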

Challenging measurement
What can make solubility measurement of an ionizable low-soluble molecule so difficult? There are two sides to consider for the reaction at equilibrium: (a) the solid state and (b) the solution. Temperature needs to be regulated and specified. To keep the following examples simple, let's assume that the solvent is distilled water or an aqueous buffer, and that the crystalline form of the test compound is a free-acid/base or a salt. Multiple circumstances may arise, some making the interpretation of the measurement challenging: (i) The simplest suspension is the one where no drug ionization takes place on equilibration. One needs to measure the concentration of the compound in the saturated solution, and to confirm that the solid state form is unchanged, which is straightforward. (ii) However, if the solid introduced is not the thermodynamically most-stable polymorph (or is amorphous), then it is possible that the measured solubility would correspond to a different solid form. That's important to know. (iii) Complications can arise if a low-soluble weak base is added to water (usually saturated with CO2 from the air). The pH will change, depending on the pKa of the molecule. The ambient CO2 may act as a buffer, so the final pH of the saturated solution needs to be carefully measured. Otherwise, the calculated log S0 could be quite erroneous. (iv) If a solid salt of the compound is added to water, a supersaturated solution may form. On equilibration, there could be two precipitates in the suspension: the original compound salt and the neutral free-acid/base form of the solid (at the pH called 'pHmax'). Solid state characterization of the solid(s) and the measurement of pH would be highly beneficial. (v) When a buffer medium is used, the analysis of the solution and solid states can be complicated, as water-soluble drug-buffer complexes or aggregates may form [10].
In a supersaturated solution, drug molecules may self-associate as sub-micellar aggregates, particularly if they are surface-active [11,12]. For bases introduced as drug salts into a high-pH solution, the charged drug in the supersaturated solution can disproportionate into oil or undergo precipitation into an amorphous solid, along with which charged water-soluble aggregates may co-exist [10-13]. Given enough time, the multiple phases are expected to undergo transformation into a thermodynamically most-stable crystalline solid. Good understanding of solution chemistry and solid state characterization is essential for correctly interpreting the results of solubility measurements, so that high-quality intrinsic solubility data can be reported [3].

The need for high-quality data in accurate solubility prediction
Accurate prediction of the intrinsic solubility of druglike molecules [2] requires that (a) the log S0 values used to train the prediction method are of high quality (with water solubility values, log Sw, or values measured at a particular pH, log SpH, properly corrected for ionization [14] and with all solubility values referring to the same temperature [15]), and (b) the compounds in the training set cover the druglike chemical space of the test set of compounds.
These two important notions have been the focus of a number of studies since 2008, spurred by the publications of Llinàs et al. [16] and Hopfinger et al. [17]. These authors introduced the 'Solubility Challenge,' a competition to probe the limits of prediction methods. The CheqSol method was used to measure the log S0 of 132 structurally diverse drugs. The log S0 values of 100 molecules were offered as the training set for the prediction of an external test set of 32 molecules (not found in the training set), whose values were not revealed before the completion of the competition.
In a number of earlier studies, it was suggested that the typical error in measured aqueous solubility is ~0.6 log unit or higher, when the solubility values were collected from many published sources [18]. This suggested that the quality of prediction methods was approaching the experimental limit. However, in the Solubility Challenge competition all of the values came from one laboratory, and the intralaboratory precision (repetitive measurements of the same sample by the same chemist, using the same instrument) of the CheqSol data was reported to be SD = 0.05 log unit. It was not known what the expected interlaboratory precision would be, given the unknown systematic errors that might affect the accuracy of results. For example, when 125 published CheqSol values were compared in 2015 to those obtained by the SSF method, it was reported that r2 = 0.90, prediction root-mean-square-error, RMSE = 0.52 log unit, and there was a slight bias of -0.13 log unit [19]. The values in the comparison came from the Wiki-pS0 database (in-ADME Research), which at that time contained 4557 log S0 entries. The database now has 6355 entries, with many newly added CheqSol and SSF values. It was thus of interest to update and better characterize the comparison of data quality between the CheqSol and the 'gold standard' saturation shake-flask methods. In parallel, armed with new curated data, the second Solubility Challenge has just been announced [2], with the prediction submission deadline set to 8 September 2019, the day before the IAPC-8 conference starts. The Excel submission form in Supporting Info at https://pubs.acs.org/doi/suppl/10.1021/acs.jcim.9b00345 is freely downloadable for those interested in participating. The data described below could be a useful addition to other druglike training sets currently in circulation.

Data source: Wiki-pS0 database
The ongoing Wiki-pS0 database project [2,3,15,19], which started in 2011, now has 6355 log S0 entries for 3014 different drug-relevant molecules (solids at room temperature), drawing on the study of 1325 publications. The overall interlaboratory standard deviation, SDiALL = 0.17 log unit, has been estimated from the 870 molecules for which solubility was reported from two or more different sources (comprising 4209 individual S0 values), by taking the average of the individual 870 SD values. The SDiALL, being lower than the older estimate of solubility measurement error (~0.6 log unit [18]), indicates that (i) when legacy data are subjected to critical analysis, as recommended in [3,15,19], improvements in the quality of the extracted log S0 data can be achieved, and (ii) there is room for further improvement to the current prediction methods. Alongside the database, the pDISOL-X program (in-ADME Research) was designed to interpret solubility data and make temperature corrections, to produce a reliable estimate of the underpinning log S0 [10,11,20].
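The averaging procedure described above (one SD per molecule with two or more sources, then the mean of those SDs) can be sketched as follows. This is an illustration only: the function name, the input layout, and the example values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which form was used.

```python
from collections import defaultdict
from statistics import mean, stdev


def interlab_sd(entries, min_sources=2):
    """Average the per-molecule SDs of log S0 values from different sources.

    entries: iterable of (molecule_name, log_s0) pairs, one pair per source.
    Molecules with fewer than min_sources values are left out.
    """
    by_molecule = defaultdict(list)
    for name, log_s0 in entries:
        by_molecule[name].append(log_s0)

    # Assumption: sample SD (n - 1 denominator) for each molecule.
    per_molecule = {
        name: stdev(values)
        for name, values in by_molecule.items()
        if len(values) >= min_sources
    }
    return mean(per_molecule.values()), per_molecule


# Hypothetical entries: molecule "C" has one source and is excluded.
example = [
    ("A", -3.0), ("A", -3.2),
    ("B", -5.0), ("B", -5.4), ("B", -5.2),
    ("C", -2.0),
]
sd_all, sd_each = interlab_sd(example)
```

Each molecule carries equal weight in this average, regardless of how many sources contributed to it, which matches the description of averaging the individual SD values.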

Results
There are 233 reported CheqSol log S0 values in the Wiki-pS0 database for 145 different druglike molecules. For 124 of the molecules, there are 838 reported log S0 values determined mostly by the SSF method. Of the 838 entries, 298 (36%) were log S0 values calculated from log S vs. pH data (using pDISOL-X), based on a total of 2925 individual log SpH measurements. (For 21 of the 145 molecules, SSF data have not been located in the literature.) Table 1 lists the solubility values for the 124 overlapping molecules measured by the CheqSol and SSF methods. The log S0 values range from -1.0 to -10.6 (log molar units), with an average of -3.8.
Note that indomethacin is not listed in the table. The CheqSol value was not accepted in the Wiki-pS0 database due to the hydrolytic decomposition encountered during the CheqSol assay [30]. On the other hand, the pH-metric DTT method indicated log S0 = -5.33 [31], in good agreement with the average of 20 interlaboratory SSF measurements: log S0 = -5.49, SDi = 0.23.
Figure 1 shows the correlation plot between the two types of measurements, with each point carrying interlaboratory error bars for both methods. The statistics have improved over the 2015 comparison [19], with current values being r2 = 0.96, RMSE = 0.34 log unit, and a smaller bias of -0.07 log unit. The average interlaboratory standard deviation is slightly lower for the CheqSol set than for the SSF set: SDiCS = 0.15 and SDiSSF = 0.24, which probably highlights the benefit of using a highly standardized method (CheqSol) over an 'open' method (SSF). It should be kept in mind that the above comparison sets are small. For the 870 molecules with multi-source data in the whole database, SDiALL = 0.17. The intralaboratory comparison between the methods by one group of researchers performing both the SSF and CheqSol measurements (both highly standardized) [32] produced the statistics r2 = 0.96, RMSE = 0.20 for 15 compounds, comparable to SDiALL. The latter is the target for computational methods to aim at, provided that the training sets are of high and consistent quality.
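The three summary statistics quoted for the method comparison can be computed along the following lines. This is a sketch under stated assumptions, not the procedure used for Figure 1: r2 is taken as the squared Pearson correlation, RMSE as the root-mean-square of the paired differences, and bias as the mean of (CheqSol minus SSF). The sign convention, the function name, and the example values are hypothetical.

```python
import math


def compare_methods(log_s0_cs, log_s0_ssf):
    """Return (r2, rmse, bias) for paired per-molecule mean log S0 values."""
    n = len(log_s0_cs)
    if n != len(log_s0_ssf) or n < 2:
        raise ValueError("need two equally long lists with at least 2 pairs")

    # Assumption: differences are taken as CheqSol minus SSF.
    diffs = [c - s for c, s in zip(log_s0_cs, log_s0_ssf)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    # Squared Pearson correlation coefficient.
    mean_c = sum(log_s0_cs) / n
    mean_s = sum(log_s0_ssf) / n
    sxy = sum((c - mean_c) * (s - mean_s)
              for c, s in zip(log_s0_cs, log_s0_ssf))
    sxx = sum((c - mean_c) ** 2 for c in log_s0_cs)
    syy = sum((s - mean_s) ** 2 for s in log_s0_ssf)
    r2 = (sxy * sxy) / (sxx * syy)
    return r2, rmse, bias


# Hypothetical paired values with a constant offset of 0.1 log unit.
cs_values = [-3.0, -4.0, -5.0]
ssf_values = [-3.1, -4.1, -5.1]
r2, rmse, bias = compare_methods(cs_values, ssf_values)
```

A constant offset between the methods, as in the hypothetical example, leaves r2 at 1 while appearing in full in both RMSE and bias, which is why the three statistics are reported together.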

Conclusions
This brief commentary reasserts that the quality of the standardized CheqSol measurements is comparable to that of the 'gold standard' saturation shake-flask measurements. Measurement errors are much lower than commonly acknowledged in the computational prediction community. The intralaboratory (single instrument) errors reported for the CheqSol method (0.05 log unit) need to be multiplied by a factor of 3 to match the expected interlaboratory errors for the method (0.15 log unit). The scale factor relates, in part, to the hidden systematic errors in the single-lab values. It is expected that better standardization of the 'open' SSF methods, as recommended in the 'white paper' [3], may equalize the SDi of both methods at about 0.15 log unit. When solubility prediction methods indicate RMSE below 0.15, 'overfitting' is probably taking place, with noise being mistaken for information.