Diagnosing solubility limitations – the example of hydrate formation

Solubility is regarded as one of the key challenges in many drug discovery projects. Thus, it is essential to support lead finding and optimization efforts with appropriate solubility data. Because in silico solubility prediction remains challenging, a screening assay is used as a first filter, followed by selected follow-up assays that reveal what causes the low solubility of a specific compound or chemotype. Results from diagnosing the underlying reasons for solubility limitation are discussed. As lipophilicity and crystal lattice forces are regarded as the main factors limiting solubility, it is important to recognize changes in the solid state. Solubility limitation by various factors is presented, and the impact of the solid state is exemplified by compounds that are able to form hydrates.


Introduction
The importance of solubility for a molecule to become a successful drug has been debated for decades, as compounds with insufficient solubility bear a higher risk of attrition and are linked to higher costs in drug development. Solubility issues have been associated with several factors, such as the increased use of high-throughput screening procedures and the resulting tendency towards higher lipophilicity and molecular weight [1]. Furthermore, therapeutic targets have shifted from G-protein coupled receptors and enzymes to more challenging ones, such as kinases, ion channels, nuclear receptors and protein-protein interactions. For these targets, higher lipophilicity [2] or strong intermolecular interactions, such as intermolecular hydrogen bonds, are often required. Both lipophilicity and intermolecular bonds tend to lower compound solubility, and the aim of this work is to present a way to distinguish between them, using an example in which the solid-state contribution is affected by hydrate formation.
In fact, solubility still remains one of the key challenges in many drug discovery projects. Thus, it is essential to support lead finding and optimization efforts with appropriate solubility data. Despite advances, in silico solubility prediction remains challenging, mainly because the solid-state contribution to the thermodynamic solubility of a crystalline drug compound is predicted with little success. As a consequence, in a modern drug discovery environment a screening assay is used as a first filter, followed by selected follow-up assays to reveal what causes the low solubility of a specific compound or chemotype.
High-throughput solubility screening assays typically determine solubility with a reasonable experimental error and within a short turnaround time. Their main drawback is that the solid state of the investigated compounds is either not characterized at all or, at best, characterized by methods that allow only a qualitative assessment, such as birefringence observed by polarized light microscopy [3].
Such high-throughput solubility data allow the development of a chemical series to be tracked, e.g. by building QSAR models [4]. Nevertheless, to overcome solubility issues beyond lipophilicity, the chemistry teams need guidance on the root cause of the solubility limitation. For that, secondary assays are required that allow conclusions to be drawn on the interplay of the main solubility-limiting factors. For most compounds in the chemical space of low-molecular-weight drug candidates, these factors are lipophilicity, crystal lattice forces and ionization. In addition, other factors such as the common ion effect and compound aggregation in solution may further limit the solubility of a molecule.
As a drawback, these investigations require about 20 mg of crystalline material and are carried out at medium throughput. Consequently, this kind of solubility diagnosis can be performed on selected molecules only. Nevertheless, when performed early enough during lead optimization, it can reveal the solubility issues of a chemical series, and the knowledge acquired can be applied to the optimization strategy.
Solubility diagnosis could be accelerated more effectively by advances in the in silico prediction of properties that currently cannot be predicted at the desired level of quality. These include the melting point and enthalpy of fusion, as surrogates for crystal lattice energy, as well as lipophilicity. Although lipophilicity prediction is broadly used with sufficient accuracy in general chemical space, the applicability of the selected prediction tool to a specific chemical series needs to be confirmed by high-quality measurements. Prediction of melting point and lattice energy is currently not available at a level beyond classification (high/intermediate/low) [5]. Additionally, attempts have been made to express the crystal lattice contribution by chemical descriptors [6]; likewise, improvements to the 'General Solubility Equation' (GSE) by inclusion of the polar surface area (PSA) have been reported [7].
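For orientation, Yalkowsky's General Solubility Equation combines exactly the two surrogates discussed here, melting point and lipophilicity. The sketch below assumes its commonly cited form, log S = 0.5 − 0.01(MP − 25) − log P, with S in mol/L and MP in °C. It does not include the PSA-based refinement of [7], and the function name is hypothetical.

```python
def gse_log_s(log_p: float, melting_point_c: float) -> float:
    """General Solubility Equation (commonly cited form).

    log S = 0.5 - 0.01 * (MP - 25) - log P
    S in mol/L, MP in degrees Celsius. For compounds that are liquid at
    25 degrees C the melting-point term is taken as zero.
    """
    melting_term = 0.01 * max(melting_point_c - 25.0, 0.0)
    return 0.5 - melting_term - log_p


# Example with the log P (1.7) and melting point (190 C) quoted for Carbamazepine
print(round(gse_log_s(1.7, 190.0), 2))  # -2.85
```

The melting-point term lowers the estimate by one log unit per 100 °C above room temperature, which is why a high melting point is read as a crystal lattice penalty.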
In the present work, the procedure to reveal the root cause of solubility limitation is exemplified by various compounds, including some that are capable of forming hydrates. Typically, the impact of the solid state on the performance of a chemical entity as a drug is investigated in the early development phase, in the context of clinical formulation development, rather than during lead optimization. However, solid-state limitations pose a risk that needs to be identified before compounds are selected for clinical development. For oral administration, for example, thermodynamic driving forces might limit the absorption of a compound in the human gastrointestinal tract despite any formulation efforts.

High-throughput equilibrium solubility
Sample preparation for high-throughput equilibrium solubility determination is performed in microtiter plate format on an automated platform based on a STARplus module (Hamilton). In addition, the platform comprises a centrifuge (Rotanta, Hettich), two plate shakers (TiMix 5, Edmund Bühler), a heat sealer (ALPS-3000, Thermo Scientific) and an evaporator (Combidancer, Hettich). Plates are handled by a robotic arm (TX60 L, Stäubli) that transfers them between the STARplus module and the various instruments. Sample analysis is performed by mass spectrometry using a Rapid-Fire-Q-TOF combination (Agilent). Results are stored in a proprietary laboratory information system, and data evaluation is performed in Excel (Microsoft).

Secondary assays
Equilibrium solubility is determined using the shake-flask method. About 1.5 mg of compound is placed in a vial, and 0.75 ml of buffer is added. The mixture is shaken for 20 to 24 hours, and the solid phase is separated by centrifugation. The concentration of the compound in the supernatant is determined by HPLC-UV (Agilent 1200) based on a calibration curve.
The partition coefficient between the aqueous phase and 1-octanol (log D) is determined by various methods. Ionizable compounds are analyzed by potentiometric titration (Sirius T3, Sirius Instruments Ltd.). Non-ionizable compounds are analyzed either by a shake-flask method or by reversed-phase HPLC (elogP) [8].
Intrinsic solubility (i.e. the solubility of the neutral species) is determined by a potentiometric titration approach [10]. Between 0.7 and 1.2 mg of compound (final concentration 1 mM) is dissolved in aqueous buffer. The intrinsic solubility is determined by mathematical analysis of the titration curve based on the measured pKa values.
The ionization species distribution is calculated with the Henderson-Hasselbalch equation. The polar surface area (PSA) is calculated by the topological PSA method according to Ertl et al. [11].
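A minimal sketch of the Henderson-Hasselbalch species distribution for a monoprotic acid or base, and of the pH-dependent solubility extrapolated from the intrinsic solubility. It assumes a single pKa and that only the neutral species limits solubility (no salt precipitation). The function names and the log-scale ionization term are illustrative assumptions; they are not the exact definition of the ionization parameter used in column D of Table 1. Zwitterions (formula XH) and multiprotic compounds would need additional terms.

```python
import math


def fraction_neutral(pka: float, ph: float, kind: str) -> float:
    """Fraction of a monoprotic compound present as the neutral species."""
    if kind == "acid":  # HA <-> A- + H+
        return 1.0 / (1.0 + 10.0 ** (ph - pka))
    if kind == "base":  # BH+ <-> B + H+
        return 1.0 / (1.0 + 10.0 ** (pka - ph))
    raise ValueError("kind must be 'acid' or 'base'")


def fraction_ionized(pka: float, ph: float, kind: str) -> float:
    """Fraction present as the charged species (A- or BH+)."""
    return 1.0 - fraction_neutral(pka, ph, kind)


def extrapolated_solubility(s0: float, pka: float, ph: float, kind: str) -> float:
    """Total solubility at a given pH from intrinsic solubility S0: S = S0 / f_neutral."""
    return s0 / fraction_neutral(pka, ph, kind)


def log_ionization_term(pka: float, ph: float, kind: str) -> float:
    """Hypothetical log-scale ionization term: log10(S / S0) = -log10(f_neutral)."""
    return -math.log10(fraction_neutral(pka, ph, kind))


# A hypothetical base with pKa 7.1 is about two-thirds ionized at pH 6.8
print(round(fraction_ionized(7.1, 6.8, "base"), 2))  # 0.67
```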
Melting point and enthalpy of fusion are determined by differential scanning calorimetry (DSC) with a DSC Q2000 (TA Instruments), using a standard heating rate of 10 K/min.
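The melting point and enthalpy of fusion from DSC can be turned into a rough estimate of the crystal lattice penalty via the textbook ideal solubility equation, ln x = −(ΔHf/R)(1/T − 1/Tm), which neglects the heat-capacity difference between melt and crystal. This sketch is for orientation only and is not part of the procedure described here; the function name is hypothetical. For hydrates, the DSC event may reflect dehydration rather than melting, in which case the estimate is not meaningful.

```python
import math

R = 8.314462618  # gas constant, J/(mol*K)


def ideal_log10_solubility(dh_fus_j_mol: float, tm_c: float, t_c: float = 25.0) -> float:
    """log10 of the ideal mole-fraction solubility of a crystalline solute.

    ln x = -(dH_fus / R) * (1/T - 1/Tm), neglecting the heat capacity term.
    Temperatures are given in degrees Celsius and converted to kelvin.
    """
    tm = tm_c + 273.15
    t = t_c + 273.15
    if t >= tm:
        return 0.0  # solute is molten at T: ideal mixing, x = 1
    ln_x = -(dh_fus_j_mol / R) * (1.0 / t - 1.0 / tm)
    return ln_x / math.log(10.0)


# With Walden's rule (dS_fus ~ 56.5 J/(mol*K), so dH_fus ~ 56.5 * Tm) the result
# is close to the -0.01 * (MP - 25) term of the General Solubility Equation.
tm_c = 200.0
print(round(ideal_log10_solubility(56.5 * (tm_c + 273.15), tm_c), 2))  # -1.73
```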

Solubility diagnosis
The solubility diagnosis tool is an advancement of the analysis published by Faller and Ertl [12]. The relationship between log P and log(1/S0) revealed that only a minority of compounds show higher solubility than expected from their lipophilicity. Many potential drug candidates do not fall on the unity line between log P and log(1/S0) but instead show lower solubility than predicted by log P. Therefore, the authors [12] introduced the SL parameter (Eq. 1), the deviation from this unity line, as a descriptor of additional interactions that limit solubility:
$$\mathrm{SL} = \log\frac{1}{S_0} - \log P \qquad (1)$$
where S0 is the intrinsic solubility. In order to quantify the contribution of the various factors to solubility limitation, a table has been created that summarizes the solubility in the relevant buffer medium, ionization effects, solute-solvent interactions and crystal lattice energy contributions (see Table 1). The solubility diagnosis mainly serves to compare the different factors that contribute to solubility limitation. Columns A and B in Table 1 compare the measured solubility at pH 6.8 with the solubility extrapolated from the intrinsic solubility (column C) and the compound's ionization state. Column D contains the ionization parameter, a logarithmic parameter derived from the ionized fraction, which in turn is calculated from the compound formula (column I). The SL parameter in column F (Eq. 1) describes the solubility-lowering effect of factors other than log P. One main factor that limits solubility beyond lipophilicity is the crystal lattice energy, especially for compounds with a rather high melting temperature. Table 3 summarizes guidelines for interpreting the results when analyzing the solubility-limiting factors of a given compound at a defined pH.
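A minimal sketch of Eq. (1) combined with the column F/G guidelines of Table 3: SL > 2 or Tm > 200 °C indicates a crystal lattice limitation; Tm > 200 °C with SL < 2 hints at possible supersaturation; otherwise lipophilicity dominates, as in the Cpd1 discussion. It assumes S0 in mol/L. The function names and the wording of the returned notes are illustrative, not the tool's actual implementation.

```python
import math


def sl_parameter(s0_mol_per_l: float, log_p: float) -> float:
    """Eq. (1): SL = log10(1/S0) - log P, assuming S0 in mol/L."""
    return math.log10(1.0 / s0_mol_per_l) - log_p


def diagnose(sl: float, tm_c: float) -> list[str]:
    """Apply the column F/G guidelines of Table 3 to SL and melting point (deg C)."""
    notes = []
    if tm_c > 200.0 or sl > 2.0:
        notes.append("crystal lattice energy limits solubility")
    else:
        notes.append("solubility mainly limited by lipophilicity")
    if tm_c > 200.0 and sl < 2.0:
        notes.append("possible supersaturation in the solubility assay")
    return notes


# Values quoted for Carbamazepine (SL 1.0, Tm 190 C) and Cpd4 (SL 3.0, Tm 133.6 C)
print(diagnose(1.0, 190.0))  # ['solubility mainly limited by lipophilicity']
print(diagnose(3.0, 133.6))  # ['crystal lattice energy limits solubility']
```

For Cpd4 the rule fires on SL alone while the melting temperature stays low, which is the high-SL, low-Tm combination that hydrate formation can produce.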

What factors do influence compound solubility?
The solubility of a compound is mainly influenced by its solute-solvent interactions, its solid-state interactions [13] and its ionization state. Of these three main factors, the solute-solvent interactions are typically described by the partition coefficient log P [14]. The log P value is an intrinsic property of a molecule and thus depends on its chemical constitution. In contrast, solid-state properties change with the solid form. Possible solid forms include the thermodynamically most stable crystalline form, further polymorphs that may exist, the amorphous state, and solvates and hydrates. Consequently, the solid form has to be well characterized in order to understand the solid-state contribution to solubility limitation. The ionization state of a compound is defined by its chemical structure, and consequently by its pKa values, but it also depends on the surrounding medium: in an aqueous environment, it is a function of pH. Thus, for a drug-like molecule, a bio-relevant pH value needs to be defined that is relevant for its solubility at the site of absorption. Here, a pH value of 6.8 is chosen, even though a different value might be more appropriate based on the pH gradient in the gastrointestinal tract or when a non-oral route of administration is considered.

Solid state and lipophilicity: which one is easier to handle?
Which of the hurdles contributing to solubility limitation can be overcome more easily? Solute-solvent interactions, i.e. the log P, can only be modified during lead optimization. Certainly, the medicinal chemist's room for maneuver is often limited by the need to balance molecular properties, especially so as not to lose potency on the relevant target. In other words, medicinal chemistry will not be able to avoid a certain level of lipophilicity. As other factors add up and further decrease the solubility of a molecule [12], it appears a reasonable strategy to avoid further contributions that negatively affect compound solubility, including solid-state contributions. It is worth mentioning that solid-state properties are determined by molecular constitution as well. Thus, many lead structures favor strong molecular interactions, and increasing their solubility by lead optimization will remain a challenging task [15,16]. Although the solid state can be influenced by formulation efforts, it has been shown that compounds with mainly solvation-limited solubility appear to have a higher chance of successfully reaching the market [17].

Organic hydrates in solubility diagnosis
In the lead optimization phase, solid-state properties are typically not evaluated in detail, if at all. With the solid-state contributions to solubility limitation in mind, an attempt was made to understand the impact of solid form changes on the solubility diagnosis results, using hydrate formation as an example. Since the performance of drug compounds is investigated in an aqueous environment, where hydrate formation can lower solubility, the stability of the solid form under investigation may affect the conclusions drawn on solubility limitation.
For this, selected compounds from in-house lead optimization projects as well as marketed drugs were examined, all of which are known to form hydrates. For proprietary compounds, hydrate formation has been confirmed by in-depth solid-state analysis; investigation results on Nitrofurantoin and Carbamazepine are publicly available. Table 4 summarizes the solubility diagnosis for the selected compounds. In addition, the PSA values are included to facilitate the discussion of Figure 2.
Compound Cpd1. The compound is a base with an experimentally determined log P value of 3.8. Melting point determination by DSC revealed a rather low melting temperature of 97.1 °C. Measured and extrapolated solubility are in agreement; the fraction ionized for this compound is 0.67. Solubility diagnosis reveals that the solubility of this compound is mainly limited by lipophilicity, as SL is far below 2 and the melting temperature is below 200 °C. The impact of the hydrate on solubility limitation is therefore not obvious.
Compound Cpd2. This compound is practically not ionized at pH 6.8, and its extrapolated solubility is ten times higher than the value determined by the shake-flask method. It is a zwitterion with a log P of 5.0, and the SL value reveals an additional solubility limitation of roughly 1 logarithmic unit. However, the melting point above 200 °C can be regarded as a strong indicator of solubility limitation by crystal lattice forces.
Compound Cpd3. In this example, an acidic compound was investigated. Again, the low SL and low melting temperature indicate no significant solubility limitation by solid-state contributions.
Compound Cpd4. In this case, the compound is not ionized but poorly soluble. The rather high SL of 3.0 indicates a solubility about a thousand times lower than expected from its lipophilicity (log P 3.1). Again, the melting temperature of 133.6 °C does not indicate a strong crystal lattice contribution.
Nitrofurantoin. The solubility of this compound appears to be controlled mainly by factors other than lipophilicity. The compound is characterized by a log P of -0.1 and a rather high melting temperature of around 269 °C.
Carbamazepine. Carbamazepine is not ionized at pH 6.8 and, as its log P of 1.7 suggests, is not very lipophilic. Neither the SL parameter of 1.0 nor the melting point of 190 °C necessarily indicates a significant solid-state contribution to solubility limitation.
Overall, the solubility diagnosis parameter set does not include a direct investigation of the nature of the solid state. If a compound has been received in the laboratory in hydrated or, more generally, solvated form, DSC measurements may reveal that solvent evaporation or other solid form changes have most likely occurred. A potential solid form change during equilibration in an aqueous environment is more difficult to judge. If compounds arrive for solubility diagnosis in anhydrous form, the solid form would have to be characterized after equilibration to identify changes in the solid state. Given the amount of material and the technology needed, such investigations are far beyond the scope of lead optimization activities. Additionally, salts from the buffer solutions complicate solid form investigations after equilibration experiments. Therefore, it would be desirable to classify compounds for their propensity to form hydrates by other means that require less experimental effort, in order to cope with the time and material constraints of the compound optimization phase. Ideally, structure-based chemical descriptors would at least indicate a certain likelihood of hydrate formation. According to Infantes et al. [18], the probability of hydrate formation is not correlated with the ratio of H-bond donors to acceptors but with their sum. As the sum of H-bond donors and acceptors is highly correlated with the PSA, the frequency of hydrate formation is found to increase with PSA, too.
To assess how well H-bond donor and acceptor counts can classify compounds during lead optimization without a detailed analysis of solid-state properties, the currently available in-house solubility diagnosis dataset was analyzed. It comprises about 200 proprietary compounds, for which the number of known hydrates has been counted. Indeed, the analysis revealed that the frequency of hydrate formation increases with the sum of H-bond donors and acceptors, and thus also with PSA (see Figure 1).
The results are in accordance with those of Infantes et al. [18]; in our analysis, the number of hydrated compounds increases when the sum of H-bond donors and H-bond acceptors is greater than 9. In our dataset, nearly 20% of the compounds with a sum of 10 form hydrates. Although the number of compounds with a sum of more than 11 was rather low in this dataset, the percentage of hydrates within the respective bins increased steadily with the sum of donors and acceptors. However, the total number of confirmed hydrates at this stage of drug development is limited, so the absolute frequency of hydrate formation is likely to be underestimated. This may explain why the overall percentage of hydrates in our proprietary dataset is lower than that reported in reference [18].
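The binning behind a percentage plot like Figure 1 (b) can be sketched as follows: group compounds by the sum of H-bond donors and acceptors and compute the percentage of confirmed hydrates per bin. The records are made-up placeholders, not the proprietary dataset, and the helper that flags sums above 9 is a hypothetical screening heuristic encoding the observation above.

```python
from collections import Counter


def hydrate_percentage_by_hb_sum(records):
    """records: iterable of (n_hbd, n_hba, is_confirmed_hydrate) tuples.

    Returns {hbd + hba: percentage of compounds in that bin that are confirmed
    hydrates}, ordered by increasing sum.
    """
    totals, hydrates = Counter(), Counter()
    for n_hbd, n_hba, is_hydrate in records:
        hb_sum = n_hbd + n_hba
        totals[hb_sum] += 1
        if is_hydrate:
            hydrates[hb_sum] += 1
    return {s: 100.0 * hydrates[s] / totals[s] for s in sorted(totals)}


def elevated_hydrate_risk(n_hbd: int, n_hba: int, threshold: int = 9) -> bool:
    """Heuristic flag: sum of H-bond donors and acceptors above the threshold."""
    return n_hbd + n_hba > threshold


# Hypothetical example records (not real data)
example = [(1, 2, False), (2, 6, False), (3, 7, True), (4, 6, False), (2, 8, True)]
print(hydrate_percentage_by_hb_sum(example))  # {3: 0.0, 8: 0.0, 10: 66.66666666666667}
```

Reporting percentages per bin rather than absolute counts matters here, because bins with high donor/acceptor sums contain few compounds.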

What can we learn from hydrates about solid state contribution to solubility limitation?
Since the idea of solubility diagnosis is to identify the main factors that contribute to solubility limitation, the in-house dataset was analyzed with regard to the role of PSA for compounds identified as solubility-limited either by lipophilicity or by crystal lattice forces. PSA is preferred over the sum of H-bond donors and acceptors since it is easily calculated and can serve as a descriptor for QSAR modeling.
Because an increasing sum of H-bond donors and acceptors, and thus PSA, has been found to lead to a higher frequency of hydrate formation, an additional solid-state contribution with increasing polar interactions could also be suspected. Thus, the role of polar interactions in solubility limitation within our proprietary dataset was examined. For this, compounds classified by solubility diagnosis as mainly limited by either lipophilicity or crystal lattice forces were analyzed with respect to their PSA values. Table 5 summarizes the average and median PSA values of compounds from these classes. Both the average and the median PSA values strongly suggest that compounds with a significant crystal lattice contribution to solubility limitation are characterized by higher PSA values.
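The class comparison summarized in Table 5 amounts to computing the average and median PSA per solubility-limitation class. A sketch using Python's statistics module is shown below; it also reports the fraction of compounds above a PSA cutoff (a parameter, set to 100 Å² here as an example). The PSA values are invented for illustration and do not reproduce Table 5.

```python
from statistics import mean, median


def psa_summary(psa_by_class: dict[str, list[float]], cutoff: float = 100.0) -> dict:
    """Average, median and fraction above `cutoff` of PSA values (in A^2) per class."""
    summary = {}
    for label, values in psa_by_class.items():
        summary[label] = {
            "mean": mean(values),
            "median": median(values),
            "fraction_above_cutoff": sum(v > cutoff for v in values) / len(values),
        }
    return summary


# Hypothetical PSA values, grouped by the dominant solubility limitation
example = {
    "crystal lattice": [80.0, 95.0, 110.0, 130.0],
    "lipophilicity": [40.0, 55.0, 70.0, 105.0],
}
for label, stats in psa_summary(example).items():
    print(label, stats)
```

Reporting the median alongside the mean guards against a few very polar outliers dominating a small class.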
As illustrated by the histogram in Figure 2 (a), the majority of compounds with a significant crystal lattice contribution to solubility limitation are characterized by a PSA of 75 Å² or more. For compounds whose solubility is mainly limited by lipophilicity, this distribution is shifted towards lower PSA values (see Figure 2 (b)); here, the majority of compounds have a PSA of less than 75 Å². Notably, only a few compounds in the current dataset have a PSA greater than 100 Å² when lipophilicity dominates solubility limitation. Although PSA increases with increasing molecular weight (see Figure 3 (a)), most of the compounds cluster between a PSA of 60 Å² and 120 Å² and cover the full range of molecular weight represented in the dataset. Similarly, compounds with higher PSA tend to have a lower calculated log P (see Figure 3 (b)). However, no relationship strong enough to classify the type of solubility limitation based on molecular weight alone was found within the dataset. Nevertheless, the above findings support the reported successful refinement of the 'General Solubility Equation' (GSE) by inclusion of the PSA [7].

Conclusions
Solubility diagnosis has proven to be a very useful tool for understanding the solubility-limiting factors of drug-like molecules. By comparing the contributions of lipophilicity, crystal lattice energy and ionization, the dominant factors that limit solubility are identified. However, the solid state of a compound is typically not well understood during lead optimization. Thus, changes in the solid state can significantly affect the conclusions drawn on its contribution to solubility limitation.
In this work, solubility limitations of hydrated forms of drug-like compounds were investigated. It is concluded that the contribution of hydrates to solubility limitation might be underestimated and that hydrates and solvates are special cases that are not necessarily sufficiently characterized during solubility diagnosis. This is because the melting temperature is used as a surrogate for crystal lattice energy: since hydrates may either decompose during DSC measurements or transform into anhydrates during heating, the melting temperature then no longer allows conclusions on solid-state contributions. The tendency of a compound to exist in hydrated, or more generally solvated, forms might help to explain the combination of a strong solubility limitation beyond lipophilicity, i.e. a high SL value, with a rather low melting temperature. Therefore, the polar surface area (PSA) was introduced as an additional parameter to account for the possibility of hydrate formation.
Beyond that, analysis of the dataset of about 200 proprietary compounds that were run through the solubility diagnosis revealed that the polar surface area is not only a valuable descriptor for estimating the likelihood of hydrate formation but can also provide guidance in distinguishing the type of solubility limitation. A PSA cutoff of around 100 Å² may be used to classify compounds with respect to their type of solubility limitation. Data analysis has shown that, for compounds with a PSA above this value, lipophilicity-dominated solubility limitation is rather rare. Instead, solubility limitation by crystal lattice forces dominates for such compounds, at least in the chemical space represented by the investigated dataset.
Table 3 entries (column: guideline)
F and G: if Tm > 200 °C or SL > 2, solubility is limited by high crystal lattice energy.
F and G: Tm > 200 °C but SL < 2 indicates possible supersaturation in the solubility assay.
I: if the compound formula is XH, solubility is usually lowest at physiological pH.
I: if the compound formula is AH, solubility is low in the stomach.

Figure 1. (a) Frequency distribution of hydrates vs. the sum of H-bond donors (HBD) and H-bond acceptors (HBA) in the proprietary solubility diagnosis dataset; total numbers. (b) Percentage of confirmed hydrates within each sum of H-bond donors and acceptors.

Figure 2. (a) Frequency distribution of PSA for compounds whose solubility is mainly limited by crystal lattice forces. (b) Frequency distribution of PSA for compounds whose solubility is mainly limited by lipophilicity.
Figure 3. (a) Molecular weight as a function of PSA for the solubility diagnosis dataset. (b) Calculated log P as a function of PSA for the solubility diagnosis dataset.

Table 1. Parameters generated for solubility diagnosis

Table 2. Description of parameters used for solubility diagnosis

Table 3. Guidelines to analyze the solubility diagnosis dataset

Table 4. Examples of hydrates in solubility diagnosis

Table 5. Average and median values of PSA for compounds classified by type of solubility limitation

doi: 10.5599/admet.2.2.36