Lipophilicity determination of acidic compounds: MEEKC as a reliable high-throughput methodology

In the present study, a pressure-assisted MEEKC method with reversed polarity, using a conventional CE instrument with UV detection and uncoated fused silica capillaries, is validated as a high-throughput methodology for the lipophilicity determination of the neutral species of acidic compounds (pKa > 3.5). After calibration of the system with four standard compounds of known log Po/w, the mass distribution ratios (log kMEEKC) of new molecules can be directly converted into log Po/w values by means of a simple linear equation (log Po/w = a·log kMEEKC + b). The method was internally and externally validated for a log Po/w range between -1.54 and 4.75, with higher accuracy than conventional liquid chromatographic methods.


MEEKC as high-throughput surrogate model for the determination of lipophilicity
According to IUPAC [1], lipophilicity represents the affinity of a molecule or a moiety for a lipophilic environment, and it is commonly measured by its distribution behavior in a biphasic system. Since lipophilicity plays a fundamental role in the processes of absorption, distribution, metabolism, excretion, and toxicity (ADMET) of chemical compounds in biological systems, it is a relevant physicochemical property to be determined in the drug discovery and design process [2]. Different lipophilicity indexes can be obtained depending on the particular biphasic system used, but the most widely used is the n-octanol/water partition coefficient (log Po/w, also indicated as log Ko/w). Moreover, since lipophilicity is a critical parameter for chemical safety assessment, according to the REACH Regulation ((EC) No 1907/2006) log Po/w must be reported for any organic compound produced in quantities of one tonne or more per year. Two test procedures are described in the Test Methods Regulation ((EC) No 440/2008): a direct measurement via shake-flask methods [3] and a correlation approach by means of HPLC [4]. However, other experimental methods can be used provided that they show an acceptable level of quality assurance [5]. This is clearly the case for dual-phase potentiometric titration procedures [6,7], commonly used in pharmaceutical research for ionizable drugs with pKa values in a measurable pH range (2–12), which provide reliable and accurate log Po/w values [8].
Several methods based on chromatographic retention have been proposed in order to measure lipophilicity, mainly using reversed-phase columns and acetonitrile/aqueous buffer mobile phases [9]. These approaches are much more time-efficient than shake-flask and potentiometric methods, with the additional benefit that a separation technique does not require high-purity samples. However, the accuracy of the estimated log Po/w values depends to a great extent on the similarity between the calibration standards and the sample compounds. For instance, octanol and C18 phases interact differently with hydrogen-bond donor solutes: in the chromatographic system, solutes with H-bond acidity are discouraged from partitioning out of the hydro-organic mixture and into the poorer H-bond acceptor C18 phase, whereas octanol (either containing water or other H-bond acceptor moieties) can form hydrogen bonds. Therefore, the introduction of molecular descriptors, either experimental or calculated, allows better correlations between chromatographic retention and log Po/w values [9][10][11][12], but this approach is limited to unionized molecules. Even so, a lower accuracy can be expected from chromatographic methods (±0.5 log units [4]) than from the shake-flask method (±0.3 [3]).
In previous papers [13,14], an approach based on microemulsion electrokinetic chromatography (MEEKC) was successfully demonstrated as an indirect method for log Po/w determination for compounds of pharmaceutical interest, using UV detection or hyphenated to an MS with an atmospheric pressure photoionization (APPI) source [15]. MEEKC is a chromatographic technique in which microemulsion (ME) droplets act as a pseudo-stationary phase. It requires significantly shorter analysis times than the reference shake-flask and potentiometric methods and, in addition, avoids the requirement of high-purity samples. Although it can only be applied to neutral species, its broad pH range of application (from pH 2 to 12) allows, in most cases, experimental conditions to be found which ensure that the analyte is in its neutral form. This is clearly an advantage over conventional chromatographic methods, since column stability might be a critical factor at such extreme pH values. In addition, and this is perhaps the most significant advantage over traditional chromatographic methods, MEEKC measurements can be accurately correlated with log Po/w without the need for molecular descriptors, since MEs are better surrogates of n-octanol/water systems than C18 stationary phases. Additionally, the running costs of capillary electrophoresis techniques are lower because fused silica capillaries are cheaper than chromatographic columns and solvent consumption is lower. Last but not least, MEEKC is a very robust technique for log Po/w determination, since variations in the ME composition (pH, buffer nature, surfactant type and concentration, and even the addition of organic modifiers) [14] were shown to produce insignificant changes in its predictive capacity.
In 2000, Poole and co-workers [16] published a study on lipophilicity determination by MEEKC that included acidic compounds, using a running buffer of pH 3 and employing sulfonated silica capillaries in order to provide an adequate electroosmotic flow at such an acidic pH. In a previous study involving buffers over a wide range of pH values [14], we proposed an initial pressure-assisted method with reversed polarity that allowed measurements to be made at pH 2 using conventional uncoated fused silica capillaries, which are much less expensive than sulfonic acid coated capillaries. In this paper, a systematic study of structurally diverse compounds is presented, proposing a validated high-throughput method for the lipophilicity determination of acids with pKa values as low as 3.5.
The partition of analytes between the aqueous bulk solvent and the pseudo-stationary oil phase in the ME is measured by the logarithm of the mass distribution ratio (log kMEEKC) [17]:
$$\log k_{\mathrm{MEEKC}} = \log\left[\frac{t_{R} - t_{EOF}}{t_{EOF}\left(1 - t_{R}/t_{ME}\right)}\right] \tag{1}$$
where $t_{R}$ and $t_{EOF}$ are the migration times of the analyte and the electroosmotic flow (EOF) marker (e.g. DMSO), respectively, and $t_{ME}$ is the migration time of a very lipophilic compound used as ME marker (e.g. dodecanophenone). Thus, log kMEEKC values obtained for unionized species can be used to estimate their lipophilicity according to the following linear equation:
$$\log P_{o/w} = a \cdot \log k_{\mathrm{MEEKC}} + b \tag{2}$$
where a and b are the slope and intercept, respectively, of the linear regression calculated by the least squares method.
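As an illustration of how migration times are turned into log kMEEKC, the following minimal Python sketch assumes the usual electrokinetic chromatography expression k = (tR − tEOF)/[tEOF·(1 − tR/tME)], which is consistent with the symbols defined above. The function name and the migration times are hypothetical and are not data from this study.

```python
import math

def log_k_meekc(t_r, t_eof, t_me):
    """Return log10 of the mass distribution ratio from migration times.

    All times must share the same unit. The analyte must migrate strictly
    between the ME marker and the EOF marker (in either order, so both
    normal and reversed polarity are covered).
    """
    if not (min(t_eof, t_me) < t_r < max(t_eof, t_me)):
        raise ValueError("analyte must migrate between the ME and EOF markers")
    k = (t_r - t_eof) / (t_eof * (1.0 - t_r / t_me))
    return math.log10(k)

# Hypothetical migration times in minutes (reversed polarity:
# the ME marker is detected first and the EOF marker last).
t_me, t_r, t_eof = 4.0, 5.0, 10.0
print(round(log_k_meekc(t_r, t_eof, t_me), 3))
```

With reversed polarity both the numerator and the term in parentheses are negative, so k remains positive, as expected for a distribution ratio.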

Internal and external validation of the method
The ability of the model to reproduce the data included in the set of compounds (the goodness-of-fit) is measured by the determination coefficient of the model (R²), which in our case estimates the proportion of the variation in log Po/w that can be explained by the model. The robustness of the model can be assessed by cross-validation, in which a number of compounds are iteratively excluded from the set of substances used for model development and the resulting model is then used to predict the left-out compounds. Thus, the cross-validated correlation coefficient (Q²), which measures the internal predictive power, can be calculated by the formula:
$$Q^{2} = 1 - \frac{\sum_{i=1}^{n}\left(y_{i} - \hat{y}_{i/i}\right)^{2}}{\sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2}} \tag{3}$$
where $y_{i}$ is the observed response (literature log Po/w) for the i-th object, $\hat{y}_{i/i}$ is the response of the i-th object estimated by the model obtained without using the i-th object (log Po/w from log kMEEKC measurements), and $\bar{y}$ is the mean value of the observed responses for the n elements of the complete data set. The size of the group of excluded chemicals at every step normally ranges from only one (leave-one-out, LOO) to 50 % of the whole set of compounds (leave-many-out, LMO). High mean Q² values in LOO (Q²LOO) and LMO (Q²LMO) validations (> 0.7) are necessary but not sufficient conditions for a model to be robust and internally predictive [18].
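The cross-validation procedure can be sketched in a few lines of Python for a model with a single independent variable. This is a minimal illustration and not the software used in this work: the function names and the data are hypothetical, and for LMO it is assumed that each iteration sums the squared errors over the left-out compounds only while keeping the mean of the complete data set in the denominator.

```python
import random
import statistics

def fit_line(x, y):
    """Least-squares slope and intercept of y = a*x + b."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, my - a * mx

def q2(x, y, left_out_sets):
    """Q2 where each index set is predicted by a model fitted without it."""
    y_mean = sum(y) / len(y)  # mean of the complete data set
    press = ss = 0.0
    for left_out in left_out_sets:
        train = [i for i in range(len(x)) if i not in left_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in left_out:
            press += (y[i] - (a * x[i] + b)) ** 2
            ss += (y[i] - y_mean) ** 2
    return 1.0 - press / ss

def q2_loo(x, y):
    """Leave-one-out: every compound is excluded exactly once."""
    return q2(x, y, [{i} for i in range(len(x))])

def q2_lmo(x, y, fraction=0.5, iterations=2000, seed=0):
    """Leave-many-out: mean and standard deviation over random exclusions."""
    rng = random.Random(seed)
    n_out = max(1, int(len(x) * fraction))
    values = [q2(x, y, [set(rng.sample(range(len(x)), n_out))])
              for _ in range(iterations)]
    return statistics.mean(values), statistics.stdev(values)

# Hypothetical (log kMEEKC, literature log Po/w) pairs, for illustration only.
log_k = [0.0, 1.0, 2.0, 3.0, 4.0]
log_p = [0.1, 0.9, 2.2, 2.8, 4.1]
print(q2_loo(log_k, log_p))
print(q2_lmo(log_k, log_p, iterations=200))
```

A fixed random seed is used so that the LMO result is reproducible between runs.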
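The external validation is performed by splitting the whole data set into two different sets, a training set used for method development and a test set for the assessment of the predictive capacity. Predicted values are correlated with the experimental ones, and the linear regression is expected to be as close as possible to a line of unity slope and null intercept. This closeness can be calculated as (R² − R₀²)/R² [19], where R₀² is the determination coefficient of the regression line forced to pass through the origin, or by means of the concordance correlation coefficient (CCC) [20], which can be calculated by the formula:
$$\mathrm{CCC} = \frac{2\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)\left(y_{i} - \bar{y}\right)}{\sum_{i=1}^{n}\left(x_{i} - \bar{x}\right)^{2} + \sum_{i=1}^{n}\left(y_{i} - \bar{y}\right)^{2} + n\left(\bar{x} - \bar{y}\right)^{2}} \tag{4}$$
where x and y correspond to the abscissa and ordinate values of the correlation plot, $\bar{x}$ and $\bar{y}$ are their mean values, and n is the number of compounds in the test set. The main advantage of CCC is that the closeness value does not depend on the disposition of the axes. Thus, a model can be assessed as externally predictive if R² and R₀² are close to 1, the slopes of the linear regressions are in the range between 0.85 and 1.15, and (R² − R₀²)/R² < 0.1 [19], or alternatively if CCC ≥ 0.85 [20].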
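Both closeness measures can be computed directly from paired observed and predicted values. The sketch below is a hedged illustration: the function names are hypothetical, the CCC is written in its usual sum-of-squares form, and R₀² is assumed to be computed for the regression of y on x forced through the origin (the choice of axes for R₀² is an assumption, since it is not fixed in the text).

```python
def _sums(x, y):
    """Means and centered sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return n, mx, my, sxx, syy, sxy

def ccc(x, y):
    """Concordance correlation coefficient; symmetric in x and y."""
    n, mx, my, sxx, syy, sxy = _sums(x, y)
    return 2.0 * sxy / (sxx + syy + n * (mx - my) ** 2)

def closeness(x, y):
    """(R2 - R02)/R2, with R02 from y = k*x forced through the origin."""
    n, mx, my, sxx, syy, sxy = _sums(x, y)
    r2 = sxy ** 2 / (sxx * syy)
    k = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    r02 = 1.0 - sum((yi - k * xi) ** 2 for xi, yi in zip(x, y)) / syy
    return (r2 - r02) / r2

def externally_predictive(x, y):
    """Apply the thresholds quoted in the text: closeness < 0.1 or CCC >= 0.85."""
    return closeness(x, y) < 0.1 or ccc(x, y) >= 0.85

# Hypothetical values: predictions systematically 1 log unit too high.
observed = [0.0, 1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
print(ccc(observed, predicted), closeness(observed, predicted))
```

In this hypothetical case the correlation is perfect (R² = 1), yet both measures penalize the constant offset, which is exactly the behavior that a plain determination coefficient would miss.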

Instrumentation
A G1600 capillary electrophoresis instrument (Agilent, Waldbronn, Germany) with UV detection and polyimide-coated capillaries of 50 μm i.d., 375 μm o.d., and 57.0 ± 0.1/48.5 ± 0.1 cm total/effective length (Polymicro Technologies, Phoenix, USA) were used. The cassette temperature was set to 25 °C (forced air) and samples were injected hydrodynamically by applying a pressure of 50 mbar for 10 s. Separations were carried out by applying a voltage of -24 kV (inlet, cathode; outlet, anode) and an external pressure of 50 mbar (on the inlet vial). Current intensities were typically in the range between 30 and 40 µA. Capillary preconditioning was performed with ME for 2 min, and postconditioning with 1 M sodium hydroxide and water for 2 min each. pH was measured with a Crison GLP 22 pH meter (Barcelona, Spain) using a 5014 combination glass electrode and a reference electrode with a 3.0 mol L⁻¹ aqueous KCl salt bridge. MEs were sonicated in a J.P. Selecta (Barcelona, Spain) ultrasonic bath at a power of 360 W.

Microemulsion and sample preparation
20 mM aqueous buffers were prepared from phosphoric acid by adjusting the pH to 2.0 with small volumes of a 3 M sodium hydroxide solution prepared shortly before use. SDS (1.3% w/v) was dissolved in the aqueous buffer at room temperature and stirred with a magnetic stirrer until a transparent colorless solution was obtained, and the pH was then adjusted if necessary. Afterwards, 1-butanol (Sigma-Aldrich, ≥ 99.4%) was added up to 8.15% (v/v), followed by heptane (Sigma-Aldrich, ≥ 99%) up to 1.15% (v/v). Both organic solvents were added slowly with a burette. At this point, the solution became white and turbid. Magnetic stirring was maintained for 5 minutes and then the ME was sonicated until it became clear again. Finally, the solution was left to stand at room temperature for at least 1 hour. Immediately before use, the ME was filtered using a 0.45 μm nylon syringe filter (Simplepure, Membrane-Solutions, USA). Sample solutions were prepared by dissolving the ME marker (dodecanophenone, 0.5 mg/mL) directly into the ME by sonication, followed by the addition of the EOF marker (DMSO, 0.1% in volume) and the analytes (0.5 mg/mL from a stock solution of 10 mg/mL in methanol).
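The quantities above scale linearly with the batch size. The following sketch shows the arithmetic under the assumption that all percentages refer to the final ME volume and that volume changes on mixing are neglected; the function names are hypothetical.

```python
def me_recipe(final_volume_ml):
    """Amounts of SDS, 1-butanol and heptane for a given final ME volume.

    Assumes the stated percentages (1.3% w/v, 8.15% v/v, 1.15% v/v)
    are all referred to the final volume of microemulsion.
    """
    return {
        "sds_g": 1.3 / 100.0 * final_volume_ml,
        "butanol_ml": 8.15 / 100.0 * final_volume_ml,
        "heptane_ml": 1.15 / 100.0 * final_volume_ml,
    }

def stock_volume_ul(sample_volume_ml, target_mg_ml=0.5, stock_mg_ml=10.0):
    """Volume of methanolic stock (in microliters) giving the target analyte concentration."""
    return sample_volume_ml * target_mg_ml / stock_mg_ml * 1000.0

print(me_recipe(100.0))       # amounts for 100 mL of ME
print(stock_volume_ul(1.0))   # stock volume for 1 mL of sample solution
```

The 0.5 mg/mL analyte concentration corresponds to a 1:20 dilution of the 10 mg/mL stock.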

Method development: pressure-assisted MEEKC with reversed polarity using uncoated fused silica capillaries
Firstly, it was necessary to establish the experimental conditions (pH, SDS content, capillary length, applied voltage and pressure) that allow the determination of a wide range of lipophilicity values in relatively short analysis times. Thus, mixtures of substances with known log Po/w values were injected in order to test the system behavior. These compounds were N,N-dimethylacetamide, 1-phenylthiourea, acetophenone, butyrophenone, propylbenzene, and pentachloronitrobenzene, and their corresponding measured log Po/w values are -0.77, 0.73, 1.58, 2.66, 3.72, and 5.10, respectively [21]. These compounds are easily detected by UV and behave as neutral species over nearly the entire pH range. Only 1-phenylthiourea could behave as an ampholyte, but its basic and acidic groups were expected to be too extreme to cause significant ionization at pH 2.0 (their calculated pKa values are 0.7 and 13.1, respectively [22]). Under the assay conditions of the present work, pentachloronitrobenzene was too lipophilic to be sufficiently resolved from the ME marker (dodecanophenone). It must be pointed out that under very acidic conditions the direction of the EOF is reversed in relation to neutral and alkaline running buffers. Therefore, in order to overcome this EOF issue, the instrument polarity was reversed, with the anode in the destination vial (outlet) and the cathode in the source vial (inlet). Thus, the negatively charged ME droplets (because of the SDS) were the first to reach the detector window, while the EOF marker migrated last. The application of an external pressure, which pushes the ME filling the capillary towards the detector, is fundamental to achieving a good separation of the standard mixture in relatively short analysis times.

Selected set of substances
A set of 51 structurally different acidic substances was selected for the validation of the proposed method (Table 1). They covered a wide region of chemical space, with lipophilicity values covering 5 log Po/w units (between -1.5 and 4.5) and calculated pKa values in the range between 3.3 and 13.6. Stronger acids had to be excluded, since the ME employed had a pH value of 2.0 and this approach requires the solutes to be in their neutral form (a molar fraction of 5 % of ionized species was set as the threshold).
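The 5 % ionization threshold can be related to pKa through the Henderson-Hasselbalch equation. The sketch below assumes a monoprotic acid and that the calculated aqueous pKa also applies in the ME; the function names are hypothetical.

```python
import math

def neutral_fraction(pka, ph=2.0):
    """Molar fraction of the neutral form HA of a monoprotic acid at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def min_pka_for_threshold(ph=2.0, max_ionized=0.05):
    """Lowest pKa for which the ionized fraction stays at or below max_ionized."""
    return ph + math.log10((1.0 - max_ionized) / max_ionized)

print(round(100.0 * neutral_fraction(3.3), 1))  # % neutral species at pH 2.0
print(round(min_pka_for_threshold(), 2))        # pKa limit for 5 % ionization
```

Under these assumptions the limit is pKa ≈ 3.28 (2.0 + log 19), which agrees with the lowest pKa value of 3.3 in the selected set.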

Internal validation
As shown in Figure 1, there was an excellent correlation between the experimental log Po/w values found in the literature and the log kMEEKC measurements performed in the present work. The model successfully explained 92.75 % of the log Po/w variation in the selected set of compounds, and the overall prediction error of the model (0.38) is only slightly higher than the accepted differences between log Po/w replicates in shake-flask methods [3,24]. Only three compounds presented errors higher than twice the standard error of the fit: 2,4-dithiouracil, hydroxypropyltheophylline, and morin. In all cases, the predicted log Po/w values were higher (1.2, 0.9, and 2.5, respectively) than those found in the literature. Since the validated model has only one independent variable (log kMEEKC), the cross-validated correlation coefficient Q²LOO (0.9211) was, as expected, only slightly lower than the fitting parameter R² (0.9275), both being well above the 0.7 threshold. The mean Q²LMO, which was calculated from 2000 iterations randomly excluding 50 % of the chemicals of the set, presented a value of 0.915 with a standard deviation of 0.023, thus being very close to Q²LOO and showing a low dispersion. Therefore, internal validation demonstrated that the model was stable and internally predictive, and thus ready for the external validation step.

External validation
The application of this method for log Po/w determination first requires the calibration of the response of the electrophoretic system according to equation (2). Thus, it would be convenient to define a small set of standards in order to allow system calibration using a single injection of a mixture of these compounds. With the aim of finding a suitable set, the three compounds identified as possible outliers (2,4-dithiouracil, hydroxypropyltheophylline, and morin) were left out and a new linear regression of log Po/w vs. log kMEEKC was calculated. Four compounds covering a good range of lipophilicities and showing very small residuals were then selected as candidates for the calibration curve and, consequently, as the training set for the external validation. The chosen analytes and the calibration plot obtained, together with its equation, are shown in Figure 2.
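The calibration step amounts to an ordinary least-squares fit of log Po/w against log kMEEKC for the standards, followed by a linear conversion for new compounds. The sketch below uses hypothetical placeholder values for the four standards, not the measured data of this work, and the function names are illustrative.

```python
def calibrate(log_k_std, log_p_std):
    """Least-squares slope a and intercept b of log Po/w = a*log k + b."""
    n = len(log_k_std)
    mx = sum(log_k_std) / n
    my = sum(log_p_std) / n
    sxx = sum((x - mx) ** 2 for x in log_k_std)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_k_std, log_p_std))
    a = sxy / sxx
    return a, my - a * mx

def predict_log_p(log_k, a, b):
    """Convert a measured log kMEEKC into an estimated log Po/w."""
    return a * log_k + b

# Hypothetical placeholder values for four standards (illustration only).
std_log_k = [-1.0, 0.0, 0.5, 1.0]
std_log_p = [0.4, 1.9, 2.7, 3.4]

a, b = calibrate(std_log_k, std_log_p)
print(a, b, predict_log_p(0.25, a, b))
```

Because all standards are injected as a single mixture, the calibration can be repeated whenever the capillary or the ME batch is changed.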
After developing the model with the training set, the 44 remaining compounds (excluding the three outliers) were used as a test set. In a strict sense this was not a proper external validation, since the chemicals belonging to the test set had been included in the previous step leading to the selection of the four training set compounds, and thus were not completely new molecules. However, the aim of this validation was to prove the predictive capacity of the standard compounds selected for the calibration of the lipophilicity response of the MEEKC system. As shown in Figure 3, there was an excellent correspondence between predicted and observed log Po/w values, with the slopes of both the ordinary regression and the regression forced through the origin not being different from 1. Moreover, (R² − R₀²)/R² presented a value of 0.002, suggesting that the intercept was not significantly different from 0. In addition, the value of CCC was 0.974, indicating the very good accuracy of the model in terms of both precision (scattering of the observations around the fitted line) and trueness (closeness of the regression to the full correspondence represented by a line of slope 1 and intercept 0).

Conclusions
In contrast to HPLC methods, MEEKC measurements can be accurately correlated with log Po/w without the additional need for molecular descriptors. Thus, a high-throughput pressure-assisted MEEKC methodology with reversed polarity for the log Po/w determination of acidic compounds (pKa > 3.5) has been proposed and validated (internally and externally), using a conventional CE instrument with UV detection and uncoated fused silica capillaries. The compounds 3-methylbenzoic acid, phenobarbital, barbital, and thiouracil are proposed as calibration standards, allowing the measurement of log Po/w values in the range between -1.54 and 4.75 with a prediction accuracy of ±0.4.

Figure 2. Electropherogram of the mixture of compounds selected as training set for the external validation and the calibration plot used as model development (dashed line).

Figure 3. Correlation between observed and predicted log Po/w from external validation. Dashed line represents the regression forcing the null origin.

Table 1. Experimental log Po/w from the literature [21] and the calculated pKa values (GALAS [22]) of the acidic compounds included in the study, together with the estimated molar percentage of the neutral species at pH 2.0.