An activity landscape is a representation that integrates the analysis of structural similarity and of potency differences between compounds sharing the same biological activity.
An activity cliff is a pair of structurally “similar” compounds with a “large” difference in potency. This is an intuitive concept for a medicinal chemist, and corresponds to an exception to the “similarity principle” (neighbourhood behaviour), which assumes that similar structures have similar properties.
The characterization of activity landscapes is performed by visual exploration with the help of SAS maps or network graphs, or by quantifying the relationship between chemical similarity and activity similarity. Activity similarity is usually defined by the absolute difference between activities, or by the absolute difference normalized by the activity range. Measures used for this purpose include the SAR Index and the SALI index. While it has been argued that the activity cliff concept is not applicable to properties beyond receptor interaction, the techniques for detecting discontinuities in SAR landscapes are potentially useful in modelling any chemical property, even though the reason for the cliffs' existence may be different.
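As an illustration of quantifying the relationship between chemical similarity and activity difference, the sketch below computes a pairwise SALI value, which is commonly given as `|A_i - A_j| / (1 - sim(i, j))`. The function name `sali` and the handling of identical structures (similarity of 1) are assumptions made for this example, not something the text above specifies.

```python
import math


def sali(activity_i, activity_j, similarity):
    """Pairwise SALI value: activity difference divided by structural distance.

    `similarity` is expected in [0, 1] (e.g. a Tanimoto coefficient).
    """
    diff = abs(activity_i - activity_j)
    if similarity >= 1.0:
        # Assumption: identical structures give an infinite value when the
        # activities differ, and 0.0 when they do not (the 0/0 case).
        return math.inf if diff > 0 else 0.0
    return diff / (1.0 - similarity)


# Two compounds two activity units apart with similarity 0.5.
print(sali(5.0, 7.0, 0.5))
```

Large values flag pairs that are both structurally close and far apart in activity, which is the pattern an activity cliff describes.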
Events | `s` (high similarity) | `!s` (low similarity) |
---|---|---|
`t` (large activity difference) | `a ~ P(s|t)` | `b ~ P(!s|t)` |
`!t` (small activity difference) | `c ~ P(s|!t)` | `d ~ P(!s|!t)` |
`G^2 = a log((a(c+d))/(c(a+b))) + b log((b(c+d))/(d(a+b)))`
The `G^2` statistic is used in natural language processing as a measure of word co-occurrence. In our case, `G^2` represents the likelihood of a compound forming an activity cliff, that is, showing a large difference in activity (event `t`) from other compounds in the dataset, given high similarity (event `s`). To calculate the activity cliff likelihood, one has to define what is considered a large difference in activity (i.e. an activity threshold), and what is considered a high similarity (i.e. a similarity threshold). Once the thresholds are defined, the 2x2 contingency table (Table 1) is prepared by comparing the compound with all other compounds in the analyzed dataset and incrementing the relevant count.

`G^2` rank | ID | `a` | `b` | `c` | `d` | Activity | `G^2` |
---|---|---|---|---|---|---|---|
1 | | 2 | 216 | 0 | 310 | 50 (inactive) | 32.34 |
2 | | 1 | 310 | 1 | 216 | 5.84 | 0.07 |
3 | | 1 | 308 | 1 | 218 | 10.90 | 0.07 |
Generated from the Sutherland DHFR dataset (DOI: 10.1021/ci034143r).
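The procedure described above (choose thresholds, fill the 2x2 contingency table for one compound, evaluate `G^2`) can be sketched as follows. This is a minimal illustration, not the implementation used to generate the table: the function names, the toy data, the use of `>=` for both thresholds, and the replacement of zero counts by a small constant `eps` are all assumptions. The natural logarithm is used.

```python
import math


def contingency_counts(index, activities, sim, act_threshold, sim_threshold):
    """Count a, b, c, d for one compound compared with all other compounds.

    `sim` is a symmetric matrix of pairwise similarities (list of lists).
    """
    a = b = c = d = 0
    for j in range(len(activities)):
        if j == index:
            continue
        # Event t: large activity difference (assumed: >= threshold).
        large_diff = abs(activities[index] - activities[j]) >= act_threshold
        # Event s: high similarity (assumed: >= threshold).
        similar = sim[index][j] >= sim_threshold
        if large_diff and similar:
            a += 1
        elif large_diff:
            b += 1
        elif similar:
            c += 1
        else:
            d += 1
    return a, b, c, d


def g2(a, b, c, d, eps=1e-7):
    """G^2 = a*log(a(c+d) / (c(a+b))) + b*log(b(c+d) / (d(a+b)))."""
    # Assumption: zero counts are replaced by a small constant so that the
    # logarithms and ratios stay defined; the text does not specify this.
    a, b, c, d = (x if x > 0 else eps for x in (a, b, c, d))
    return (a * math.log(a * (c + d) / (c * (a + b)))
            + b * math.log(b * (c + d) / (d * (a + b))))


# Hypothetical toy data: four compounds with made-up activities/similarities.
activities = [5.0, 5.2, 8.0, 5.1]
sim = [
    [1.0, 0.9, 0.8, 0.2],
    [0.9, 1.0, 0.85, 0.3],
    [0.8, 0.85, 1.0, 0.1],
    [0.2, 0.3, 0.1, 1.0],
]

counts = contingency_counts(0, activities, sim, act_threshold=1.0, sim_threshold=0.7)
print(counts, g2(*counts))

# Counts taken from the second row of the table above; rounds to 0.07.
print(round(g2(1, 310, 1, 216), 2))
```

Ranking all compounds by the resulting `G^2` value gives an ordering like the one in the table, with the most cliff-like compounds first.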