|
Protein Cavity Analysis
(Combining Digital & Computational Geometries)
Algorithm used:
Dinesh Billa, Partha Bhowmick, and Jayanta Mukhopadhyay.
Cavity Analysis of Large Atomic Structures
by Cascaded Discretization and Geometric Searching,
2011 World Congress on Engineering and Technology (CET),
Oct. 28-30, 2011, Shanghai, China (to appear in IEEE Proceedings).
Abstract:
A novel algorithm to find internal cavities in large
protein structures is proposed.
Protein space discretization based on
Euclidean distance transform and
medial axis transform
are used in tandem with geometric searching, which make the
algorithm very efficient and robust. Potential docking regions can
thus be obtained by probes of predetermined size in the protein
space. While computing MAT, we hit memory limitations due to
the digitization of 3D space while working with the best possible
precision of protein atoms (0.001Å). Hence, our algorithm adopts
a multi-resolution/cascading approach to analyze the protein
first at lower resolution and gradually with higher resolutions
based on the distance-transform variance. Experimental results
on benchmark datasets demonstrate its strength and efficiency.
Shown aside: Cavities (shown as green spheres) detected by our algorithm
on 2HHB with probe radius: 1.50Å. Protein atoms are shown in gray.
|
Our contribution:
Cavity analysis of proteins is a fundamental task to study
their properties related with docking, folding, and stability.
The problem of cavity analysis is quite
challenging with the increase in size and complexity of the
protein space. We have designed an efficient algorithm for
internal cavity analysis in large protein structures using several
novel techniques. As a first step, we have done protein space
discretization and the Euclidean distance transform of the discretized space.
Next, we have computed medial axis transform
(MAT) to obtain the set of medial-axis points representing the
internal cavity. Geometric searching by
3-dimensional kd-tree
is used for several purposes, such as finding the nearest atoms
corresponding to a medial-axis point, detecting the disjoint
cavity regions by BFS over the medial-axis points, and to
identify potential docking regions in a protein.
Ours is a simplified version of molecular docking and does
not incorporate factors such as chemical interactions, bonding
energy, conformations between the protein and the molecule to
be docked. While computing MAT, we hit memory limitations
due to the digitization of the space when the best possible
precision of protein atoms (0.001Å) is in question.
Hence, our algorithm adopts a cascading (multi-resolution)
strategy to initially analyze the protein at lower resolution
and later in higher resolutions based on the distance-transform
variance.
|
Digital-geometric analysis:
We have used both Euclidean DT and
Chamfer-based DT.
For Chamfer DT, different values of (a, b, c) approximate
the Euclidean metric.
We have implemented an algorithm for computing the exact squared-Euclidean DT
and the method of six-scans for Chamfer DT
with (13, 17, 23) as the chamfer mask.
The concept of octagonal distance in digitized space owes from digital geometry.
As it is impractical to work with generalized octagonal distances with
infinite periodicity, one is usually concerned with finite
cyclic sequences.
The medial-axis transform (MAT) is a complete shape descriptor and depends on the
distance function used.
Determining the minimal medial axis, however, is NP-hard.
We have implemented a MAT algorithm using (1, 1, 3) as the octagonal distance.
A few screen-shots of the above on sample proteins are shown below.
In this figure, the medial axis (digitized, green) is based on Euclidean metric (top) and Chamfer metric (bottom);
Left: without atoms; Right: with atoms (in red).
|