Protein Cavity Analysis

(Combining Digital & Computational Geometries)

Algorithm used:

Dinesh Billa, Partha Bhowmick, and Jayanta Mukhopadhyay. Cavity Analysis of Large Atomic Structures by Cascaded Discretization and Geometric Searching, 2011 World Congress on Engineering and Technology (CET), Oct. 28-30, 2011, Shanghai, China (to appear in IEEE Proceedings).

Abstract: A novel algorithm to find internal cavities in large protein structures is proposed. Protein space discretization based on Euclidean distance transform and medial axis transform are used in tandem with geometric searching, which make the algorithm very efficient and robust. Potential docking regions can thus be obtained by probes of predetermined size in the protein space. While computing MAT, we hit memory limitations due to the digitization of 3D space while working with the best possible precision of protein atoms (0.001Å). Hence, our algorithm adopts a multi-resolution/cascading approach to analyze the protein first at lower resolution and gradually with higher resolutions based on the distance-transform variance. Experimental results on benchmark datasets demonstrate its strength and efficiency.
Shown aside: Cavities (shown as green spheres) detected by our algorithm on 2HHB with probe radius: 1.50Å. Protein atoms are shown in gray.

Our contribution:

Cavity analysis of proteins is a fundamental task to study their properties related with docking, folding, and stability. The problem of cavity analysis is quite challenging with the increase in size and complexity of the protein space. We have designed an efficient algorithm for internal cavity analysis in large protein structures using several novel techniques. As a first step, we have done protein space discretization and the Euclidean distance transform of the discretized space. Next, we have computed medial axis transform (MAT) to obtain the set of medial-axis points representing the internal cavity. Geometric searching by 3-dimensional kd-tree is used for several purposes, such as finding the nearest atoms corresponding to a medial-axis point, detecting the disjoint cavity regions by BFS over the medial-axis points, and to identify potential docking regions in a protein. Ours is a simplified version of molecular docking and does not incorporate factors such as chemical interactions, bonding energy, conformations between the protein and the molecule to be docked. While computing MAT, we hit memory limitations due to the digitization of the space when the best possible precision of protein atoms (0.001Å) is in question. Hence, our algorithm adopts a cascading (multi-resolution) strategy to initially analyze the protein at lower resolution and later in higher resolutions based on the distance-transform variance.

Digital-geometric analysis:

We have used both Euclidean DT and Chamfer-based DT. For Chamfer DT, different values of (a, b, c) approximate the Euclidean metric. We have implemented an algorithm for computing the exact squared-Euclidean DT and the method of six-scans for Chamfer DT with (13, 17, 23) as the chamfer mask.
       The concept of octagonal distance in digitized space owes from digital geometry. As it is impractical to work with generalized octagonal distances with infinite periodicity, one is usually concerned with finite cyclic sequences. The medial-axis transform (MAT) is a complete shape descriptor and depends on the distance function used. Determining the minimal medial axis, however, is NP-hard. We have implemented a MAT algorithm using (1, 1, 3) as the octagonal distance. A few screen-shots of the above on sample proteins are shown below. In this figure, the medial axis (digitized, green) is based on Euclidean metric (top) and Chamfer metric (bottom); Left: without atoms; Right: with atoms (in red).

   

   


Some results:

Results of cascaded discretization followed by geometric searching for cavity finding (shown as colored spheres) on some typical proteins: 2HHB, 1G40, 2A01 (from top to bottom). Probe radius: 1.50Å (odd row) and 2.50Å (even row). All protein atoms are shown in gray.







© Partha Bhowmick (June 2011)