Importance of NMR refinement

The tertiary structure of a protein provides crucial information about its biological function. Protein structures are determined mainly by X-ray crystallography or solution-state nuclear magnetic resonance (NMR) spectroscopy. NMR also provides information about the structural dynamics of a protein, such as folding transitions. Despite these advantages, the use of NMR structures in structural analysis is still limited by their often poor quality. For example, as shown in Figure 1 (left), the original NMR structure has many clashes (unfavorable all-atom steric overlaps) and a poor Ramachandran plot. These problems can be addressed by refinement. In the refined structure, the clashes are removed and the Ramachandran plot appearance is improved, with many points located in the favorable regions (Figure 1).

Figure 1. Comparison of structural quality before and after refinement.

Process of refinement


Figure 2. Flow chart of refinement process

1-1) Separation of ensemble conformers
  • The ensemble conformers of the initial NMR structure are separated. All of the separated conformers are used in refinement.
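
    As an illustration of the separation step, the following is a minimal sketch that splits a multi-model PDB file into per-conformer blocks using the standard MODEL/ENDMDL records. The function name `split_models` is hypothetical and this is not the STAP2 implementation.

```python
def split_models(pdb_text):
    """Split PDB-format text into one coordinate block per conformer.

    Conformers are delimited by MODEL/ENDMDL records. A file without
    MODEL records is treated as a single conformer.
    """
    models, current, loose = [], None, []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            current = []                      # start a new conformer
        elif record == "ENDMDL":
            if current is not None:
                models.append(current)        # close the current conformer
            current = None
        elif record in ("ATOM", "HETATM", "TER"):
            # Coordinate records go to the open model, or to a fallback
            # list when the file has no MODEL records at all.
            (current if current is not None else loose).append(line)
    if not models and loose:
        models.append(loose)
    return ["\n".join(m) for m in models]
```

    Each returned block can then be written to its own file and refined independently.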

2-1) Generating New STAP
  • The new STAP, generated from expanded torsion angle combinations, has a positive effect on secondary structure formation and improves protein packing quality. Detailed information is given in the 'Introduction of new method' section.

2-2) NOE is available?
  • In STAP2, two types of structural restraints are used for refinement ('NOE' and 'Dist'). If the initial structure has NOE distance restraints available for refinement, 'NOE' is generated together with 'Dist'.

2-3) Generating 'Dist'
  • Instead of NOE data, distance restraints derived from the inter-hydrogen distances of the initial NMR structure are used to avoid distortion during refinement. Detailed information is given in the 'Introduction of new method' section.

2-4) Generating 'NOE'
  • 'NOE' is a restraining potential in CHARMM that is used during refinement to maintain the distance between two atoms defined in the NOE (Nuclear Overhauser Effect) data obtained from the BMRB (Biological Magnetic Resonance Bank). The NOE data from the BMRB are classified by the sequence separation between the interacting nuclei: intra-residue NOE (|i-j| = 0, where i and j are residue numbers), sequential NOE (|i-j| = 1), medium-range NOE (2 ≤ |i-j| ≤ 4), and long-range NOE (|i-j| ≥ 5).
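
    The classification by sequence separation can be sketched as follows. The function name `classify_noe` and the returned labels are illustrative choices, not part of STAP2 or CHARMM.

```python
def classify_noe(i, j):
    """Classify an NOE restraint by the sequence separation |i - j|
    between residue numbers i and j."""
    separation = abs(i - j)
    if separation == 0:
        return "intra-residue"
    if separation == 1:
        return "sequential"
    if separation <= 4:          # 2 <= |i - j| <= 4
        return "medium"
    return "long"                # |i - j| >= 5
```

    For example, a restraint between residues 3 and 7 has a separation of 4 and is therefore a medium-range NOE.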

3-1) Simulated Annealing (SA) using CHARMM
  • To search for the globally optimized energy of the protein structure, simulated annealing (SA) is performed with the generated potentials.

    Figure 3. Process of Simulated Annealing (SA)

    SA is carried out in CHARMM with an implicit solvation model (EEF1.1) and the default CHARMM energy function, according to the following steps:

    1. A short energy minimization of 100 steps.
    2. The system is heated from 100 K to 500 K over 1600 steps.
    3. Annealing is performed at 500 K for 2000 steps of molecular dynamics.
    4. The system is slowly cooled from 500 K to 25 K over 4000 steps.
    5. A further 100 steps of energy minimization are performed.
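
    The temperature schedule of steps 2 to 4 can be sketched as a function of the dynamics step. This is only an illustration: the linear ramps are an assumption (the shape of the heating and cooling ramps is not specified here), the function name `target_temperature` is hypothetical, and the actual simulation is run in CHARMM.

```python
HEAT_STEPS = 1600   # 100 K -> 500 K
HOLD_STEPS = 2000   # constant 500 K
COOL_STEPS = 4000   # 500 K -> 25 K

def target_temperature(step):
    """Target temperature (K) at a given dynamics step.

    Assumes linear heating and cooling ramps; the two energy
    minimization stages are not part of this schedule.
    """
    total = HEAT_STEPS + HOLD_STEPS + COOL_STEPS
    if step < 0 or step > total:
        raise ValueError("step outside the annealing schedule")
    if step <= HEAT_STEPS:
        return 100.0 + (500.0 - 100.0) * step / HEAT_STEPS
    if step <= HEAT_STEPS + HOLD_STEPS:
        return 500.0
    fraction = (step - HEAT_STEPS - HOLD_STEPS) / COOL_STEPS
    return 500.0 + (25.0 - 500.0) * fraction
```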

3-2) Structural analysis of refined structures
  • Improvement in the refined structure is validated through several structural analyses. Structural differences between the original and refined structures are compared using a 3D structure viewer (JSmol) and secondary structure schemes. The geometrical and stereochemical quality of the protein structure is measured by structure validation scores.
    Detailed information is given in the 'Structure validation scores' section.

4-1) Showing refinement result through web page
  • After refinement, the refined structure and the structural analysis data are provided to the user through the STAP2 web page.

Introduction of new method

1. New STAP (Statistical Torsion Angle potential)


Figure 4. Generating the new STAP

STAP is a grid-type, knowledge-based energy potential generated from torsion angle combinations. Torsion angle populations are collected separately for four torsion angle combinations (φ-ψ, φ-χ1, ψ-χ1, and χ1-χ2), where each combination set consists of 21 functions: one for each of the 20 standard amino acids and one for pre-proline. In STAP2, the potential was extended from the φ-ψ combination alone to these four combinations in order to account for the side chain.

Reference : "Statistical torsion angle potential energy functions for protein structure modeling: A bicubic interpolation approach (Proteins, 2013)"
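
To give a feel for how a grid-type statistical potential can be derived from torsion angle populations, the sketch below bins angle pairs on a periodic grid and converts populations to energies by Boltzmann inversion against a uniform reference. This is a generic approach shown under stated assumptions, not the published STAP procedure: the 10° bin width, the pseudocount, the uniform reference state, and the function name `torsion_grid_energy` are all illustrative, and the interpolation between grid points described in the reference is omitted.

```python
import math

def torsion_grid_energy(angle_pairs, bin_width=10.0, pseudocount=1.0, kT=1.0):
    """Build a 2D grid energy from (angle1, angle2) pairs given in degrees.

    Energy per bin is -kT * ln(P_observed / P_uniform), so highly
    populated bins get low (favorable) energies.
    """
    n = int(round(360.0 / bin_width))
    counts = [[pseudocount] * n for _ in range(n)]
    for a, b in angle_pairs:
        # Map angles onto [0, 360) and then onto periodic bin indices.
        ia = int(((a + 180.0) % 360.0) // bin_width) % n
        ib = int(((b + 180.0) % 360.0) // bin_width) % n
        counts[ia][ib] += 1.0
    total = sum(sum(row) for row in counts)
    uniform = 1.0 / (n * n)
    return [[-kT * math.log((c / total) / uniform) for c in row]
            for row in counts]
```

In such a scheme, one grid would be built per amino acid type and per torsion angle combination from a set of reference structures.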

2. 'Dist' : Distance restraint

'Dist' is a distance restraint obtained computationally from the initial NMR structure.


Figure 5. Generating the 'Dist'

1) Why do we use 'Dist'?
  • Inherent characteristics of NOE data
    Ambiguity in NOE distance data is one of the main problems with NMR structures. It arises because the NOE signal is weak and peak picking is difficult during structure determination and refinement. For that reason, NMR structures with such NOE data are excluded from the refinement targets.
    To overcome this limitation, we use an NOE-like flat-bottom distance potential ('Dist') generated from the hydrogen distances of the initial NMR structure.

2) What is 'Dist'?
    'Dist' is a flat-bottom potential generated from pairs of hydrogen atoms that lie within 7 Å of each other in the initial NMR structure. It has two parameters: the equilibrium distance between the two paired hydrogen atoms and a flat-bottom width of 4 Å, which was taken from a previous study. The role of the flat-bottom distance potential is to prevent excessive structural displacement from the original state during the refinement simulation.

    Reference : "Protein NMR Structures Refined without NOE Data (PLOS ONE, 2014)"
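
    The two ingredients described above, pair selection within 7 Å and a flat-bottom penalty, can be sketched as follows. Several details are assumptions made for illustration: the 4 Å width is interpreted as a flat region of ±2 Å centered on the equilibrium distance, the walls outside it are taken to be harmonic with force constant `k`, and the function names are hypothetical. The actual restraints are applied in CHARMM.

```python
import math

def build_dist_restraints(hydrogens, cutoff=7.0):
    """Return (name_a, name_b, d0) for every hydrogen pair within cutoff.

    hydrogens: list of (name, (x, y, z)) from the initial structure.
    d0 is the pair distance in that structure (equilibrium distance).
    """
    restraints = []
    for a in range(len(hydrogens)):
        for b in range(a + 1, len(hydrogens)):
            d0 = math.dist(hydrogens[a][1], hydrogens[b][1])
            if d0 <= cutoff:
                restraints.append((hydrogens[a][0], hydrogens[b][0], d0))
    return restraints

def flat_bottom_energy(d, d0, width=4.0, k=1.0):
    """Zero inside the flat region around d0, harmonic outside it.

    Assumes the flat region spans d0 - width/2 to d0 + width/2.
    """
    excess = abs(d - d0) - width / 2.0
    return k * excess ** 2 if excess > 0.0 else 0.0
```

    With these assumptions, a pair starting at 5 Å feels no restraint energy anywhere between 3 Å and 7 Å, and is penalized only when it drifts further.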

Structure validation scores

Three protein energy properties

DOPE [1] (Discrete Optimized Protein Energy)
The DOPE score is an atomic distance-dependent statistical potential based on a physical reference state that accounts for the finite size and spherical shape of proteins. Among models of a protein, the lowest DOPE score, globally or locally, suggests the best model.
*A more negative value indicates a better structure.

nDOPE [1] (normalized DOPE)
nDOPE is a standard score (Z-score) derived from the statistics of raw DOPE scores [2].
*A more negative value indicates a better structure.

dDFIRE [3] (dipolar Distance-scaled, Finite-Ideal gas REference)
The dDFIRE energy is a function based on the orientation angles involved in dipole-dipole interactions.
*A more negative value indicates a better structure.

Detection of overlapped atoms

Clash [4]
The clash score is defined as the number of unfavorable all-atom steric overlaps ≥ 0.4 Å per 1000 atoms. When two neighboring atoms are too close to each other, the protein energy increases, so unfavorable steric clashes are strongly correlated with poor data quality. The clash score is measured by MolProbity.
*A lower value indicates a better structure.
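
The definition above amounts to a simple normalization, sketched below. The input list of per-contact overlap values is hypothetical; in practice the overlaps are computed by MolProbity, and this function is only an illustration of the arithmetic.

```python
def clash_score(overlaps, n_atoms, threshold=0.4):
    """Number of steric overlaps >= threshold (in Å) per 1000 atoms.

    overlaps: overlap distances in Å, one per atom-atom contact.
    """
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    n_clashes = sum(1 for overlap in overlaps if overlap >= threshold)
    return 1000.0 * n_clashes / n_atoms
```

For example, 3 qualifying overlaps in a 1500-atom structure give a clash score of 2.0.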

Validation for geometrical/stereochemical quality of protein structure

STAP2 provides several geometrical/stereochemical quality scores computed with MolProbity [4], PROCHECK [5] and WHAT_CHECK [6].
All scores in this category are measured by WHAT_CHECK except 'ramaM' and 'ramaP'. WHAT_CHECK is a tool that checks a number of different stereochemical and geometric properties of the structure. Global stereochemical quality parameters are checked and reported as Z-scores [7].
*A more positive value indicates a better structure.

1st/2nd pack
The 1st/2nd packing quality scores assess the structure against physicochemical knowledge and measure the overall packing, or atomic interconnectivity, quality.

Ramachandran plot appearance
In the result page, the 'Structure validation scores' table includes 'ramaM', 'ramaP' and 'ramaW'. These are abbreviations for the Ramachandran plot appearance measured by MolProbity, PROCHECK and WHAT_CHECK, respectively.
The Ramachandran plot indicates how well the backbone torsion angle combinations are distributed in the polypeptide chain of the protein structure.
'ramaM' and 'ramaP' represent the favored conformational region as defined by each program. 'ramaW' is the WHAT_CHECK Z-score for Ramachandran plot appearance.

rotamer
Rotamer normality assesses the distribution of side-chain dihedral angles.

backbone
The backbone conformation is compared with database structures using C-alpha superpositions with some restraints on the backbone oxygen positions.

Comprehensive assessment of NOE restraint violations

Any distance exceeding the upper bound of a restraint is called an NOE violation. This analysis reports the number of violated NOE restraints, the maximum NOE restraint violation, and the RMSD of NOE restraint violations. The number of violated NOE restraints is counted in bins (above 0.0/0.5/1.0/2.0 Å). The maximum NOE restraint violation is the largest distance by which an upper bound is exceeded. The RMSD is computed from the distances by which the NOE upper bounds are exceeded. In the 'Structure validation scores' table, these values are shown in the form: 0.0/0.5/1.0/2.0 (maximum NOE restraint violation); RMSD NOE restraint violations.
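
The three quantities can be sketched as below. Two points are assumptions, since they are not specified here: the counts are taken as cumulative (a 1.5 Å violation is counted above 0.0, 0.5 and 1.0 Å), and the RMSD is taken over all restraints, with non-violated restraints contributing zero. The function name `noe_violation_summary` is hypothetical.

```python
import math

def noe_violation_summary(distances, upper_bounds,
                          thresholds=(0.0, 0.5, 1.0, 2.0)):
    """Summarize NOE upper-bound violations.

    distances: measured distances in the structure (Å).
    upper_bounds: matching NOE upper bounds (Å).
    Returns (counts per threshold, maximum violation, RMSD violation).
    """
    # Amount by which each distance exceeds its upper bound (0 if satisfied).
    excess = [max(0.0, d - u) for d, u in zip(distances, upper_bounds)]
    counts = [sum(1 for e in excess if e > t) for t in thresholds]
    max_violation = max(excess, default=0.0)
    rmsd = math.sqrt(sum(e * e for e in excess) / len(excess)) if excess else 0.0
    return counts, max_violation, rmsd
```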

[1] Shen, M.Y. and Sali, A. (2006). Statistical potential for assessment and prediction of protein structures. Protein Sci., 15, 2507-2524.
[2] Eramian, D., et al. (2008). How well can the accuracy of comparative protein structure models be predicted? Protein Sci., 17(11), 1881-1893.
[3] Chen, H. and Kihara, D. (2008). Estimating quality of template-based protein models by alignment stability. Proteins, 71, 1255-1274.
[4] Davis, I.W., et al. (2007). MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res., 35, W375-W383.
[5] Laskowski, R.A., Rullmann, J.A., MacArthur, M.W., Kaptein, R. and Thornton, J.M. (1996). AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J. Biomol. NMR, 8, 477-486.
[6] Hooft, R.W., Vriend, G., Sander, C. and Abola, E.E. (1996). Errors in protein structures. Nature, 381, 272.
[7] Saccenti, E. and Rosato, A. (2008). The war of tools: how can NMR spectroscopists detect errors in their structures? J. Biomol. NMR, 40(4), 251-261.