Overall statistics


This page presents statistical analyses of the structures deposited in STAP2. The structural quality of each structure before and after refinement is assessed with widely used validation criteria. In addition, the quality of STAP2-refined structures is compared with that of structures from other refinement databases (DRESS and RECOORD).




Summary of Datasets

Figure 1. Venn diagram of datasets

Whole NMR refinement
  • STAP2 contains 10,593 refined NMR structures. All of them are refined with the 'Dist' method, a NOE-like distance restraint newly introduced in STAP2 (described in detail in the 'Introduce a new method' section of the 'Method' page).
NMR refinement with NOE data
  • Of the 10,593 structures, 3,360 have experimental NOE data. These structures are refined with two types of distance restraint, 'Dist' and 'NOE' (see the flow chart on the 'Method' page).
NMR refinement with pre-refined structures
  • 754 structures were pre-refined with AMBER or CHARMM before being published in the PDB. For these structures, we compared the structural quality of the pre-refined and re-refined (STAP2) structures.
Comparison with other methods (DRESS and RECOORD)
  • The performance of STAP2 is also compared with well-known NMR refinement approaches (DRESS and RECOORD). STAP2, DRESS and RECOORD have 86 structures in common, and these structures are used for the performance comparison.
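
The overlap between datasets shown in the Venn diagram can be thought of as a set intersection over structure identifiers. The following is a minimal sketch of that idea, not the procedure actually used for STAP2; the function name `common_entries` and all identifiers in the example are hypothetical.

```python
def common_entries(*datasets):
    """Return the sorted identifiers that occur in every dataset."""
    if not datasets:
        return []
    shared = set(datasets[0])
    for entries in datasets[1:]:
        shared &= set(entries)  # keep only identifiers seen in all sets so far
    return sorted(shared)


# Hypothetical identifiers (PDB ID + chain ID), for illustration only.
stap2 = {"1ABC_A", "2DEF_A", "3GHI_B", "4JKL_A"}
dress = {"1ABC_A", "3GHI_B", "9XYZ_A"}
recoord = {"1ABC_A", "2DEF_A", "3GHI_B"}

shared = common_entries(stap2, dress, recoord)
print(shared)
```

Sorting the result keeps the list reproducible, which is convenient when the shared entries are published as a downloadable list.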



Statistics of structure validation scores : Whole STAP2 refined structures (10,593 structures)


List of 10,593 structures (PDB ID+Chain ID)

The structural quality of all NMR structures in STAP2, before and after refinement, is assessed with several structure validation scores. All 10,593 structures are refined with 'Flat', a restraint used in place of NOE distance restraints to limit deviation of the structure during refinement.


Table 1. Statistics of structure validation scores

Structure validation scores | Before refinement | After refinement ('Flat')
Steric clash score | 34.13 ± 54.21 | 0.33 ± 1.31
Three protein energy properties
  DOPE | -7069 ± 5237 | -7519 ± 5413
  nDOPE | -0.37 ± 1.32 | -0.78 ± 1.20
  dDFIRE | -141 ± 140 | -157 ± 109
Ramachandran plot appearance
  MolProbity | 83.11 ± 12.03 | 95.68 ± 4.74
  PROCHECK | 75.39 ± 14.65 | 91.14 ± 7.40
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -3.36 ± 2.40 | -2.88 ± 2.44
  2nd packing quality | -2.63 ± 1.78 | -1.59 ± 2.12
  Ramachandran plot appearance | -4.41 ± 2.23 | 1.78 ± 2.01
  Rotamer normality | -5.08 ± 2.60 | 1.55 ± 2.28
  Backbone conformation | -1.02 ± 1.62 | -0.94 ± 1.55

*Bold font indicates the best scores.

As shown in Table 1, all structure validation scores improved after refinement. The steric clash score, the MolProbity and PROCHECK Ramachandran plot appearance, and the rotamer normality improved most markedly.




Figure 2. Distribution of the structure validation scores of the original (red bars) and refined (blue bars) structures


Each histogram shows the number of structures falling into each score interval, where the intervals are defined by the assigned bin size. For the energy scores (DOPE, nDOPE, and dDFIRE) and the clash score, lower values are better, so a high frequency at low values is desirable. For the other scores, higher values are better.

As shown in Figure 2, the five structure validation scores that improved most have a high frequency in the better range of each score. In particular, the WHAT_CHECK Ramachandran plot appearance and rotamer normality improved greatly: the bars of the refined structures are shifted toward the better range. The silhouette index in each histogram quantifies the difference between the original and refined structures by measuring how well the two groups are separated. If the two groups are well separated, the silhouette index is close to 1; otherwise it is close to -1. We calculated the mean of five silhouette indexes, each obtained by randomly sampling 1,000 pairs of raw scores from the two groups. In terms of the average silhouette width, the distributions of the WHAT_CHECK Ramachandran plot appearance and rotamer normality of the refined structures shifted toward the better range, with values of 0.57 and 0.55, respectively.
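
To make the silhouette calculation concrete, here is a minimal sketch for two groups of one-dimensional scores. It rests on several assumptions that the page does not state: the distance between two scores is their absolute difference, sampling is done with replacement, and each of the five repetitions draws 1,000 scores from each group. The function names and the example values are hypothetical.

```python
import random
from statistics import mean


def silhouette_width(group_a, group_b):
    """Average silhouette width of two non-empty groups of 1-D scores.

    The distance between two scores is |x - y|. Values near 1 mean the
    groups are well separated; values near or below 0 mean they overlap.
    """
    def mean_distance(x, values, exclude_self=False):
        n = len(values) - (1 if exclude_self else 0)
        if n <= 0:
            return 0.0
        # |x - x| is 0, so x itself adds nothing to the sum.
        return sum(abs(x - y) for y in values) / n

    widths = []
    for own, other in ((group_a, group_b), (group_b, group_a)):
        for x in own:
            a = mean_distance(x, own, exclude_self=True)  # within-group distance
            b = mean_distance(x, other)                   # distance to other group
            larger = max(a, b)
            widths.append((b - a) / larger if larger > 0 else 0.0)
    return mean(widths)


def sampled_silhouette(before, after, n_samples=1000, n_repeats=5, seed=0):
    """Mean silhouette width over repeated random samples (assumed scheme)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_repeats):
        sample_before = [rng.choice(before) for _ in range(n_samples)]
        sample_after = [rng.choice(after) for _ in range(n_samples)]
        results.append(silhouette_width(sample_before, sample_after))
    return mean(results)


# Hypothetical Z-scores before and after refinement.
before = [-5.1, -4.4, -6.0, -3.9, -5.3]
after = [1.2, 2.0, 1.7, 0.9, 2.4]
print(round(sampled_silhouette(before, after, n_samples=100), 2))
```

The pairwise computation is quadratic in the sample size, so the demo uses 100 samples per group; the default of 1,000 follows the description above but takes noticeably longer in pure Python.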


Consideration of S-S bond

The S-S (disulfide) bond plays an important role in the folding and stability of the tertiary structure of proteins. It is formed between the thiol groups of cysteine residues through oxidative folding (Wikipedia: S-S bond) (Figure 3). In the whole STAP2 dataset, 1,926 structures contain S-S bonds (as recorded in the 'SSBOND' section of the PDB file), and STAP2 takes these bonds into account when refining those structures.
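
As an illustration of how S-S bond information can be read from the 'SSBOND' records, the sketch below extracts the bonded residue pairs using the fixed column positions of the PDB format (chain IDs in columns 16 and 30, residue numbers in columns 18-21 and 32-35). It is not the STAP2 code; the function names are hypothetical, and `ssbond_record` exists only to build column-aligned demo lines.

```python
def parse_ssbond(pdb_text):
    """Return disulfide pairs as ((chain, resnum), (chain, resnum)) tuples."""
    bonds = []
    for line in pdb_text.splitlines():
        if not line.startswith("SSBOND"):
            continue
        record = line.ljust(35)  # pad so the slices below stay in range
        first = (record[15], int(record[17:21]))    # chainID1, seqNum1
        second = (record[29], int(record[31:35]))   # chainID2, seqNum2
        bonds.append((first, second))
    return bonds


def has_disulfide(pdb_text):
    """True if the file declares at least one S-S bond."""
    return bool(parse_ssbond(pdb_text))


def ssbond_record(serial, chain1, res1, chain2, res2):
    """Build a column-aligned SSBOND line for the demo (hypothetical helper)."""
    return f"SSBOND {serial:>3} CYS {chain1} {res1:>4}    CYS {chain2} {res2:>4}"


# Hypothetical file content, for illustration only.
pdb_text = "\n".join([
    "HEADER    EXAMPLE",
    ssbond_record(1, "A", 6, "A", 127),
    ssbond_record(2, "A", 30, "B", 115),
    "END",
])
print(parse_ssbond(pdb_text))
```

Slicing by column rather than splitting on whitespace matters here because PDB fields can run together or be blank.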

Figure 3. Consideration of S-S bond


Table 2. Comparison of structural quality before and after consideration of the S-S bond

Structure validation scores | Before refinement | After refinement ('Flat'), before consideration of S-S bond | After refinement ('Flat'), after consideration of S-S bond
Steric clash score | 42.08 ± 59.78 | 0.45 ± 0.99 | 0.41 ± 1
Three protein energy properties
  DOPE | -5334.36 ± 4757.92 | -5557.07 ± 4679.55 | -5708.05 ± 4904.31
  nDOPE | -0.22 ± 1.11 | -0.51 ± 1.01 | -0.63 ± 1
  dDFIRE | -108.21 ± 93.78 | -116.3 ± 94.74 | -118.74 ± 98.79
Ramachandran plot appearance
  MolProbity | 79.18 ± 12.97 | 94.2 ± 6.32 | 94.3 ± 5.12
  PROCHECK | 68.73 ± 15.32 | 88.18 ± 8.5 | 88.25 ± 7.85
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -4.1 ± 1.97 | -4.04 ± 2.04 | -3.87 ± 1.97
  2nd packing quality | -2.99 ± 1.29 | -2.6 ± 1.48 | -2.35 ± 1.43
  Ramachandran plot appearance | -4.89 ± 1.86 | 0.78 ± 1.83 | 0.74 ± 1.83
  Rotamer normality | -5.39 ± 2.14 | 0.63 ± 1.77 | 0.6 ± 1.79
  Backbone conformation | -1.74 ± 1.31 | -1.58 ± 1.38 | -1.6 ± 1.41

*Bold font indicates the best scores.

As shown in Table 2, although the differences are not large, most scores improved after the S-S bond was taken into account.




Statistics of structure validation scores : STAP2 refined structure with NOE distance restraint (3,360 structures)


Table 3. Statistics of structure validation scores before and after refinement ('Flat' and 'NOE')

List of 3,360 structures

Structure validation scores | Before refinement | After refinement ('Flat') | After refinement ('NOE')
Steric clash score | 33.08 ± 49.65 | 0.30 ± 0.76 | 4.76 ± 30.72
Three protein energy properties
  DOPE | -7934.45 ± 5099.76 | -8385.67 ± 5285.39 | -8380.67 ± 5307.24
  nDOPE | -0.53 ± 1.20 | -0.90 ± 1.08 | -0.90 ± 1.13
  dDFIRE | -160.86 ± 100.02 | -174.94 ± 106.19 | -174.98 ± 106.62
Ramachandran plot appearance
  MolProbity | 84.09 ± 10.92 | 95.82 ± 4.09 | 95.41 ± 5.56
  PROCHECK | 76.24 ± 12.86 | 91.35 ± 6.33 | 90.92 ± 7.79
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -3.21 ± 1.94 | -2.79 ± 1.97 | -2.67 ± 1.99
  2nd packing quality | -2.63 ± 1.40 | -1.66 ± 1.69 | -1.53 ± 1.72
  Ramachandran plot appearance | -4.35 ± 1.99 | 1.71 ± 1.74 | 1.32 ± 2.05
  Rotamer normality | -5.05 ± 2.51 | 1.44 ± 2.00 | 1.46 ± 2.22
  Backbone conformation | -1.03 ± 1.42 | -0.97 ± 1.40 | -1.08 ± 1.48

*Bold font indicates the best scores.


Table 4. Three types of NOE restraint quality

Number of violated NOE (0.0/0.5/1.0/2.0 Å)
Type of NOE | Number of NOE | Before refinement | After refinement ('Flat') | After refinement ('NOE')
All | 1560 | 174/59/37/15 | 257/112/58/20 | 208/37/12/4
Intra | 412 | 27/5/1/0 | 32/6/2/0 | 31/3/1/0
Sequential | 390 | 35/9/4/1 | 55/17/7/1 | 49/7/1/0
Medium | 286 | 40/13/8/2 | 62/29/14/3 | 48/7/1/0
Long | 470 | 69/31/21/11 | 105/58/33/13 | 77/17/7/3

RMS NOE distance violations (Å)
Type of NOE | Before refinement | After refinement ('Flat') | After refinement ('NOE')
All | 0.250 ± 0.559 | 0.371 ± 0.565 | 0.153 ± 0.300
Intra | 0.070 ± 0.153 | 0.094 ± 0.152 | 0.067 ± 0.123
Sequential | 0.130 ± 0.179 | 0.215 ± 0.177 | 0.113 ± 0.094
Medium | 0.230 ± 0.312 | 0.404 ± 0.309 | 0.140 ± 0.119
Long | 0.395 ± 1.190 | 0.569 ± 1.205 | 0.209 ± 0.650

Max NOE distance (Å)
Type of NOE | Before refinement | After refinement ('Flat') | After refinement ('NOE')
All | 2.759 ± 4.749 | 3.518 ± 4.678 | 1.644 ± 3.299
Intra | 0.638 ± 0.600 | 0.805 ± 0.569 | 0.572 ± 0.405
Sequential | 1.144 ± 1.052 | 1.646 ± 0.921 | 0.867 ± 0.479
Medium | 1.605 ± 1.659 | 2.330 ± 1.428 | 0.904 ± 0.762
Long | 2.425 ± 4.801 | 3.155 ± 4.769 | 1.404 ± 3.343

*Bold font indicates the best scores.


In Table 3, there are no substantial differences between 'Dist' and 'NOE'. All three measures of NOE restraint quality improved substantially for 'NOE' compared with before refinement (Table 4; the three measures are explained in detail on the 'Method' page). Although 'Flat' shows slightly increased RMS NOE distance violations, they remain within the range of NOE distance error, approximately 0.5-1.0 Å.
The performance of STAP was verified in our previous STAP paper, and the updated version refines structures effectively, producing better structure validation scores. Most NMR structures require distance restraints, such as NOE distance restraints, for refinement, and NOE distance restraints at times suffer from data accuracy problems. However, because STAP2 uses distance restraints derived from the initial NMR structure, structures can reach improved structural quality without NOE distance restraints, and the results are similar to structures refined with experimental NOE data.
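
The three restraint-quality measures in Table 4 can be sketched as follows. This is an illustration under assumptions, not the definition used by STAP2 (which is given on the 'Method' page): a violation is taken to be the amount by which a measured distance exceeds the restraint's upper bound, lower bounds are ignored, the RMS is taken over all restraints, and the maximum is the largest single violation. The function name and the example numbers are hypothetical.

```python
import math

THRESHOLDS = (0.0, 0.5, 1.0, 2.0)  # Å, the cutoffs listed in the Table 4 header


def noe_violation_stats(distances, upper_bounds, thresholds=THRESHOLDS):
    """Summarize upper-bound violations for one structure.

    Returns the number of violations larger than each threshold, the RMS
    violation over all restraints, and the largest violation (all in Å).
    """
    if len(distances) != len(upper_bounds):
        raise ValueError("distances and upper_bounds must have the same length")
    if not distances:
        raise ValueError("at least one restraint is required")

    # Assumed definition: only distances above the upper bound count.
    violations = [max(0.0, d - u) for d, u in zip(distances, upper_bounds)]
    counts = [sum(v > t for v in violations) for t in thresholds]
    rms = math.sqrt(sum(v * v for v in violations) / len(violations))
    return {"counts": counts, "rms": rms, "max": max(violations)}


# Hypothetical measured distances and restraint upper bounds (Å).
stats = noe_violation_stats(
    distances=[3.0, 4.2, 5.8, 7.5],
    upper_bounds=[3.5, 4.0, 5.0, 5.0],
)
print("/".join(str(c) for c in stats["counts"]))
print(round(stats["rms"], 3), round(stats["max"], 3))
```

Printing the counts joined by slashes mirrors the 0.0/0.5/1.0/2.0 Å layout used in Table 4.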




Performance of STAP2 in comparison with pre-refined structures (754 structures)


List of 754 structures

Among the 10,593 NMR structures, 754 were pre-refined with AMBER or CHARMM before being deposited in the PDB. We compared the structure validation scores of the pre-refined structures with those of the structures re-refined by STAP2.


Table 5. Statistics of structure validation scores

Structure validation scores | All years: before refinement | All years: after refinement ('Flat') | Last 5 years: before refinement | Last 5 years: after refinement ('Flat')
Steric clash score | 4.90 ± 22.51 | 0.22 ± 0.42 | 1.71 ± 3.41 | 0.31 ± 0.21
Three protein energy properties
  DOPE | -7728 ± 5519 | -7886 ± 5477 | -7481 ± 5972 | -7693 ± 6020
  nDOPE | -0.76 ± 1.33 | -0.95 ± 1.25 | -0.52 ± 1.06 | -0.73 ± 1.01
  dDFIRE | -155 ± 106 | -164 ± 109 | -151 ± 116 | -160 ± 120
Ramachandran plot appearance
  MolProbity | 86.43 ± 11.00 | 95.77 ± 4.25 | 91.21 ± 5.99 | 96.61 ± 2.63
  PROCHECK | 78.53 ± 12.85 | 91.19 ± 6.74 | 82.46 ± 8.45 | 92.72 ± 23.90
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -3.07 ± 2.38 | -2.59 ± 2.40 | -3.75 ± 2.16 | -3.05 ± 2.22
  2nd packing quality | -2.13 ± 1.76 | -1.48 ± 1.92 | -2.57 ± 1.58 | -1.87 ± 1.75
  Ramachandran plot appearance | -3.71 ± 1.83 | 1.84 ± 1.77 | -3.05 ± 1.67 | 1.93 ± 1.42
  Rotamer normality | -4.15 ± 1.94 | 1.50 ± 1.82 | -3.57 ± 1.90 | 1.54 ± 1.57
  Backbone conformation | -1.12 ± 1.42 | -0.99 ± 1.32 | -1.62 ± 1.37 | -1.21 ± 1.43

*Bold font indicates the best scores.



Figure 4. Radar chart of normalized structure validation scores


The radar chart compares the normalized scores of structures before and after refinement with the normalized structure validation scores of 18,347 high-resolution X-ray structures (resolution better than 2.0 Å) and 11,056 NMR structures. The shaded region of the chart indicates quality above that of an average X-ray/NMR structure (upper 50% in structure quality). In X-ray space, the clash, ramaW (WHAT_CHECK Ramachandran indicator), and rotamer scores of the STAP re-refined structures lie in the shaded region, whereas those of the pre-refined structures do not (Figure 4). In NMR space, although all structure validation scores of the pre-refined structures lie in the shaded region, all scores of the STAP re-refined structures lie in its innermost area. Consequently, structures that have already been refined can further improve their quality through STAP2 refinement.
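
The page does not state how the scores are normalized for the radar chart. One possible reading of "upper 50% in structure quality" is a percentile rank against a reference set (X-ray or NMR), which the sketch below illustrates. This normalization is an assumption, and the function name and reference values are hypothetical.

```python
from bisect import bisect_left, bisect_right


def quality_percentile(score, reference, higher_is_better=True):
    """Percent of reference structures that `score` matches or beats (0-100)."""
    ordered = sorted(reference)
    if not ordered:
        raise ValueError("reference must not be empty")
    if higher_is_better:
        matched = bisect_right(ordered, score)                # reference <= score
    else:
        matched = len(ordered) - bisect_left(ordered, score)  # reference >= score
    return 100.0 * matched / len(ordered)


# Hypothetical reference distributions, for illustration only.
reference_clash = [0.5, 2.0, 5.0, 10.0, 40.0]   # lower is better
reference_zscore = [-4.0, -2.0, 0.0, 1.0]       # higher is better

clash_rank = quality_percentile(5.0, reference_clash, higher_is_better=False)
zscore_rank = quality_percentile(0.5, reference_zscore)
print(clash_rank, zscore_rank)
```

Under this reading, a score would fall in the shaded region when its percentile is above 50, and passing `higher_is_better=False` handles the energy and clash scores, for which lower values are better.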




Performance of STAP2 in comparison with well-known NMR refinement approaches (88 structures)


List of 88 structures

We compared the performance of STAP2 with known refinement databases (DRESS and RECOORD) on 88 NMR structures. DRESS is a repository of solution NMR structures refined in explicit solvent, and RECOORD is a database of structures recalculated from the PDB. The refined structures of the two protocols were obtained from their web pages to compute the structure validation scores.

References
"DRESS: a database of REfined solution NMR structures (Proteins, 2004)"
"RECOORD: A recalculated coordinate database of 500+ proteins from the PDB using restraints from the BioMagResBank (Proteins, 2005)"


Table 6. Average of structure validation scores

Structure validation scores | Before refinement | After refinement STAP2 ('Flat') | DRESS | RECOORD-CNS (1) | RECOORD-CNW (2) | RECOORD-CYA (3) | RECOORD-CYW (4)
Steric clash score | 79.86 ± 71.89 | 0.46 ± 0.79 | 10.05 ± 5.98 | 17.63 ± 13.67 | 15.67 ± 7.31 | 59.38 ± 32.11 | 16.32 ± 7.85
Three protein energy properties
  DOPE | -6432.22 ± 5237.66 | -7187.95 ± 5589.36 | -7353.86 ± 5945.31 | -6623.01 ± 5131.48 | -7344.44 ± 5635.72 | -6355.9 ± 5148.38 | -7347.16 ± 5861.56
  nDOPE | -0.20 ± 1.15 | -0.92 ± 1.01 | -0.99 ± 1.11 | -0.42 ± 1.08 | -1.02 ± 1.10 | -0.16 ± 1.08 | -0.99 ± 1.09
  dDFIRE | -132.18 ± 103.26 | -150.75 ± 114.03 | -149.17 ± 116.32 | -132.67 ± 99.90 | -148.2 ± 110.22 | -127.51 ± 100.93 | -148.70 ± 114.79
Ramachandran plot appearance
  MolProbity | 73.76 ± 13.93 | 94.13 ± 4.93 | 85.06 ± 9.11 | 79.61 ± 10.91 | 85.77 ± 7.31 | 67.25 ± 11.34 | 83.16 ± 8.77
  PROCHECK | 65.26 ± 16.26 | 88.34 ± 7.00 | 76.46 ± 12.87 | 69.69 ± 15.08 | 76.67 ± 12.25 | 60.80 ± 15.16 | 73.98 ± 14.30
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -3.46 ± 1.88 | -2.69 ± 1.70 | -2.17 ± 1.81 | -3.92 ± 1.99 | -2.11 ± 1.86 | -4.12 ± 1.98 | -2.31 ± 1.95
  2nd packing quality | -2.88 ± 1.21 | -1.58 ± 1.66 | -1.87 ± 1.18 | -3.05 ± 1.17 | -1.83 ± 1.25 | -3.13 ± 0.94 | -1.97 ± 1.22
  Ramachandran plot appearance | -5.82 ± 2.17 | 1.29 ± 1.81 | -4.20 ± 1.27 | -5.52 ± 1.23 | -4.30 ± 1.57 | -6.90 ± 1.31 | -4.31 ± 1.55
  Rotamer normality | -6.31 ± 2.52 | 1.23 ± 2.45 | -2.71 ± 1.47 | -1.62 ± 2.00 | -2.67 ± 1.30 | -7.24 ± 0.67 | -2.96 ± 1.43
  Backbone conformation | -1.65 ± 1.58 | -1.40 ± 1.49 | -1.52 ± 1.12 | -1.33 ± 1.29 | -1.49 ± 1.28 | -1.38 ± 1.34 | -1.70 ± 1.23

(1) CNS: models recalculated in CNS. (2) CNW: models recalculated in CNS and water-refined in CNS. (3) CYA: models recalculated in CYANA. (4) CYW: models recalculated in CYANA and water-refined in CNS.
*Bold font indicates the best scores.

DRESS and the RECOORD water-refined sets (CNW and CYW) show markedly better structural quality than the initial NMR structures.
In particular, their WHAT_CHECK Z-score distributions are clearly improved. However, the WHAT_CHECK Z-score distributions of the STAP2-refined structures improved more than those of the two approaches, except for the 1st packing quality, and the steric clash score was also lower.



Figure 5. Radar chart of normalized structure validation scores


*ramaM, ramaP, and ramaW represent the MolProbity, PROCHECK, and WHAT_CHECK Ramachandran plot appearance, respectively.

The structure validation scores of structures before and after STAP refinement are compared with those of structures from the well-known refinement databases. In X-ray space, three STAP scores (ramaW, rotamer and clash) are the only ones located in the shaded region. In NMR space, all structure validation scores of STAP lie in the shaded region. RECOORD CNW and DRESS also lie in the shaded region, but RECOORD CYA falls well outside it. Thus, it can be concluded that STAP2 is comparable to the two well-known refinement approaches.




Performance of STAP2 in comparison with Rosetta refinement approaches (40 structures)


To compare the structural quality of STAP2 and Rosetta, 40 NMR structures were used. These structures were used in the paper 'Protein NMR structures refined with Rosetta have higher accuracy relative to corresponding X-ray crystal structures' and were obtained from http://psvs-1_4-dev.nesg.org/results/rosetta_MR/rosettaMR_PSVS_summary.html.
The 40 NMR structures were refined by Rosetta in two ways (unrestrained and restrained Rosetta). We measured the structure validation scores of the structures refined by STAP2 and by Rosetta (Table 7). The steric clash scores and Ramachandran scores of the STAP2-refined structures were better than those of the Rosetta-refined structures.

Table 7. Average of structure validation scores

Structure validation scores | Before refinement | After refinement STAP2 ('Flat') | Unrestrained Rosetta | Restrained Rosetta
Steric clash score | 17.51 ± 5.43 | 0.34 ± 0.20 | 2.79 ± 1.90 | 3.84 ± 2.39
Three protein energy properties
  DOPE | -10231 ± 3926 | -10412 ± 3847 | -10715 ± 4064 | -10196 ± 3931
  nDOPE | -0.92 ± 0.63 | -1.07 ± 0.61 | -1.35 ± 0.61 | -1.24 ± 0.61
  dDFIRE | -208 ± 75 | -218 ± 76 | -220 ± 79 | -212 ± 77
Ramachandran plot appearance
  MolProbity | 89.92 ± 5.39 | 96.37 ± 2.40 | 92.37 ± 4.31 | 88.28 ± 7.68
  PROCHECK | 82.76 ± 6.36 | 92.42 ± 3.37 | 83.45 ± 5.90 | 81.25 ± 8.39
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | -2.17 ± 1.16 | -2.14 ± 1.19 | -1.47 ± 0.98 | -1.40 ± 1.03
  2nd packing quality | -1.78 ± 1.38 | -0.97 ± 1.47 | -0.51 ± 1.23 | -0.50 ± 1.30
  Ramachandran plot appearance | -2.79 ± 1.40 | 2.44 ± 1.18 | -1.53 ± 1.11 | -0.50 ± 1.16
  Rotamer normality | -2.12 ± 1.59 | 2.85 ± 1.28 | 6.04 ± 1.11 | 4.83 ± 1.33
  Backbone conformation | -2.79 ± 1.40 | -0.59 ± 0.79 | -0.58 ± 0.73 | -0.48 ± 0.70



Performance of STAP2 in comparison with best 200 NMR structures


From the whole STAP2 dataset, we selected 200 NMR structures with a high 'total score' and measured their structure validation scores (Table 8). Although the initial structures already have good scores, most scores improved after refinement, especially the WHAT_CHECK Ramachandran score and rotamer normality.


Table 8. Statistics of structure validation scores

Structure validation scores | Before refinement | After refinement ('Flat')
Steric clash score | 6.34 ± 6.99 | 0.06 ± 1.12
Three protein energy properties
  DOPE | -13289 ± 8254 | 8254 ± 8491
  nDOPE | -1.74 ± 0.89 | -1.86 ± 0.81
  dDFIRE | -281 ± 159 | -291 ± 167
Ramachandran plot appearance
  MolProbity | 96.60 ± 2.59 | 98.94 ± 1.05
  PROCHECK | 92.50 ± 3.72 | 96.24 ± 1.91
WHAT_CHECK Structure Z-score Distribution*
  1st packing quality | 0.36 ± 1.16 | 0.47 ± 1.13
  2nd packing quality | 0.68 ± 1.54 | 1.49 ± 1.69
  Ramachandran plot appearance | 0.49 ± 1.95 | 4.46 ± 1.15
  Rotamer normality | 0.69 ± 3.33 | 5.05 ± 1.96
  Backbone conformation | 1.05 ± 1.18 | 0.43 ± 1.06