Manual

Input formats

TMCrys accepts three input formats. Examples of each can be found on the Submit page. Sequences can also be uploaded as a text file. The formats are:

  • Simple FASTA format, either single-line or multiline.
  • FASTA format with the topology of the protein. In this case, CCTOP will not be run on the input sequences. The sequence and the topology must each be on a single line, so an entry consists of 3 lines:
    • >header
    • sequence
    • topology
  • Space separated format: the id, sequence and topology of the protein, each separated by a single space. The provided topology will be used for the prediction.
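
The two topology-carrying formats above can be read with a few lines of Python. This is only a minimal sketch based on the descriptions in this manual, not the parser used by TMCrys; the function names and the toy entry (`q1`, `MKTAYIAK`, `IIMMMMOO`) are made up for illustration.

```python
def parse_fasta_with_topology(text):
    """Parse 3-line entries: >header, sequence, topology (each on one line)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected 3 lines per entry: >header, sequence, topology")
    entries = []
    for i in range(0, len(lines), 3):
        header, sequence, topology = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"entry does not start with a '>' header: {header!r}")
        entries.append((header[1:], sequence, topology))
    return entries


def parse_space_separated(text):
    """Parse lines of the form 'id sequence topology' (single spaces)."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(" ")
        if len(parts) != 3:
            raise ValueError(f"expected 'id sequence topology', got: {line!r}")
        entries.append(tuple(parts))
    return entries


# Toy data, invented for this example only.
fasta_text = ">q1\nMKTAYIAK\nIIMMMMOO\n"
print(parse_fasta_with_topology(fasta_text))
print(parse_space_separated("q1 MKTAYIAK IIMMMMOO"))
```

Both functions return `(id, sequence, topology)` tuples, so the same downstream checks can be applied regardless of which format was used.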

The topology must be the same length as the sequence and give the location of every amino acid of the transmembrane protein. The following letters represent the different locations:

  • I => Inside
  • O => Outside
  • M => Membrane
  • S => Signal
  • L => Re-entrant loop
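
Before submitting a sequence with a user-supplied topology, the two rules above (equal length, letters from the legend) can be checked locally. The following is a minimal sketch; the function name is hypothetical, and it assumes upper-case topology letters, which the manual does not state explicitly.

```python
# Topology letters as listed in the legend above.
TOPOLOGY_LETTERS = {
    "I": "Inside",
    "O": "Outside",
    "M": "Membrane",
    "S": "Signal",
    "L": "Re-entrant loop",
}


def validate_topology(sequence, topology):
    """Return a list of problems; an empty list means the pair looks valid."""
    problems = []
    if len(sequence) != len(topology):
        problems.append(
            f"length mismatch: sequence {len(sequence)}, topology {len(topology)}"
        )
    # Assumption: letters are upper case; anything else is reported as unknown.
    unknown = sorted(set(topology) - set(TOPOLOGY_LETTERS))
    if unknown:
        problems.append("unknown topology letters: " + ", ".join(unknown))
    return problems


# Toy data, invented for this example only.
print(validate_topology("MKTAYIAK", "IIMMMMOO"))
print(validate_topology("MKTAYIAK", "IIMMMMO"))
```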

Running time

The running time of TMCrys depends on the server load and on the running times of CCTOP and NetSurfP. The running times of these two programs increase with sequence length, so a longer sequence can take minutes to process.

Sequence batches are limited to 10 sequences at a time. Depending on the length of the sequences, a batch usually finishes in a few minutes. You can follow the progress on the progress bar of the waiting page.
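
For larger sequence sets, the input can be split into batches of at most 10 sequences before submission. The sketch below works on plain FASTA text; the function names are hypothetical, and it only prepares the batches (it does not submit anything to the server).

```python
def split_fasta(text):
    """Split FASTA text into records, one string per '>' header."""
    records, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">") and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def batches(records, size=10):
    """Yield consecutive groups of at most `size` records (batch limit: 10)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


# Toy data, invented for this example only: 23 short multiline records.
text = "".join(f">s{i}\nACDE\nFGHI\n" for i in range(23))
for number, batch in enumerate(batches(split_fasta(text)), start=1):
    print(f"batch {number}: {len(batch)} sequences")
```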

HTML output

The output of TMCrys is displayed on the result page in HTML format. An example output can be found here.

Figure 1. Example of output.
CCTOP predicts whether the input sequence is a transmembrane protein. If it is not, a yellow panel with the text 'nonTMP' in the right corner indicates this. The panels belonging to transmembrane proteins are coloured either green (when the whole process is predicted to be successful) or red (when it is predicted to fail).

For every transmembrane protein query sequence the following pieces of information are displayed:

  • Solubilization/Purification/Crystallization/Whole process: the predicted values of the different crystallization steps as numbers and as a slider diagram. The values are in range [0, 1].

    The slider diagram is as follows:

    [Figure 2. Example of the slider diagram for the visualization of the prediction. The scale runs from 0 (min value, failure) to 1 (max value, success), with the threshold and the predicted value marked.]
    The threshold used for classification is indicated as a yellow stripe between the two sides of the diagram. Different thresholds were set for the different steps. The predicted value is indicated by a blue vertical line.
  • Sequence: the supplied sequence of the protein.
  • Topology: the predicted or supplied topology of the protein.
  • Similar proteins in TSTMP
    • 3D: transmembrane proteins whose structures are known
    • Modelable: transmembrane proteins that could be modeled with existing structures
    • Target: transmembrane proteins whose structures are unknown and that cannot be modeled from existing structures
  • Similar proteins in TargetTrack: the TargetTrack database incorporates structural genomics experiments with detailed descriptions of the methods used. Finding similar proteins might help with protein structure determination. Similar proteins are found by running BLAST on TargetTrack entries.
  • Features: some of the calculated features used for prediction.

Downloadable output formats

The output of TMCrys can be downloaded in multiple formats.

  • XML: the schema describing TMCrys output can be downloaded in XSD format.
  • Tab separated file: this file consists of the following columns:
    • Name: the name of the query provided by the user in the FASTA header
    • SolValue: predicted probability of success of solubilization
    • SolDecision: decision based on the solubilization value, using the defined threshold
    • PurValue: predicted probability of success of purification
    • PurDecision: decision based on the purification value, using the defined threshold
    • CrysValue: predicted probability of success of crystallization
    • CrysDecision: decision based on the crystallization value, using the defined threshold
    • WholeValue: predicted probability of success of the whole process, from solubilization to crystallization
    • WholeDecision: decision based on the whole process value, using the defined threshold
    • Realiability: reliability of the prediction for the whole process
    • Sequence: the sequence of the protein that was provided by the user
    • Topology: the topology of the transmembrane protein, calculated by CCTOP or provided by the user
    • TSTMP_3D: similar proteins in the TSTMP database whose structures have been determined (separated by semicolons)
    • TSTMP_Model: similar proteins in the TSTMP database that could be modeled from the structure of the query (separated by semicolons)
    • TSTMP_Target: similar proteins in the TSTMP database that have no known structure and cannot be modeled from existing structures (separated by semicolons)
    • TargetTrack: similar proteins in TargetTrack, given as experiment IDs with similar protein sequences that might help in the structure determination process (separated by semicolons)
    • pI: isoelectric point of the protein (ProtParam)
    • buriedratio: ratio of the number of buried to exposed amino acids (NetSurfP)
    • avgRSA: solvent accessible surface area averaged across the residues of the protein (NetSurfP)
    • length: length of the sequence
    • gravy: GRAVY (ProtParam)
    • lgmw: logarithm of the molecular weight of the protein
    • inindex: instability index (ProtParam)
    • half: half-life in mammals (ProtParam)
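
The tab separated file can be read with Python's standard `csv` module. The sketch below rests on two assumptions that this manual does not confirm: that the first line of the file is a header row, and that the header uses the column names listed above. The function name and the sample row (`q1`, `0.73`, `A;B`) are invented for illustration.

```python
import csv
import io

# Columns described above as semicolon-separated lists.
LIST_COLUMNS = ("TSTMP_3D", "TSTMP_Model", "TSTMP_Target", "TargetTrack")
# Columns described above as predicted probabilities.
VALUE_COLUMNS = ("SolValue", "PurValue", "CrysValue", "WholeValue")


def read_tmcrys_tsv(handle):
    """Read rows as dicts; assumes a header row with the listed column names."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        for col in LIST_COLUMNS:
            if col in row:
                # Split on semicolons and drop empty items.
                row[col] = [item for item in (row[col] or "").split(";") if item]
        for col in VALUE_COLUMNS:
            if row.get(col):
                row[col] = float(row[col])
        rows.append(row)
    return rows


# Made-up sample with only three of the columns, for illustration.
sample = "Name\tWholeValue\tTSTMP_3D\nq1\t0.73\tA;B\n"
for row in read_tmcrys_tsv(io.StringIO(sample)):
    print(row["Name"], row["WholeValue"], row["TSTMP_3D"])
```

For a downloaded file, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`. If the file turns out to have no header row, the column names would need to be supplied through the `fieldnames` argument of `csv.DictReader`.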