Tutorial
The nmrpeaklists library can be used to read, write and edit NMR peak list files. It can also be used to convert between different peak list file formats. Currently, the library supports peak lists from NMRPipe and Sparky, and it can read peak lists in the XEASY and UPL formats.
The library was originally written to support workflows involving CARA, CYANA and nlinLS, and the scripts provided with the library reflect this. However, it works equally well in any workflow that involves the supported peak list formats.
In the following tutorial, the code blocks assume that the nmrpeaklists library has been imported with the following command.
>>> import nmrpeaklists as npl
Data structures
The nmrpeaklists library provides three data structures that aim to mimic the structure of a peak list. A PeakList object represents an entire peak list and is composed of a sequence of Peak objects. Each Peak object represents a single line in a peak list and is composed of a sequence of Spin objects.
Spin
A Spin object is a container that holds attributes of a particular NMR spin resonance. A peak in an N-dimensional NMR spectrum is associated with N different spin resonances. Spin objects aggregate attributes of the associated spin resonances, like chemical shift, assignment, line width, etc. The following three attributes are pre-defined for each Spin object:
| Attribute | Meaning |
|---|---|
| res_type | Residue type |
| res_num | Residue number or spin system number |
| atom | Atom name |
Other attributes may be added to Spin objects as necessary. The following table provides a list of suggested attribute names for some common spin parameters:
| Attribute | Meaning |
|---|---|
| shift | Chemical shift |
| shift_pts | Chemical shift in number of points (spectrum specific) |
| width | Line width of the peak in the corresponding dimension |
Spin attributes can be set upon initializing the Spin or added to the Spin after creation.
>>> spin = npl.Spin(atom='N')
>>> print(spin)
Spin(atom='N')
>>> spin.res_num = 15
>>> print(spin)
Spin(res_num=15, atom='N')
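To make the idea of an attribute container concrete, the following is a minimal, self-contained sketch. The class `SpinSketch` is a hypothetical stand-in written for this illustration, not the library's implementation; it only assumes the behaviour described above (three pre-defined attributes, plus arbitrary extras set at creation or later).

```python
# Hypothetical stand-in for illustration; NOT the nmrpeaklists Spin class.
class SpinSketch:
    """Attribute container with three pre-defined assignment attributes."""

    def __init__(self, res_type=None, res_num=None, atom=None, **extra):
        self.res_type = res_type
        self.res_num = res_num
        self.atom = atom
        # Any further keyword (shift, width, ...) simply becomes an attribute.
        for name, value in extra.items():
            setattr(self, name, value)

    def __repr__(self):
        # Show only the attributes that currently hold a value.
        shown = ", ".join(
            f"{name}={value!r}"
            for name, value in vars(self).items()
            if value is not None
        )
        return f"SpinSketch({shown})"


spin = SpinSketch(atom="N", shift=118.2)  # extra attribute at creation
spin.res_num = 15                         # pre-defined attribute set later
spin.width = 12.5                         # extra attribute added later
print(spin)
```

The suggested names `shift` and `width` come from the table above; the numeric values are made up for the example.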
Peak
A Peak object represents a single line in a peak list. It can be treated as a mutable sequence of Spin objects and can be used like any normal Python list. To initialize a Peak object, use the keyword argument spins to provide a list of Spin objects.
>>> spins = [npl.Spin(atom=atom) for atom in ('H', 'N', 'HA', 'CA')]
>>> peak = npl.Peak(spins=spins)
>>> [spin.atom for spin in peak]
['H', 'N', 'HA', 'CA']
>>> del peak[1:3]
>>> [spin.atom for spin in peak]
['H', 'CA']
>>> peak.append(npl.Spin(atom='HB'))
>>> [spin.atom for spin in peak]
['H', 'CA', 'HB']
Each Peak object may have additional attributes that relate to the peak as a whole. For example, Peak objects created from XEASY files usually have a volume attribute, whereas Peak objects created from UPL files have a distance attribute. Furthermore, arbitrary attributes may be added to each Peak as needed. For example, when processing CEST data, users may want to add a CEST_profile attribute to each peak. Additional attributes can be added as keyword arguments at initialization or as attributes after the Peak has been created.
>>> peak = npl.Peak(volume=50000)
>>> peak.CEST_profile = [0.4]*10
>>> peak.volume
50000
>>> peak.CEST_profile
[0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4]
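One common way to build a list-like object that also carries its own attributes is to subclass `collections.abc.MutableSequence`. The sketch below shows that pattern; `PeakSketch` is a hypothetical name, strings stand in for Spin objects, and nothing here is claimed to match how the library implements Peak.

```python
from collections.abc import MutableSequence


# Hypothetical stand-in for illustration; NOT the nmrpeaklists Peak class.
class PeakSketch(MutableSequence):
    """A mutable sequence of spins that also accepts arbitrary attributes."""

    def __init__(self, spins=None, **attrs):
        self._spins = list(spins or [])
        # Peak-level attributes such as volume are stored on the object.
        for name, value in attrs.items():
            setattr(self, name, value)

    # The five methods below are all MutableSequence requires; append,
    # extend, iteration, etc. are supplied by the abstract base class.
    def __getitem__(self, index):
        return self._spins[index]

    def __setitem__(self, index, value):
        self._spins[index] = value

    def __delitem__(self, index):
        del self._spins[index]

    def __len__(self):
        return len(self._spins)

    def insert(self, index, value):
        self._spins.insert(index, value)


# Atom names stand in for Spin objects to keep the example short.
peak = PeakSketch(spins=["H", "N", "HA", "CA"], volume=50000)
del peak[1:3]
peak.append("HB")
print(list(peak), peak.volume)
```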
Note
Only the NMRPipe and Sparky formats are flexible enough to write any arbitrary Peak attribute to a file. See `Column Templates`_ for more information.
PeakList
PeakList objects are mutable sequences of Peak objects and can be used as if they were Python lists.
>>> len(peaklist)
12
>>> for peak in peaklist:
...     peak.volume = 50000
...
>>> peak = peaklist[8]
>>> peak.volume
50000
Just like Peak objects, PeakList objects also have attributes representing properties belonging to the peak list as a whole. The following attributes are pre-defined and are calculated by each PeakList object:
dims
The dims attribute is an integer specifying the number of dimensions in the peak list. It corresponds to the number of Spin objects in each Peak object of the PeakList. An AttributeError is raised if the PeakList is empty or if its Peak objects do not all contain the same number of Spin objects. The following should always be true for a non-empty PeakList:
>>> peak = peaklist[0]
>>> peaklist.dims == len(peak)
True
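The rule for dims can be expressed in a few lines. The function below is a sketch under the assumptions stated above (one shared length, AttributeError otherwise); the name `peaklist_dims` is hypothetical and plain lists stand in for Peak objects.

```python
def peaklist_dims(peaks):
    """Return the shared number of spins per peak, or raise AttributeError."""
    sizes = {len(peak) for peak in peaks}
    # An empty list gives zero sizes; mismatched peaks give more than one.
    if len(sizes) != 1:
        raise AttributeError(
            "dims is undefined for an empty or inconsistent peak list"
        )
    return sizes.pop()


consistent = [["H", "N"], ["H", "N"]]
ragged = [["H", "N"], ["H", "N", "CA"]]

print(peaklist_dims(consistent))
for bad in (ragged, []):
    try:
        peaklist_dims(bad)
    except AttributeError as error:
        print("AttributeError:", error)
```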
anchors
The anchors attribute specifies which dimensions of the peak list correspond to spin anchors. A spin anchor is a directly attached proton/heavy atom pair. Each spin anchor is represented by a tuple of two integers, where the integers are indices into Peak objects to extract the two corresponding Spin objects that form the spin anchor. The index of the proton spin always comes first. The anchors attribute of a PeakList is a list of tuples indicating the spin anchors. Only one anchor is possible in 2D and 3D peak lists, but two anchors are possible in 4D peak lists.
>>> peaklist.anchors
[(2, 0)]
>>> anchor = peaklist.anchors[0]
>>> peak = peaklist[0]
>>> anchored_spins = [peak[i] for i in anchor]
>>> [(spin.res_num, spin.atom) for spin in anchored_spins]
[(41, 'HD1'), (41, 'CD1')]
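The index pairs can be applied to any peak in the same way. The sketch below uses plain `(res_num, atom)` tuples in place of Spin objects and a hypothetical helper `anchored_spins`; the 4D peak and its two anchors are invented data that only illustrate the two-anchor case.

```python
def anchored_spins(peak, anchors):
    """Return one (proton, heavy atom) pair per anchor.

    Each anchor is (proton_index, heavy_atom_index), proton first.
    """
    return [(peak[proton], peak[heavy]) for proton, heavy in anchors]


# 3D example: dimension 2 holds the proton, dimension 0 its heavy atom.
peak_3d = [(41, "CD1"), (40, "HA"), (41, "HD1")]
anchors_3d = [(2, 0)]

# 4D example with two anchors (invented data for illustration).
peak_4d = [(12, "N"), (12, "H"), (30, "CB"), (30, "HB2")]
anchors_4d = [(1, 0), (3, 2)]

print(anchored_spins(peak_3d, anchors_3d))
print(anchored_spins(peak_4d, anchors_4d))
```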
Reading peak lists
For each supported peak list file format, there exists a corresponding subclass of the PeakListFile class.
| File format | Subclass |
|---|---|
| NMRPipe | PipeFile |
| Sparky | SparkyFile |
| XEASY | XeasyFile |
| UPL | UplFile |
To read a peak list file, create an object of the appropriate subclass and run its read_peaklist method. These two steps can be performed in one line.
>>> peaklist = npl.XeasyFile().read_peaklist(filename)
Alternatively, if you need to modify the object before reading the peak list, you can perform the two actions separately.
>>> peaklistfile = npl.XeasyFile()
>>> peaklist = peaklistfile.read_peaklist(filename)
Objects of PeakListFile subclasses are usually only edited when the corresponding file type is customizable, i.e. columns in the peak list can be added, removed and rearranged. This is accomplished by modifying the object’s ColumnTemplate. For more information, see `Column Templates`_.
XEASY format
The XEASY peak list format provides a set of spin IDs for each peak, but it does not include any assignment data directly. Assignment data can only be obtained by referencing each spin ID against a mapping of spin IDs to their respective assignments. Consequently, users must perform an extra step when reading XEASY peak lists in order to incorporate assignment data.
The mapping between spin IDs and spin assignments can be provided in several different forms. The standard approach uses a sequence file and atom list to create the mapping. Alternatively, if the spin IDs relate to a CARA repository, then users can create the mapping as a single file using the Lua script provided with this library. Finally, if the XEASY peak list is an anchor peak list from CARA, then the peak list itself contains the assignment data as comments interleaved within the data.
The AssignmentFile class defines an interface for this mapping. Subclasses of AssignmentFile are specific to each file or set of files used to create the mapping. The data is read from the file(s) using the AssignmentFile method read_file.
| File(s) | Subclass |
|---|---|
| Atom list & Seq file | AtomListSeqFile |
| CARA spin ID file | CaraSpinsFile |
| CARA anchor peak list | CaraAnchorFile |
Objects of AssignmentFile subclasses can be used directly, as if they were a dictionary mapping spin ID values to Assignment tuples. When the XEASY peak list is read, each Spin in the PeakList is given a spin_id attribute. The AssignmentFile method assign_peaklist takes a PeakList and sets the res_type, res_num and atom attributes for each Spin based on its spin_id attribute.
>>> peaklist = npl.XeasyFile().read_peaklist(peaklist_filename)
>>> spin = peaklist[0][0]
>>> (spin.res_type, spin.res_num, spin.atom)
(None, None, None)
>>> spin.spin_id
194
>>> assignments = npl.CaraSpinsFile().read_file(spins_filename)
>>> assignments[194]
Assignment(res_type='H', res_num=43, atom='HA')
>>> peaklist = assignments.assign_peaklist(peaklist)
>>> (spin.res_type, spin.res_num, spin.atom)
('H', 43, 'HA')
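Conceptually, the assignment step is a dictionary lookup keyed on spin_id. The sketch below reproduces that idea with standard-library types only. The helper names `assign_peaks` and `blank_spin`, the second spin ID (195), and the choice to leave unknown IDs unassigned are all assumptions made for this illustration, not documented library behaviour.

```python
from collections import namedtuple
from types import SimpleNamespace

Assignment = namedtuple("Assignment", ["res_type", "res_num", "atom"])

# Mapping of spin IDs to assignments; ID 195 is invented for the example.
assignments = {
    194: Assignment("H", 43, "HA"),
    195: Assignment("H", 43, "CA"),
}


def blank_spin(spin_id):
    """Stand-in for a Spin as read from an XEASY file: ID only."""
    return SimpleNamespace(res_type=None, res_num=None, atom=None,
                           spin_id=spin_id)


def assign_peaks(peaks, assignments):
    """Copy assignment data onto every spin whose spin_id is known."""
    for peak in peaks:
        for spin in peak:
            found = assignments.get(spin.spin_id)
            if found is not None:  # assumption: unknown IDs stay unassigned
                spin.res_type, spin.res_num, spin.atom = found
    return peaks


peaks = [
    [blank_spin(194), blank_spin(195)],
    [blank_spin(999), blank_spin(194)],
]
assign_peaks(peaks, assignments)
first = peaks[0][0]
print(first.res_type, first.res_num, first.atom)
```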
Modifying a peak list
Sorting
To facilitate sorting a PeakList by its Peak assignments, Spin objects may be compared to each other with the comparison operators (<, <=, > and >=). These comparisons are only influenced by the assignment data, not by any other attributes of the spins. The default sorting order is by residue number, then sidechain position and finally atom name. Unassigned spins are always sorted last.
>>> spin1 = npl.Spin(res_num=24, atom='HB')
>>> spin2 = npl.Spin(res_num=24, atom='HD1')
>>> spin1 < spin2
True
Peak objects may also be compared using the comparison operators, and once again, only the assignment data influences sorting. The default behavior for peaks sorts them as tuples of their respective spins. As a result, the order of the peak list dimensions (i.e. the order of spins in each peak) matters greatly when sorting peaks. This is especially evident in NOESY peak lists.
>>> spin1 = npl.Spin(res_num=28, atom='HG')
>>> spin2 = npl.Spin(res_num=17, atom='HA')
>>> peak1 = npl.Peak(spins=[spin1, spin2])
>>> spin3 = npl.Spin(res_num=14, atom='HG2')
>>> spin4 = npl.Spin(res_num=63, atom='H')
>>> peak2 = npl.Peak(spins=[spin3, spin4])
>>> peak1 < peak2
False
>>> print(sorted([peak1, peak2]))
[Peak(spins=
[Spin(res_num=14, atom='HG2'),
Spin(res_num=63, atom='H')]), Peak(spins=
[Spin(res_num=28, atom='HG'),
Spin(res_num=17, atom='HA')])]
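The ordering described above can be approximated with a sort key. The sketch below is one possible reading, not the library's code: spins are plain `(res_num, atom)` tuples, the sidechain position is assumed to be the second character of the atom name ranked in the Greek-letter order A, B, G, D, E, Z, H, and atoms without such a letter are treated as backbone. The names `SIDECHAIN_ORDER`, `spin_key` and `peak_key` are hypothetical.

```python
# Assumed Greek-letter order of sidechain positions (alpha ... eta).
SIDECHAIN_ORDER = "ABGDEZH"


def spin_key(spin):
    """Sort key: residue number, sidechain position, atom name."""
    res_num, atom = spin
    if res_num is None or atom is None:
        return (1, 0, 0, "")  # unassigned spins always sort last
    letter = atom[1:2]        # position letter, "" for names like 'H' or 'N'
    position = SIDECHAIN_ORDER.find(letter) + 1 if letter else 0
    return (0, res_num, position, atom)


def peak_key(peak):
    """Peaks compare as tuples of their spins, in dimension order."""
    return tuple(spin_key(spin) for spin in peak)


spins = [(24, "HD1"), (None, None), (24, "HB"), (24, "H"), (17, "HA")]
sorted_spins = sorted(spins, key=spin_key)
print(sorted_spins)

peak1 = [(28, "HG"), (17, "HA")]
peak2 = [(14, "HG2"), (63, "H")]
sorted_peaks = sorted([peak1, peak2], key=peak_key)
print(sorted_peaks)
```

Because the first dimension dominates the comparison, `peak2` (residue 14) sorts ahead of `peak1` (residue 28) regardless of the second dimension.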
Use the sort_by_assignment PeakList method to change the default sort order. By default, sort_by_assignment takes into account the spin anchors, but it still gives the highest sorting priority to the lowest-index Spins. Alternatively, you can manually specify the sort order using the order keyword argument.
>>> peaklist.sort_by_assignments(order=[1,0])
>>> print(peaklist)
PeakList(peaks=
[Peak(spins=
[Spin(res_num=28, atom='HG'),
Spin(res_num=17, atom='HA')]),
Peak(spins=
[Spin(res_num=14, atom='HG2'),
Spin(res_num=63, atom='H')])])
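The effect of a dimension order can be reproduced with a reordered sort key. In the sketch below, `sort_peaks` is a hypothetical helper that returns a new list (the library method shown above appears to sort in place), and spins are simplified to `(res_num, atom)` tuples compared as ordinary tuples, which ignores the sidechain-position rule.

```python
def sort_peaks(peaks, order=None):
    """Return peaks sorted with dimensions prioritised as given by order."""

    def key(peak):
        # Default priority is the natural dimension order 0, 1, 2, ...
        indices = range(len(peak)) if order is None else order
        return tuple(peak[i] for i in indices)

    return sorted(peaks, key=key)


peak1 = [(28, "HG"), (17, "HA")]
peak2 = [(14, "HG2"), (63, "H")]

default_order = sort_peaks([peak1, peak2])
swapped_order = sort_peaks([peak1, peak2], order=[1, 0])
print(default_order)
print(swapped_order)
```

With `order=[1, 0]` the second dimension is compared first, so `peak1` (residue 17) now precedes `peak2` (residue 63), matching the output shown above.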