Beginning Biochemistry—Molecular Visualization

**Download the latest version of PyMOL for Windows and run the file to install PyMOL in preparation for class 2/21.**

The pinnacle of understanding biochemistry comes from molecular structures. Being able to locate an atom in a molecule at high resolution allows for greater certainty as to the role and function of that atom. While most of biochemistry involves indirect evidence (most biological molecules are too small to see), structural work gives a picture of the minute.

From the historical perspective of technique, X-ray crystallography was the first method to reveal protein structure. The recognition that crystalline solids scatter X-rays was made by the father and son team of Bragg & Bragg (father William, son Lawrence) in 1912. By the following year they had worked out the geometry needed to derive the structure of a simple crystal (NaCl) from a pattern of diffracted X-rays (they were awarded the Nobel Prize in Physics in 1915, the only father–son pair to share a Nobel Prize).

John Kendrew began thinking about solving a protein structure after the Second World War. While X-ray diffraction was the obvious choice of technique, the necessary methods would take the next fifteen years to develop. Kendrew published a structure of myoglobin (the oxygen-binding protein from muscle) in 1958, and shared a Nobel Prize in Chemistry in 1962 for the effort.

From the 1960s until the mid–1980s, X-ray diffraction was the only game in town. However, during that time, Kurt Wüthrich was working on developing methods to use nuclear magnetic resonance (NMR) to determine protein structures. The first structure solved by NMR was bull seminal proteinase inhibitor IIa, published in 1985. And yes, Wüthrich shared the 2002 Nobel Prize in Chemistry for this work.

The newest technique applied to high–resolution structure determination is cryoelectron microscopy (cryoEM). While electron microscopy is old (circa 1931), and cryoEM has been used for large complexes (since 1968), the first structure solved at close to the resolution of X-ray diffraction was made in 2013. And yes, early developers of the methods, Dubochet, Frank and Henderson, shared a Nobel Prize in Chemistry in 2017 for their work.

Beginning to sound familiar, eh?

There is one more method to mention, and that involves prediction of protein structure. Since the alpha helix and beta strand were predicted by Robert Corey and Linus Pauling in 1951, there has been a strong interest in predicting a structure, given only a sequence. Building on the tremendous amount of work done to solve the structures of many proteins, protein prediction advanced under the methods of deep learning, a set of artificial intelligence methodologies. AlphaFold won the 2018 Critical Assessment of Structure Prediction event, and did even better at the 2020 contest. For proteins without a solved structure, there is often a predicted structure available to help in understanding the function.

**Next two paragraphs written last year.**

But, alas, no Nobel Prize yet. Currently there is a nearly ten-year lag between discovery and award, to make sure fundamental discoveries are truly revolutionary.

**Back to the current narrative.**

The 2024 Nobel Prize in Chemistry was awarded to Baker, Hassabis and Jumper for their work in computational protein design (Baker) and structure prediction (Hassabis and Jumper). Called it!

The central premise of scientific discovery is the free sharing of results. Any data you collect is shared with everyone so that they are not required to redo work and can base their next experiment where you left off in your work. Newton is often quoted at this juncture.

The shoulders on which structural biochemistry stands were formed in the early 1970s when the Protein DataBank was founded in Brookhaven, NY. The Protein DataBank (PDB) is the singular repository for all structural data (and don't worry, while it's called the Protein DataBank, it also houses nucleic acid, lipid, carbohydrate and even virus structures).

The central identifier of every structure in the PDB is a four-character accession code. The earliest pattern was a number followed by three letters, where the number was the number of the structure and the three letters were a shortened version of the protein name. For example, 7RSA is the seventh structure of ribonuclease A to be solved. Since there are (as of mid-February, 2025) 231,356 experimental structures in the PDB, the pattern has given way to four characters assigned in sequence from the available accession codes, without regard to the molecule in question. Gone are the days of vanity codes.

In addition to keeping all the records and making structures available at all times, the PDB has a strong educational arm. Visit PDB-101 to access the learning resources.

So, a wealth of data, right?

Well, one small problem. The PDB coordinate file format was specified in the early 1970s to exchange structural data. Translated, it's plain text.

So, that adage about a picture and a thousand words coming to mind?

Well, that's where molecular visualization comes into play. In each atom record of a coordinate file, the real numbers that start just shy of the middle of the line (15.695, 17.621 and 7.193 in one such record) are the position of that atom in three-dimensional space, in units of 1 × 10⁻¹⁰ m (ångströms). And therein lies the beauty: rendering a molecular structure is, more or less, just like drawing a video game frame, something computers can do well. Molecular visualization software is then, in essence, a three–dimensional video game, without the game loop.
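To make the plain-text layout concrete, here is a minimal Python sketch that pulls the three coordinates out of one atom record. It assumes the standard fixed-column layout of PDB ATOM records, in which x, y and z occupy columns 31-38, 39-46 and 47-54. The record itself is illustrative: only the three coordinates come from the text above, while the atom, residue and chain are placeholders. The names `SAMPLE_LINE` and `parse_atom_coordinates` are hypothetical.

```python
# Illustrative ATOM record: only the three coordinates come from the text;
# the atom, residue and chain shown here are hypothetical placeholders.
SAMPLE_LINE = (
    "ATOM      1  N   ALA A   1      15.695  17.621   7.193  1.00 20.00           N"
)


def parse_atom_coordinates(line: str) -> tuple[float, float, float]:
    """Return (x, y, z) in angstroms from one fixed-width ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    # The PDB format is column-based, so slice by position rather than split
    # on whitespace: x is columns 31-38, y is 39-46, z is 47-54 (1-indexed).
    x = float(line[30:38])
    y = float(line[38:46])
    z = float(line[46:54])
    return x, y, z


print(parse_atom_coordinates(SAMPLE_LINE))
```

Slicing by column rather than splitting on spaces matters because neighboring fields in a real file can run together with no space between them.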

For our visualization software, we'll use PyMOL, which is currently the most popular molecular visualization package. While PyMOL is an open source project, a complete distribution is maintained and sold by Schrödinger, a molecular and materials software company.

Our first challenge is to download, install and license PyMOL. Download the latest version of PyMOL for Windows and run the file to install PyMOL. The mechanics will vary a bit, depending on your browser choice, but it should be familiar. After PyMOL is installed, from the Start menu run "PyMOL + Console." The first dialog will ask for a license file. Save the license file, emailed to you separately, to a known location and then point PyMOL to that file. Our license is good for the next six months (I can get you another in the fall, just ask).

There are a number of regions in the main window. Starting at nine o'clock is a large black box. That's the dark void in which molecules will appear. Above that is a small console for typing commands (there is a larger one open as a second window). Down the right side is the object browser. As molecules are loaded and subsets selected, they will appear as objects there that can be drawn (or hidden) and styled. These are the main parts of the interface. There are some other bits around that will become useful later.

Molecules are loaded from the file menu or with the command:
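As a minimal sketch, and assuming the structure is pulled straight from the PDB by its accession code, the console command is PyMOL's `fetch` followed by the code (for a file already on disk, `load` followed by the file path does the same job). The Python helper below only builds the text to type into the console; the name `fetch_command` is illustrative and not part of PyMOL.

```python
def fetch_command(pdb_id: str) -> str:
    """Return the PyMOL console command that downloads and displays a PDB entry."""
    code = pdb_id.strip()
    # PDB accession codes are four characters long (letters and digits).
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a four-character accession code: {pdb_id!r}")
    return f"fetch {code.upper()}"


# Build the command for RNase T1, the structure used in the selection example.
print(fetch_command("9rnt"))
```

Typing the printed line into the PyMOL console should retrieve the entry and draw it in the display window.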

Voila! A molecule has appeared in the dark void.

Note that a molecule object has appeared on the right in the object browser. Clicking on the object in the browser will toggle its visibility. Many attributes of how the object is drawn can be controlled with the menus opened from the letters to the right of the object (color, C, is the obvious one).

In the display window, with PyMOL's default three-button mouse mode, left click and drag to rotate the molecule, middle click and drag to translate, and right click and drag to move in depth (zoom). Clicking on any part of the molecule will identify chain, residue and atom. Look up to the console for the output.

Often you'll need to select a portion of the molecule to highlight with a different drawing style. Selection commands follow the pattern: verb, then a name for the selection, a comma, and the selection syntax. For example, to select residues 58 and 92 in RNase T1 (9RNT):
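A minimal sketch of such a command, assuming the selection is named "active" and the residues are picked by number with PyMOL's `resi` keyword (a `+` joins several residue numbers). The Python helper below only assembles the text to type into the console; the name `select_command` is illustrative and not part of PyMOL.

```python
def select_command(name: str, residues: list[int]) -> str:
    """Build a PyMOL 'select' command: verb, selection name, then selection."""
    if not residues:
        raise ValueError("need at least one residue number")
    # 'resi' picks residues by number; '+' joins several numbers into one list.
    resi_list = "+".join(str(r) for r in residues)
    return f"select {name}, resi {resi_list}"


# Residues 58 and 92 of RNase T1, stored under the name "active".
print(select_command("active", [58, 92]))
```

Once a named selection exists, other verbs can act on it by name; for instance, `show sticks, active` draws just those residues as sticks.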

Both residues are highlighted in the display, and in the object browser you'll note a new object named "active."

The command syntax is vast, but there are many resources cataloging commands and syntax. A few of those are listed below:

The field of protein design often involves designing small portions of proteins that mimic natural structures. The maquette is styled after a four-helix bundle. While normally part of a larger protein, this isolated structure is a clear start. Produce an illustration of 1M3W which clearly shows the presence of all four helices (rotation), each a different color (selection, color) in rainbow order (ROYGBIV, your choice of color subset) by chain designation.

The interaction of enzyme and substrate is one of the fundamentals of the universe unveiled in structural studies. Bovine pancreatic ribonuclease (RNase A) is a small (124 amino acid) digestive enzyme which breaks large RNAs into smaller RNAs. The active site is composed of a pair of histidine residues, 12 and 119, involved in the catalysis. The structure 1RNM includes both RNase A and a bound nucleotide monophosphate (CMP). Prepare a diagram of 1RNM including the following features:
- A medium gray cartoon representation of the secondary structure
- Remove water molecules
- Remove sulfate anions
- Draw CMP as ball and stick, colored by atom type
- Draw HIS12 and HIS119 as ball and stick, colored by atom type
- Rotate the molecule to provide a clear view of all three highlighted portions

While interactions with substrates imply catalysis, some molecules just come together; we call it binding. The two components of a binding interaction are the protein (obvious) and the ligand (whatever is bound). Often the ligand is a small molecule such as a metabolite. However, sometimes the ligand is much larger than the protein. That is evident in the structure 1J1V, where a portion of DnaA (an E. coli protein involved in the initiation of DNA replication) is bound to a small DNA sequence (DNA is often called the "noble ligand" as a large molecule bound by proteins). Prepare an image of this complex in which:
- The two DNA chains are a shade of grey, draw the molecular surface
- Draw the protein cartoon in an unsaturated color ("tints" in PyMOL) and add the molecular surface.
- Rotate the structure such that the 433 end of the 433–449 helix is toward the view point with the DNA above, showing the precise fit of an α–helix in the major groove of DNA.

About how many base pairs have contact with the 433–449 helix?

As Picasso suggests, it's not plagiarism but inspiration which great artists take from other works.

"Good artists copy; great artists steal."

-Pablo Picasso

Now that you are a budding molecular artist, maybe it is time to draw inspiration from other molecular artists. Search the journal Biochemistry for articles referencing PyMOL, select one article which strikes your fancy and select a figure from that paper generated with PyMOL. Using that figure as inspiration, create a figure which is similar. You may try to recreate the figure to pixel–by–pixel perfection, take the molecule illustrated and put your own spin on what the authors are trying to show, or you may take their technique and apply it to a structure of your own choosing. Present your inspiration figure, with paper reference, as well as your masterpiece. What did you learn about style, technique or structure?

Student's choice. Browse the Protein DataBank and choose a structure or complex. Prepare one still image highlighting a feature of your choice. Annotate that image with one paragraph about structure, function or biological role. A very good place to start is by scrolling down on the main PDB page for that structure. The literature (try the PubMed link) and macromolecules sections will provide more information than you can use.