NAMD LAB 3: LIGAND UMBRELLA SAMPLING IN A PROTEIN-BILAYER SYSTEM

by Mitch Gleed

WORK IN PROGRESS


Objectives:


To complete this lab, you must have completed the previous lab.

The purpose of this lab is to provide an introduction to umbrella sampling using NAMD. You will use the system you created and equilibrated from CHARMM-GUI in the previous lab and perform umbrella sampling of a drug of choice in that system.

See this CHARMM lab for a background and discussion of umbrella sampling and the potential of mean force.

1. Reaction coordinate

From the previous lab, we have an equilibrated system of M2 protein embedded in a DMPC lipid bilayer, surrounded by 0.15 M NaCl in water. We investigated what happens when you insert a drug into the protein channel and let the system simulate for a period of time. We plotted the change in protein backbone structure (RMSD) over time and the drug adamantane cage position (Ang) along the channel axis over time. Instead of changes over time, we will now investigate intrinsic properties of the M2 channel with respect to a drug.

Among the many possible reaction coordinates to investigate, here are two examples:

Our lab investigated the free-energy profile of amantadine in 2KQT (wild-type Amt-bound M2) using two-dimensional umbrella sampling in which both reaction coordinates were investigated. (What orientation does the drug adopt most freely at different positions along the channel axis, and what positions along the channel axis does the drug have the lowest free energy?) You can read more about that study here (see Fig. 4).

This lab will investigate the free-energy profile of amantadine at different axial distances within the 2L0J M2 protein channel (2nd example above), though you may adapt the protocol to investigate other drugs, other M2 channels, or other reaction coordinates.

2. Umbrella sampling workflow

Let’s go over the general approach of how we will accomplish umbrella sampling in this lab using our files from the previous lab.

There are hundreds of different ways that the umbrella sampling protocol can be performed. If you are experienced and more comfortable with Python, C++, etc., you could technically do most of the template replacement, submission, and data aggregation presented here. For the purposes of the lab we will use bash for our code, the default Linux interpreter. You do not have to follow the methods described here if you are comfortable with coding and prefer other methods.

Here is a chart demonstrating the suggested general workflow: alt text

One approach to performing umbrella sampling with this workflow is using a “master” submission script. The submission script contains all of the variables of interest and defines the reaction coordiante of interest. Its function is to loop over all of the simulation configuration files (CHARMM, VMD, NAMD, etc.; the files in the middle column of Figure 1) providing environment variables specific for each window along the reaction coordinate. The submission script carefully manages filenames so the output data can be intelligently extracted for analysis in WHAM.

An analysis script can be run after the simulations complete that extracts the meaningful output data and execute WHAM.

3. Design the protocol

That’s pretty much it! We have provided a couple skeleton scripts containing some suggested pseudocode to help guide you, but now that you’ve made it this far in the course you should be able to perform most of this on your own.

ASSIGNMENT: Perform umbrella sampling on a reaction coordinate of your choice with a drug of your choice in the equilibrated M2 CHARMM-GUI system from the previous lab.

The two scripts we have provided you (which you are not required to use, but would probably be helpful) are listed below. They each contain some basic comments and pseudocode to help get you started:

You will need the following files from the previous lab in your current lab directory. You may rename any of them as you prefer:

To save some time, below is some code you can execute within the current lab’s directory to copy these files from the previous lab, if you keep each of the lab directories within a directory together. Otherwise just modify the code to suit your directory structure.

cp ../namdlab2_m2amt/charmm.adddrug.str .
cp ../namdlab2_m2amt/colvars.production.inp .
cp ../namdlab2_m2amt/namd.production.inp . 
cp ../namdlab2_m2amt/vmd.compsets.inp . 
cp ../namdlab2_m2amt/bash.pt2_adddrug.sh .
cp ../namdlab2_m2amt/bash.pt3_compsets.sh .
cp ../namdlab2_m2amt/bash.pt4_simulation.sh .
cp -r ../namdlab2_m2amt/alm .
cp -r ../namdlab2_m2amt/avg .
cp -r ../namdlab2_m2amt/charmm-gui .

One tip to start, make sure to use variables anywhere you have filenames within your configuration files (CHECK ALL OUTPUT FILE NAMES!). You do not want to perform your whole umbrella sampling protocol to find out all of the output files have the same name and overwrite each other! For good-practice’s sake, you might also choose to use variables for any parameter that may potentially be of interest as some point, such as temperature, number of simulation steps, etc. If you were already using environment variables in the last lab, you will be a step ahead.

You will need a collective variables configuration file that adds a harmonic constraint to the collective variable you are studying. Using the channel axis and drug position as an example reaction coordinate, each simulated umbrella window will have a harmonic restraint on the drug cage carbons at a specific point along the channel axis.

Refer to the next section of the lab for helpful bash tips and guidance.

4. Bash review and other helpful hints

Variables

To set a variable in bash, just use the = sign. To create a variable named channel equal to 2l0j, just do channel=2l0j. To refer to that variable in the script, add a $ in front of the variable name.

One headache-saving tip is to use variables when defining other variables. Here’s an example of using multiple variables to define a pattern.

pdbid=2l0j
runCount=1
inputFilePattern=${pdbid}_${runCount}

You can then call $inputFilePattern within a script to load $inputFilePattern.psf, $inputFilePattern.crd, etc.

Environment variables

In bash, variables stay within the environment of the bash script being run unless the export function is used. Let’s review how to use environment variables in each program.

CHARMM

  1. Use export variableName=variableValue in the bash script.
  2. Add variableName:$variableName to the CHARMM run command. (see the end of bash.pt2_adddrug.sh). Alternatively you can skip 1. and do variableName:variableValue, but other programs will not have access to this variable. You need this for every variable you wish CHARMM to use from the environment!
  3. In the CHARMM script being run, you can simply refer to the variable with @variableName.

NAMD/VMD

  1. Use export variableName=variableValue in the bash script.
  2. In the NAMD/VMD script being run, use $::env(variableName) to refer to the environment variable.

Colvars does not use environment variables. You will need to somehow tell colvars to use these variables, though. One option is to use a colvars “template” file, put unique words in the place of the values you would like to substitute, copy the template within the submission loop with cp, and then perform a find/replace with sed as described below.

Find / Replace with sed

There are multiple ways to do Find / Replace in Linux. One utility capable of find/replace is sed.

The following command will replace all occurences of the string YES in foo.txt with the string NO in a new file called bar.txt:

sed "s,YES,NO," foo.txt > bar.txt

The following command will replace all occurences of the string DRUGPOSITION with the value of the variable drugposition and save changes to the same file:

sed -i "s,DRUGPOSITION,${drugposition}," output/colvars.${outputPattern}.inp

Advanced substitutions involving punctuation, special characters, ignoring case, etc. are possible but a bit more complex. See the sed documentation for more information.

Arithmetic

Arithmetic is a significant limitation of bash, so other utilities are used to perform the math.

Here are two different ways to assign the sum of the exported variables reactionCoordinateStart and reactionCoordinateIncrement to the variable umbrellaWindow:

export umbrellaWindow=`perl -E 'say ($ENV{reactionCoordinateStart}+$ENV{reactionCoordinateIncrement})'`
export umbrellaWindow=`echo "( $reactionCoordinateStart + $reactionCoordinateIncrement )" | bc`

perl is better for more advanced calculations and the echo ... | bc method is a bit easier for very basic calculations.

Here’s another example of using perl to compute the arccosine of the exported bash variable cos, converting to degrees, and assigning it to the bash variable umbrellaTiltDegrees:

export umbrellaTiltDegrees=`perl -E 'use Math::Trig; say acos($ENV{cos})*180/pi'`

Backticks

In the above two examples with arithmetic, you’ll notice the use of backticks ` ` ` within the code. In bash, the backticks are typically used to evaluate an expression and set it to a variable.

For example, type in date on the command line and see what happens. After that, try out the following code:

currentDate=`date` 
echo $date

The value of the variable currentDate is set to the output of the command date.

Consider the perl arithmetic example from the section above:

export umbrellaWindow=`perl -E 'say ($ENV{reactionCoordinateStart}+$ENV{reactionCoordinateIncrement})'`

The portion of the command within the backticks is evaluated using the utility perl, and the output is set to the variable umbrellaWindow.

An alterative to backticks that some consider superior is to encapsulate a command with $ and parentheses, as in $(command). See this guide for more information.

Pipes

Pipes (|) are used in bash to pass the output of one command to another utility.

For example, if you do head -n 3 sample.txt it will print out the top three lines of the file sample.txt to the standard output, the command line. If you do the command tail -n 3 sample.txt, it will print out the bottom three lines.

You can use pipes to combine the two commands. Say you want to print out the top three lines of a file, get the last line of the three you filtered, and write to a file named edit.txt you could do:

head -n 3 sample.txt | tail -n 3 > edit.txt 

Learn more here.

Number formatting

For analysis purposes and automation, managing your decimal places and floating zeros is important. For ease in the analysis stage, it may be helpful to you to be consistent with number length and decimal places.

printf is a common cross-platform utility for formatting a variety of data types. To redefine the variable umbrellaWindow with a current value of 2.5 to a new value of 002.500:

umbrellaWindow=`printf "%07.3f" $umbrellaWindow`

Breaking down the formatting argument, %07.3f:

Here is a helpful resource for learning more about formatting with printf.

Echo

The echo command writes its arguments to the standard output. It is an essential tool for a variety of reasons. Here are a few:

Here is an example of using echo to write the value of a variable umbrellaWindow to a new file window.txt:

echo $umbrellaWindow > window.txt 

As in the above example, if the file window.txt already exists, it will be overwritten. If you desire to append to the file instead, use two angular quote brackets >>:

echo $umbrellaWindow >> window.txt 

Find more information here.

Loops

There are multiple types of loops and multiple ways of performing each loop in bash.

For Loop. A for loop is used to execute a list of commands and iterates through a series.

Let’s use an example of iterating through umbrella windows with a for loop.

export reactionCoordinateStart=3
export reactionCoordinateIncrement=0.5
export reactionCoordinateStop=6
# calculate number of umbrella windows 
totalWindows=`perl -E 'say 1+($ENV{reactionCoordinateStop}-$ENV{reactionCoordinateStart})/$ENV{reactionCoordinateIncrement}'`

for (( windowCount=0; windowCount<$totalWindows; ++windowCount )); do 
	export windowCount=$windowCount
	umbrellaWindow=`perl -E 'say ($ENV{windowCount}*$ENV{reactionCoordinateIncrement}+$ENV{reactionCoordinateStart})'`
	echo "current umbrella window is $umbrellaWindow" # check the calculation and the loop counter 
done	

The output of above snippet is:

current umbrella window is 3
current umbrella window is 3.5
current umbrella window is 4
current umbrella window is 4.5
current umbrella window is 5
current umbrella window is 5.5
current umbrella window is 6

Here is a description of the for statement in the above example:

  1. windowCount=0: Start with a value of 0 for windowCount.
  2. windowCount<$totalWindows: Continue running the loop until windowCount is no longer less than totalWindows
  3. ++windowCount: Increment windowCount by 1 for every loop iteration

windowCount is used within the loop as a multiplication factor for the variable reactionCoordinateIncrement; their product is added to the variable reactionCoordinateStart to give the value of umbrellaWindow. In the case of a distance reaction coordinate, umbrellaWindow would represent the desired location of the drug along the channel axis to be used for that particular simulation.

While Loop. A while loop is used to execute a list of commands as long as a condition is met. These are used less frequently than for loops, but they have utility. Let’s recreate the same outcome of the previous for loop using a while loop:

windowCount=0
while (( windowCount < $totalWindows )); do
	export windowCount=$windowCount
	umbrellaWindow=`perl -E 'say ($ENV{windowCount}*$ENV{reactionCoordinateIncrement}+$ENV{reactionCoordinateStart})'`
	echo "current umbrella window is $umbrellaWindow" # check the calculation and the loop counter 
	((windowCount += 1))
done

The while loop above performs the code until the condition windowCount < $totalWindows is false. The line ((windowCount += 1)) increments windowCount within the loop, similar to how the for loop increments it with ++windowCount within the for statement.

Other uses for while loops uses include performing code until a certain time of day, until a scheduled job starts, etc. Often the command sleep is used in conjunction with a while loop so the code isn’t performed constantly.

Here is a helpful resource for learning more about loops.

If statements

The if command evaluates a conditional statement and performs code depending on the status of the condition.

For example, if you want to see if the output file already exists before running a simulation, you could do something like this:

if [ -f output/${outputPattern}.log ]; then
	echo "file exists already, exiting script"
	exit 0
else 
	echo "file does not exist"
fi 

The -f argument within the conditional statement (in brackets) denotes a boolean for a file that returns true or false. In a sense, the conditional statement is reduced, to true or false (so if the file exists, it would be like if true; then ...

Here is an if statement that compares two variables:

if [ $windowCount < $totalWindows ]; then 
	echo "the current umbrella window count is less than the total number of umbrella windows"
fi 

Here is an if statement that is used to interpret input from the user from the command line:

echo "The total number of umbrella windows to be simulated is $totalWindows, do you wish to continue? (y/n)"
read proceed # waits for user input 
if [[ $proceed == Y || $proceed == y || $proceed == Yes || $proceed == yes ]]; then 
	echo -e "Proceeding with submission..." 
	sleep 1
else
	echo -e "Terminating script"
	sleep 1
	exit 0
fi

Notice the use of the ||, which are “OR” in boolean logic. && is the “AND” operator.

See this site if you’d like to learn more.

Arrays

An array is a variable that contains multiple values.

When setting up a reaction coordinate, for example, you could set variables named reactionCoordinateStart, reactionCoordinateIncrement, and reactionCoordinateStop and use arithmetic to pass the umbrella window position to the simulation program. An alternative would be to use a variable array, with umbrella window positions manually entered in.

reactionCoordinate=( "-5" "-4" "-3" "-2" )

To get the value of the 3rd value in the array, -3, use ${reactionCoordinate[2]}. You use 2 in the brackets because in the Linux shell, the first array position is defined at position zero, so the third value in the array would be at position two.

To loop through all the elements of the reactionCoordinate array:

for (( count=0; count<${#reactionCoordinate[@]}; count++ )); do 
	echo ${reactionCoordinate[$count]}
done

The output of the above snippet is:

-5
-4
-3
-2

For more information, visit this site.

Functions

Functions are reusable snippets of code that can be called at different points within a script. We will not go into great detail on functions, but simple functions can be very useful.

The general formatting of a function is functionName () { } with the function code placed within the braces. To execute a function, just do functionName. Some prefer to name all functions in all caps for improved script readability.

Here is an example of calling a function named SUBMITFUNCTION within a for loop, similar to example from the for loop section earlier:

export pdbid=2kqt 
export drugid=alm034
export reactionCoordinateStart=3
export reactionCoordinateIncrement=0.5
export reactionCoordinateStop=6
# calculate number of umbrella windows 
totalWindows=`perl -E 'say 1+($ENV{reactionCoordinateStop}-$ENV{reactionCoordinateStart})/$ENV{reactionCoordinateIncrement}'`

for (( windowCount=0; windowCount<$totalWindows; ++windowCount )); do 
	export windowCount=$windowCount
	export umbrellaWindow=`perl -E 'say ($ENV{windowCount}*$ENV{reactionCoordinateIncrement}+$ENV{reactionCoordinateStart})'`
	SUBMITFUNCTION
done	

SUBMITFUNCTION () {
	export umbrellaWindow=`printf "%07.3f" $umbrellaWindow`
	export outputPattern=${pdbid}_${drugid}_${umbrellaWindow}
	echo "outputPattern is $outputPattern"
	mkdir -p output 
	mkdir -p output/${outputPattern} 
	# etc ...
}

The output for the above snippet would be as follows:

outputPattern is 2kqt_alm034_003.000
outputPattern is 2kqt_alm034_003.500
outputPattern is 2kqt_alm034_004.000
outputPattern is 2kqt_alm034_004.500
outputPattern is 2kqt_alm034_005.000
outputPattern is 2kqt_alm034_005.500
outputPattern is 2kqt_alm034_006.000

Calling SUBMITFUNCTION within the loop improves readability of the script.

Furthermore, if you wanted to test out the script before running through the loop, you could comment out the for loop, manually enter the parameter of interest, and call SUBMITFUNCTION.

... 
# for (( windowCount=0; windowCount<$totalWindows; ++windowCount )); do 
# 	export windowCount=$windowCount
# 	export umbrellaWindow=`perl -E 'say ($ENV{windowCount}*$ENV{reactionCoordinateIncrement}+$ENV{reactionCoordinateStart})'`
# 	SUBMITFUNCTION
# done	
export umbrellaWindow=3
SUBMITFUNCTION 
...

5. Submitting

The scheduler submission script from last lab, bash.pt4_simulation, is written to utilize full GPU nodes. While you may keep this in your workflow as-is, the number of GPU nodes is very limited and it may be easier to either request portions of GPU nodes or full non-GPU nodes instead. Adjust the requested time and resources as you see fit according to available resources.

Example full GPU node request. This is what is used in lab 2. If GPU node utilization is low and you have a low volume of jobs and need them done as quickly as possible, this is a good scheme. Based on m8g architecture.

#SBATCH --time=24:00:00   # walltime
#SBATCH --ntasks=24   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --gres=gpu:4
#SBATCH --mem=64G   # total memory
...
# Run NAMD 
module purge
module load compiler_gnu/6.4
module load cuda/8.0
export dir=/fslhome/mgleed/software/namd/exec/NAMD_Git-2017-11-04_Linux-x86_64-multicore-CUDA
$dir/namd2 +p${SLURM_CPUS_ON_NODE} +idlepoll +devices $CUDA_VISIBLE_DEVICES $inputFile > $outputFile

Example 1 GPU per GPU node request. This requests 1/4th of the GPU node. If GPU node utilization is low and you have a medium-to-high volume of jobs, this is a good scheme. Based on m8g architecture.

#SBATCH --time=24:00:00   # walltime
#SBATCH --ntasks=6   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --gres=gpu:1
#SBATCH --mem=16G   # total memory
...
# Run NAMD 
module purge
module load compiler_gnu/6.4
module load cuda/8.0
export dir=/fslhome/mgleed/software/namd/exec/NAMD_Git-2017-11-04_Linux-x86_64-multicore-CUDA
$dir/namd2 +p${SLURM_CPUS_ON_NODE} +idlepoll +devices $CUDA_VISIBLE_DEVICES $inputFile > $outputFile

Example multiple non-GPU node request with mpi and infiniband. If the GPU nodes are being hogged, utilization of the non-GPU nodes is low, and you have a low volume of jobs, this is a good scheme. Jobs will run across multiple nodes. m7 has 16 cores per node, m8 and m9 have 24 cores per node. (3/2018)

#SBATCH --time=24:00:00   # walltime
#SBATCH --ntasks=64   # number of processor cores (i.e. tasks)
#SBATCH --nodes=4   # number of nodes
#SBATCH --mem-per-cpu=2G   # memory per CPU core
#SBATCH -C 'ib'
...
# Run NAMD 
module purge 
module load namd/2.12_openmpi-1.8.5_gnu-5.2.0
mpirun $(which namd2) $inputFile > $outputFile

Example single non-GPU node request. If the GPU nodes are being hogged, and you have a high volume of jobs to get through, this is a good scheme. m7 has 16 cores per node, m8 and m9 have 24 cores per node. (3/2018)

#SBATCH --time=24:00:00   # walltime
#SBATCH --ntasks=16   # number of processor cores (i.e. tasks)
#SBATCH --nodes=1   # number of nodes
#SBATCH --mem-per-cpu=2G   # memory per CPU core
...
# Run NAMD 
export dir=/fslhome/mgleed/software/namd/exec/NAMD_Git-2017-11-04_Linux-x86_64-multicore
$dir/namd2 +setcpuaffinity `numactl --show | awk '/^physcpubind/ {printf "+p%d +pemap %d",(NF-1),$2; for(i=3;i<=NF;++i){printf ",%d",$i}}'` $inputFile > $outputFile

6. Analysis with WHAM

Return to home page