Welcome to Pydna’s documentation!

Module contents

copyright: Copyright 2013-2021 by Björn Johansson. All rights reserved.
license: This code is part of the pydna package, governed by the license in LICENSE.txt that should be included as part of this package.

pydna

Pydna is a python package providing code for simulation of the creation of recombinant DNA molecules using molecular biology techniques. Development of pydna happens in this Github repository.

Provided:

PCR simulation
Assembly simulation based on shared identical sequences
Primer design for amplification of a given sequence
Automatic design of primer tails for Gibson assembly or homologous recombination.
Restriction digestion and cut&paste cloning
Agarose gel simulation
Download sequences from Genbank
Parsing various sequence formats including the capacity to handle broken Genbank format

pydna package layout

The most important modules and how to import functions or classes from them are listed below. Class names starts with a capital letter, functions with a lowercase letter:

from pydna.module import function
from pydna.module import Class

Example: from pydna.gel import Gel

pydna
   ├── amplify
   │         ├── Anneal
   │         └── pcr
   ├── assembly
   │          └── Assembly
   ├── design
   │        ├── assembly_fragments
   │        └── primer_design
   ├── download
   │          └── download_text
   ├── dseqrecord
   │            └── Dseqrecord
   ├── gel
   │     └── Gel
   ├── genbank
   │         ├── genbank
   │         └── Genbank
   ├── parsers
   │         ├── parse
   │         └── parse_primers
   └── readers
             ├── read
             └── read_primers

How to use the documentation

Documentation is available as docstrings provided in the source code for each module. These docstrings can be inspected by reading the source code directly. See further below on how to obtain the code for pydna.

In the python shell, use the built-in help function to view a function’s docstring:

>>> from pydna import readers
>>> help(readers.read)
... 

The doctrings are also used to provide an automaticly generated reference manual available online at read the docs.

Docstrings can be explored using IPython, an advanced Python shell with TAB-completion and introspection capabilities. To see which functions are available in pydna, type pydna.<TAB> (where <TAB> refers to the TAB key). Use pydna.open_config_folder?<ENTER>`to view the docstring or `pydna.open_config_folder??<ENTER> to view the source code.

In the Spyder IDE it is possible to place the cursor immediately before the name of a module,class or function and press ctrl+i to bring up docstrings in a separate window in Spyder

Code snippets are indicated by three greater-than signs:

>>> x=41
>>> x=x+1
>>> x
42

pydna source code

The pydna source code is available on Github.

How to get more help

Please join the Google group for pydna, this is the preferred location for help. If you find bugs in pydna itself, open an issue at the Github repository.

Examples of pydna in use

See this repository for a collection of: examples.

pydna.get_env()[source]

Print a an ascii table containing all environmental variables.

Pydna related variables have names that starts with pydna_

pydna.logo()[source]

Ascii-art logotype of pydna.

>>> import pydna
>>> print(pydna.logo())
                 _
                | |
 ____  _   _  __| |___   _____
|  _ \| | | |/ _  |  _ \(____ |
| |_| | |_| ( (_| | | | / ___ |
|  __/ \__  |\____|_| |_\_____|
|_|   (____/

pydna.open_cache_folder()[source]

Open the pydna cache folder.

Opens in the default file manager. The location for this folder is stored in the pydna_data_dir environmental variable.

pydna.open_config_folder()[source]

Open the pydna configuration folder.

Opens in the default file manager. The location for this folder is stored in the pydna_config_dir environmental variable.

The pydna.ini file can be edited to make pydna quicker to use. See the documentation of the :class:configparser.ConfigParser´ class.

Below is the content of a typical pydna.ini file on a Linux system.

[main]
loglevel=30
email=myemail@example.org
data_dir=/home/bjorn/.local/share/pydna
log_dir=/home/bjorn/.cache/pydna/log
ape=tclsh /home/bjorn/.ApE/AppMain.tcl
cached_funcs=Genbank_nucleotide
primers=/home/bjorn/Dropbox/wikidata/PRIMERS.txt
enzymes=/home/bjorn/Dropbox/wikidata/RestrictionEnzymes.txt

The email address is set to someone@example.com by default. If you change this to you own address, the pydna.genbank.genbank() function can be used to download sequences from Genbank directly without having to explicitly add the email address.

Pydna can cache results from the following functions or methods:

pydna.genbank.Genbank.nucleotide() Genbank_nucleotide
pydna.amplify.Anneal() amplify_Anneal
pydna.assembly.Assembly() assembly_Assembly
pydna.download.download_text() download.download_text
pydna.dseqrecord.Dseqrecord.synced() Dseqrecord_synced

These can be added separated by a comma to the cached_funcs entry in pydna.ini file or the pydna_cached_funcs environment variable.

pydna.open_current_folder()[source]

Open the current working directory.

Opens in the default file manager. The location for this folder is given by the os.getcwd() function

pydna.open_log_folder()[source]: docstring.

pydna.dseq module

Provides the Dseq class for handling double stranded DNA sequences.

Dseq is a subclass of Bio.Seq.Seq. The Dseq class is mostly useful as a part of the pydna.dseqrecord.Dseqrecord class which can hold more meta data.

The Dseq class support the notion of circular and linear DNA topology.

class pydna.dseq.Dseq(watson, crick=None, ovhg=None, linear=None, circular=None, pos=0)[source]

Bases: Seq

Dseq holds information for a double stranded DNA fragment.

Dseq also holds information describing the topology of the DNA fragment (linear or circular).

Parameters

watsonstr: a string representing the watson (sense) DNA strand.
crickstr, optional: a string representing the crick (antisense) DNA strand.
ovhgint, optional: A positive or negative number to describe the stagger between the watson and crick strands. see below for a detailed explanation.
linearbool, optional: True indicates that sequence is linear, False that it is circular.
circularbool, optional: True indicates that sequence is circular, False that it is linear.

See also

pydna.dseqrecord.Dseqrecord

Examples

Dseq is a subclass of the Biopython Seq object. It stores two strings representing the watson (sense) and crick(antisense) strands. two properties called linear and circular, and a numeric value ovhg (overhang) describing the stagger for the watson and crick strand in the 5’ end of the fragment.

The most common usage is probably to create a Dseq object as a part of a Dseqrecord object (see pydna.dseqrecord.Dseqrecord).

There are three ways of creating a Dseq object directly:

Only one argument (string):

>>> from pydna.dseq import Dseq
>>> Dseq("aaa")
Dseq(-3)
aaa
ttt

The given string will be interpreted as the watson strand of a blunt, linear double stranded sequence object. The crick strand is created automatically from the watson strand.

Two arguments (string, string):

>>> from pydna.dseq import Dseq
>>> Dseq("gggaaat","ttt")
Dseq(-7)
gggaaat
   ttt

If both watson and crick are given, but not ovhg an attempt will be made to find the best annealing between the strands. There are limitations to this. For long fragments it is quite slow. The length of the annealing sequences have to be at least half the length of the shortest of the strands.

Three arguments (string, string, ovhg=int):

The ovhg parameter is an integer describing the length of the crick strand overhang in the 5’ end of the molecule.

The ovhg parameter controls the stagger at the five prime end:

dsDNA    ovhg

  XXX    2
XXXXX

 XXXX    1
XXXXX

XXXXX    0
XXXXX

XXXXX   -1
 XXXX

XXXXX   -2
  XXX

Example of creating Dseq objects with different amounts of stagger:

>>> Dseq(watson="agt",crick="actta",ovhg=-2)
Dseq(-7)
agt
  attca
>>> Dseq(watson="agt",crick="actta",ovhg=-1)
Dseq(-6)
agt
 attca
>>> Dseq(watson="agt",crick="actta",ovhg=0)
Dseq(-5)
agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=1)
Dseq(-5)
 agt
attca
>>> Dseq(watson="agt",crick="actta",ovhg=2)
Dseq(-5)
  agt
attca

If the ovhg parameter is specified a crick strand also needs to be supplied, otherwise an exception is raised.

>>> Dseq(watson="agt", ovhg=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna_/dsdna.py", line 169, in __init__
    else:
ValueError: ovhg defined without crick strand!

The shape of the fragment is set by either:

linear = False, True
circular = True, False

Note that both ends of the DNA fragment has to be compatible to set circular = True (or linear = False).

>>> Dseq("aaa","ttt")
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=0)
Dseq(-3)
aaa
ttt
>>> Dseq("aaa","ttt",ovhg=1)
Dseq(-4)
 aaa
ttt
>>> Dseq("aaa","ttt",ovhg=-1)
Dseq(-4)
aaa
 ttt
>>> Dseq("aaa", "ttt", linear = False ,ovhg=0)
Dseq(o3)
aaa
ttt
>>> Dseq("aaa", "ttt", circular = True , ovhg=0)
Dseq(o3)
aaa
ttt

>>> a=Dseq("tttcccc","aaacccc")
>>> a
Dseq(-11)
    tttcccc
ccccaaa
>>> a.ovhg
4

>>> b=Dseq("ccccttt","ccccaaa")
>>> b
Dseq(-11)
ccccttt
    aaacccc
>>> b.ovhg
-4
>>>

Coercing to string

>>> str(a)
'ggggtttcccc'

A Dseq object can be longer that either the watson or crick strands.

<-- length -->
GATCCTTT
     AAAGCCTAG

<-- length -->
      GATCCTTT
AAAGCCCTA

The slicing of a linear Dseq object works mostly as it does for a string.

>>> s="ggatcc"
>>> s[2:3]
'a'
>>> s[2:4]
'at'
>>> s[2:4:-1]
''
>>> s[::2]
'gac'
>>> from pydna.dseq import Dseq
>>> d=Dseq(s, linear=True)
>>> d[2:3]
Dseq(-1)
a
t
>>> d[2:4]
Dseq(-2)
at
ta
>>> d[2:4:-1]
Dseq(-0)


>>> d[::2]
Dseq(-3)
gac
ctg

The slicing of a circular Dseq object has a slightly different meaning.

>>> s="ggAtCc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggAtCc
ccTaGg
>>> d[4:3]
Dseq(-5)
CcggA
GgccT

The slice [X:X] produces an empty slice for a string, while this will return the linearized sequence starting at X:

>>> s="ggatcc"
>>> d=Dseq(s, circular=True)
>>> d
Dseq(o6)
ggatcc
cctagg
>>> d[3:3]
Dseq(-6)
tccgga
aggcct
>>>

T4(nucleotides=None)[source]

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters

nucleotidesstr

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>

cas9(RNA: str)[source]: docstring.

property circular: The circular property can not be set directly. Use looped() to create a circular Dseq object

cseguid()[source]: Circular uSEGUID (cSEGUID) for the sequence.

cut(*enzymes)[source]

Returns a list of linear Dseq fragments produced in the digestion. If there are no cuts, an empty list is returned.

Parameters

enzymesenzyme object or iterable of such objects: A Bio.Restriction.XXX restriction objects or iterable.

Returns

fragslist: list of Dseq objects formed by the digestion

Examples

>>> from pydna.dseq import Dseq
>>> seq=Dseq("ggatccnnngaattc")
>>> seq
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>> from Bio.Restriction import BamHI,EcoRI
>>> type(seq.cut(BamHI))
<class 'tuple'>
>>> for frag in seq.cut(BamHI): print(repr(frag))
Dseq(-5)
g
cctag
Dseq(-14)
gatccnnngaattc
    gnnncttaag
>>> seq.cut(EcoRI, BamHI) ==  seq.cut(BamHI, EcoRI)
True
>>> a,b,c = seq.cut(EcoRI, BamHI)
>>> a+b+c
Dseq(-15)
ggatccnnngaattc
cctaggnnncttaag
>>>

cutters(batch: Optional[RestrictionBatch] = None)[source]: Enzymes in a RestrictionBatch cutting sequence at least once.

fill_in(nucleotides=None)[source]

Fill in of five prime protruding end with a DNA polymerase that has only DNA polymerase activity (such as exo-klenow 1) and any combination of A, G, C or T. Default are all four nucleotides together.

Parameters

nucleotidesstr

References

1: http://en.wikipedia.org/wiki/Klenow_fragment#The_exo-_Klenow_fragment

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.fill_in()
Dseq(-3)
aaa
ttt
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.fill_in()
Dseq(-5)
caaag
gtttc
>>> b.fill_in("g")
Dseq(-5)
caaag
gtttc
>>> b.fill_in("tac")
Dseq(-5)
caaa
 tttc
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.fill_in()
Dseq(-5)
 aaac
gttt
>>>

find(sub, start=0, end=9223372036854775807)[source]

This method behaves like the python string method of the same name.

Returns an integer, the index of the first occurrence of substring argument sub in the (sub)sequence given by [start:end].

Returns -1 if the subsequence is NOT found.

Parameters

substring or Seq object: a string or another Seq object to look for.
startint, optional: slice start.
endint, optional: slice end.

Examples

>>> from pydna.dseq import Dseq
>>> seq = Dseq("atcgactgacgtgtt")
>>> seq
Dseq(-15)
atcgactgacgtgtt
tagctgactgcacaa
>>> seq.find("gac")
3
>>> seq = Dseq(watson="agt",crick="actta",ovhg=-2)
>>> seq
Dseq(-7)
agt
  attca
>>> seq.find("taa")
2

five_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

See also

pydna.dseq.Dseq.three_prime_end

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.five_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.five_prime_end()
("3'", 't')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.five_prime_end()
("5'", 'a')
>>>

classmethod from_representation(dsdna: str, *args, **kwargs)[source]

classmethod from_string(dna: str, *args, linear=True, circular=False, **kwargs)[source]

isblunt()[source]

isblunt.

Return True if Dseq is linear and blunt and false if staggered or circular.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gat")
>>> a
Dseq(-3)
gat
cta
>>> a.isblunt()
True
>>> a=Dseq("gat", "atcg")
>>> a
Dseq(-4)
 gat
gcta
>>> a.isblunt()
False
>>> a=Dseq("gat", "gatc")
>>> a
Dseq(-4)
gat
ctag
>>> a.isblunt()
False
>>> a=Dseq("gat", circular=True)
>>> a
Dseq(o3)
gat
cta
>>> a.isblunt()
False

property linear: The linear property can not be set directly. Use an empty slice [:] to create a linear object.

looped()[source]

Circularized Dseq object.

This can only be done if the two ends are compatible, otherwise a TypeError is raised.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> a.looped()
Dseq(o8)
catcgatc
gtagctag
>>> a.T4("t")
Dseq(-8)
catcgat
 tagctag
>>> a.T4("t").looped()
Dseq(o7)
catcgat
gtagcta
>>> a.T4("a")
Dseq(-8)
catcga
  agctag
>>> a.T4("a").looped()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pydna/dsdna.py", line 357, in looped
    if type5 == type3 and str(sticky5) == str(rc(sticky3)):
TypeError: DNA cannot be circularized.
5' and 3' sticky ends not compatible!
>>>

lower()[source]

Return a lower case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.lower()
Dseq(-3)
aaa
ttt

Returns

Dseq: Dseq object in lowercase

See also

pydna.dseq.Dseq.upper

lseguid()[source]: Linear uSEGUID (lSEGUID) for the sequence.

mung()[source]

Simulates treatment a nuclease with 5’-3’ and 3’-5’ single strand specific exonuclease activity (such as mung bean nuclease 2)

    ggatcc    ->     gatcc
     ctaggg          ctagg

     ggatcc   ->      ggatc
    tcctag            cctag

>>> from pydna.dseq import Dseq
>>> b=Dseq("caaa", "cttt")
>>> b
Dseq(-5)
caaa
 tttc
>>> b.mung()
Dseq(-3)
aaa
ttt
>>> c=Dseq("aaac", "tttg")
>>> c
Dseq(-5)
 aaac
gttt
>>> c.mung()
Dseq(-3)
aaa
ttt

References

2: http://en.wikipedia.org/wiki/Mung_bean_nuclease

mw()[source]

This method returns the molecular weight of the DNA molecule in g/mol. The following formula is used:

MW = (A x 313.2) + (T x 304.2) +
     (C x 289.2) + (G x 329.2) +
     (N x 308.9) + 79.0

n_cutters(n=3, batch: Optional[RestrictionBatch] = None)[source]: Returns the enzymes in a RestrictionBatch that cut the sequence n times.

no_cutters(batch: Optional[RestrictionBatch] = None)[source]: Enzymes in a RestrictionBatch not cutting sequence.

once_cutters(batch: Optional[RestrictionBatch] = None): Enzymes in a RestrictionBatch cutting sequence once.

property ovhg: The ovhg property. This cannot be set directly, but is a consequence of how the watson and crick strands anneal to each other

classmethod quick(watson: str, crick: str, ovhg=0, linear=True, circular=False, pos=0)[source]

rc(inplace=False)

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>

reverse_complement(inplace=False)[source]

Dseq object where watson and crick have switched places.

This represents the same double stranded sequence.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc")
>>> a
Dseq(-8)
catcgatc
gtagctag
>>> b=a.reverse_complement()
>>> b
Dseq(-8)
gatcgatg
ctagctac
>>>

shifted(shift)[source]: Shifted version of a circular Dseq object.

t4(nucleotides=None)

Fill in five prime protruding ends and chewing back three prime protruding ends by a DNA polymerase providing both 5’-3’ DNA polymerase activity and 3’-5’ nuclease acitivty (such as T4 DNA polymerase). This can be done in presence of any combination of the four A, G, C or T. Removing one or more nucleotides can facilitate engineering of sticky ends. Default are all four nucleotides together.

Parameters

nucleotidesstr

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("gatcgatc")
>>> a
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4()
Dseq(-8)
gatcgatc
ctagctag
>>> a.T4("t")
Dseq(-8)
gatcgat
 tagctag
>>> a.T4("a")
Dseq(-8)
gatcga
  agctag
>>> a.T4("g")
Dseq(-8)
gatcg
   gctag
>>>

three_prime_end()[source]

Returns a tuple describing the structure of the 5’ end of the DNA fragment

>>> from pydna.dseq import Dseq
>>> a=Dseq("aaa", "ttt")
>>> a
Dseq(-3)
aaa
ttt
>>> a.three_prime_end()
('blunt', '')
>>> a=Dseq("aaa", "ttt", ovhg=1)
>>> a
Dseq(-4)
 aaa
ttt
>>> a.three_prime_end()
("3'", 'a')
>>> a=Dseq("aaa", "ttt", ovhg=-1)
>>> a
Dseq(-4)
aaa
 ttt
>>> a.three_prime_end()
("5'", 't')
>>>

See also

pydna.dseq.Dseq.five_prime_end

tolinear()[source]

Returns a blunt, linear copy of a circular Dseq object. This can only be done if the Dseq object is circular, otherwise a TypeError is raised.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseq import Dseq
>>> a=Dseq("catcgatc", circular=True)
>>> a
Dseq(o8)
catcgatc
gtagctag
>>> a[:]
Dseq(-8)
catcgatc
gtagctag
>>>

transcribe()[source]

Transcribe a DNA sequence into RNA and return the RNA sequence as a new Seq object.

>>> from Bio.Seq import Seq
>>> coding_dna = Seq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna
Seq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> coding_dna.transcribe()
Seq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

The sequence is modified in-place and returned if inplace is True:

>>> sequence = MutableSeq("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')
>>> sequence.transcribe()
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG')

>>> sequence.transcribe(inplace=True)
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')
>>> sequence
MutableSeq('AUGGCCAUUGUAAUGGGCCGCUGAAAGGGUGCCCGAUAG')

As Seq objects are immutable, a TypeError is raised if transcribe is called on a Seq object with inplace=True.

Trying to transcribe an RNA sequence has no effect. If you have a nucleotide sequence which might be DNA or RNA (or even a mixture), calling the transcribe method will ensure any T becomes U.

Trying to transcribe a protein sequence will replace any T for Threonine with U for Selenocysteine, which has no biologically plausible rational.

>>> from Bio.Seq import Seq
>>> my_protein = Seq("MAIVMGRT")
>>> my_protein.transcribe()
Seq('MAIVMGRU')

translate(table='Standard', stop_symbol='*', to_stop=False, cds=False, gap='-')[source]

Turn a nucleotide sequence into a protein sequence by creating a new sequence object.

This method will translate DNA or RNA sequences. It should not be used on protein sequences as any result will be biologically meaningless.

Arguments:

table - Which codon table to use? This can be either a name (string), an NCBI identifier (integer), or a CodonTable object (useful for non-standard genetic codes). This defaults to the “Standard” table.
stop_symbol - Single character string, what to use for terminators. This defaults to the asterisk, “*”.
to_stop - Boolean, defaults to False meaning do a full translation continuing on past any stop codons (translated as the specified stop_symbol). If True, translation is terminated at the first in frame stop codon (and the stop_symbol is not appended to the returned protein sequence).
cds - Boolean, indicates this is a complete CDS. If True, this checks the sequence starts with a valid alternative start codon (which will be translated as methionine, M), that the sequence length is a multiple of three, and that there is a single in frame stop codon at the end (this will be excluded from the protein sequence, regardless of the to_stop option). If these tests fail, an exception is raised.
gap - Single character string to denote symbol used for gaps. Defaults to the minus sign.

A Seq object is returned if translate is called on a Seq object; a MutableSeq object is returned if translate is called pn a MutableSeq object.

e.g. Using the standard table:

>>> coding_dna = Seq("GTGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG")
>>> coding_dna.translate()
Seq('VAIVMGR*KGAR*')
>>> coding_dna.translate(stop_symbol="@")
Seq('VAIVMGR@KGAR@')
>>> coding_dna.translate(to_stop=True)
Seq('VAIVMGR')

Now using NCBI table 2, where TGA is not a stop codon:

>>> coding_dna.translate(table=2)
Seq('VAIVMGRWKGAR*')
>>> coding_dna.translate(table=2, to_stop=True)
Seq('VAIVMGRWKGAR')

In fact, GTG is an alternative start codon under NCBI table 2, meaning this sequence could be a complete CDS:

>>> coding_dna.translate(table=2, cds=True)
Seq('MAIVMGRWKGAR')

It isn’t a valid CDS under NCBI table 1, due to both the start codon and also the in frame stop codons:

>>> coding_dna.translate(table=1, cds=True)
Traceback (most recent call last):
    ...
Bio.Data.CodonTable.TranslationError: First codon 'GTG' is not a start codon

If the sequence has no in-frame stop codon, then the to_stop argument has no effect:

>>> coding_dna2 = Seq("TTGGCCATTGTAATGGGCCGC")
>>> coding_dna2.translate()
Seq('LAIVMGR')
>>> coding_dna2.translate(to_stop=True)
Seq('LAIVMGR')

NOTE - Ambiguous codons like “TAN” or “NNN” could be an amino acid or a stop codon. These are translated as “X”. Any invalid codon (e.g. “TA?” or “T-A”) will throw a TranslationError.

NOTE - This does NOT behave like the python string’s translate method. For that use str(my_seq).translate(…) instead

twice_cutters(batch: Optional[RestrictionBatch] = None)[source]: Enzymes in a RestrictionBatch cutting sequence twice.

unique_cutters(batch: Optional[RestrictionBatch] = None)[source]: Enzymes in a RestrictionBatch cutting sequence once.

upper()[source]

Return an upper case copy of the sequence.

>>> from pydna.dseq import Dseq
>>> my_seq = Dseq("aAa")
>>> my_seq
Dseq(-3)
aAa
tTt
>>> my_seq.upper()
Dseq(-3)
AAA
TTT

Returns

Dseq: Dseq object in uppercase

See also

pydna.dseq.Dseq.lower

pydna.dseqrecord module

This module provides the Dseqrecord class, for handling double stranded DNA sequences. The Dseqrecord holds sequence information in the form of a pydna.dseq.Dseq object. The Dseq and Dseqrecord classes are subclasses of Biopythons Seq and SeqRecord classes, respectively.

The Dseq and Dseqrecord classes support the notion of circular and linear DNA topology.

class pydna.dseqrecord.Dseqrecord(record, *args, linear=None, circular=None, n=5e-14, **kwargs)[source]

Bases: SeqRecord

Dseqrecord is a double stranded version of the Biopython SeqRecord 3 class. The Dseqrecord object holds a Dseq object describing the sequence. Additionally, Dseqrecord hold meta information about the sequence in the from of a list of SeqFeatures, in the same way as the SeqRecord does.

The Dseqrecord can be initialized with a string, Seq, Dseq, SeqRecord or another Dseqrecord. The sequence information will be stored in a Dseq object in all cases.

Dseqrecord objects can be read or parsed from sequences in FASTA, EMBL or Genbank formats. See the pydna.readers and pydna.parsers modules for further information.

There is a short representation associated with the Dseqrecord. Dseqrecord(-3) represents a linear sequence of length 2 while Dseqrecord(o7) represents a circular sequence of length 7.

Dseqrecord and Dseq share the same concept of length. This length can be larger than each strand alone if they are staggered as in the example below.

<-- length -->
GATCCTTT
     AAAGCCTAG

Parameters

recordstring, Seq, SeqRecord, Dseq or other Dseqrecord object: This data will be used to form the seq property
circularbool, optional: True or False reflecting the shape of the DNA molecule
linearbool, optional: True or False reflecting the shape of the DNA molecule

References

3: http://biopython.org/wiki/SeqRecord

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> a.seq
Dseq(-3)
aaa
ttt
>>> from pydna.seq import Seq
>>> b=Dseqrecord(Seq("aaa"))
>>> b
Dseqrecord(-3)
>>> b.seq
Dseq(-3)
aaa
ttt
>>> from Bio.SeqRecord import SeqRecord
>>> c=Dseqrecord(SeqRecord(Seq("aaa")))
>>> c
Dseqrecord(-3)
>>> c.seq
Dseq(-3)
aaa
ttt

cas9(RNA: str)[source]: docstring.

property circular: The circular property can not be set directly. Use looped() or tolinear()

cseguid()[source]

Url safe cSEGUID for a circular sequence.

Only defined for circular sequences. The cSEGUID checksum uniquely identifies a circular sequence regardless of where the origin is set. The two Dseqrecord objects below are circular permutations.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("agtatcgtacatg", circular=True)
>>> a.cseguid() # cseguid is CTJbs6Fat8kLQxHj+/SC0kGEiYs
'CTJbs6Fat8kLQxHj-_SC0kGEiYs'

>>> a=Dseqrecord("gagtatcgtacat", circular=True)
>>> a.cseguid()
'CTJbs6Fat8kLQxHj-_SC0kGEiYs'

cut(*enzymes)[source]

Digest the Dseqrecord object with one or more restriction enzymes. returns a list of linear Dseqrecords. If there are no cuts, an empty list is returned.

See also

pydna.dseq.Dseq.looped

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa")
>>> a
Dseqrecord(-3)
>>> b=a.looped()
>>> b
Dseqrecord(o3)
>>>

lower()[source]

>>> from pydna.dseqrecord import Dseqrecord
>>> my_seq = Dseqrecord("aAa")
>>> my_seq.seq
Dseq(-3)
aAa
tTt
>>> upper = my_seq.upper()
>>> upper.seq
Dseq(-3)
AAA
TTT
>>> lower = my_seq.lower()
>>> lower
Dseqrecord(-3)
>>>

Returns

Dseqrecord: Dseqrecord object in lowercase

See also

pydna.dseqrecord.Dseqrecord.upper

lseguid()[source]

Url safe lSEGUID for a linear sequence.

Only defined for linear double stranded sequences.

The lSEGUID checksum uniquely identifies a linear sequence independent of its representation. The two Dseqrecord objects below are each others reverse complements, so they do in fact refer to the same molecule.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("agtatcgtacatg")
>>> a.lseguid()
'DPshMN4KeAjMovEjGEV4Kzj18lU'

>>> b=Dseqrecord("catgtacgatact")
>>> a.lseguid()
'DPshMN4KeAjMovEjGEV4Kzj18lU'

m()[source]: This method returns the mass of the DNA molecule in grams. This is calculated as the product between the molecular weight of the Dseq object and the

map_trace_files(pth, limit=25)[source]

n_cutters(n=3, batch: Optional[RestrictionBatch] = None)[source]: docstring.

no_cutters(batch: Optional[RestrictionBatch] = None)[source]: docstring.

number_of_cuts(*enzymes)[source]: The number of cuts by digestion with the Restriction enzymes contained in the iterable.

once_cutters(batch: Optional[RestrictionBatch] = None)[source]: docstring.

orfs(minsize=30)[source]: docstring.

rc()

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

reverse_complement()[source]

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

shifted(shift)[source]

Returns a circular Dseqrecord with a new origin <shift>. This only works on circular Dseqrecords. If we consider the following circular sequence:

GAAAT <-- watson strand

CTTTA <-- crick strand

The T and the G on the watson strand are linked together as well as the A and the C of the of the crick strand.

if shift is 1, this indicates a new origin at position 1:

new origin at the | symbol:

G|AAAT

C|TTTA

new sequence:

AAATG

TTTAC

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaat",circular=True)
>>> a
Dseqrecord(o4)
>>> a.seq
Dseq(o4)
aaat
ttta
>>> b=a.shifted(1)
>>> b
Dseqrecord(o4)
>>> b.seq
Dseq(o4)
aata
ttat

synced(**kwargs)

tolinear()[source]

Returns a linear, blunt copy of a circular Dseqrecord object. The underlying Dseq object has to be circular.

This method is deprecated, use slicing instead. See example below.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("aaa", circular = True)
>>> a
Dseqrecord(o3)
>>> b=a[:]
>>> b
Dseqrecord(-3)
>>>

twice_cutters(batch: Optional[RestrictionBatch] = None)[source]: docstring.

unique_cutters(batch: Optional[RestrictionBatch] = None)[source]: docstring.

upper()[source]

Returns an uppercase copy. >>> from pydna.dseqrecord import Dseqrecord >>> my_seq = Dseqrecord(“aAa”) >>> my_seq.seq Dseq(-3) aAa tTt >>> upper = my_seq.upper() >>> upper.seq Dseq(-3) AAA TTT >>>

Returns

Dseqrecord: Dseqrecord object in uppercase

See also

pydna.dseqrecord.Dseqrecord.lower

useguid()[source]

Url safe SEGUID for the sequence.

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base64. This means that the characters + and / are replaced with - and _ so that the checksum can be part of a URL.

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("aa")
>>> a.useguid() # useguid is gBw0Jp907Tg_yX3jNgS4qQWttjU
'gBw0Jp907Tg_yX3jNgS4qQWttjU'

write(filename=None, f='gb')[source]

Writes the Dseqrecord to a file using the format f, which must be a format supported by Biopython SeqIO for writing 5. Default is “gb” which is short for Genbank. Note that Biopython SeqIO reads more formats than it writes.

Filename is the path to the file where the sequece is to be written. The filename is optional, if it is not given, the description property (string) is used together with the format.

If obj is the Dseqrecord object, the default file name will be:

<obj.locus>.<f>

Where <f> is “gb” by default. If the filename already exists and AND the sequence it contains is different, a new file name will be used so that the old file is not lost:

<obj.locus>_NEW.<f>

References

5: http://biopython.org/wiki/SeqIO

pydna.amplicon module

This module provides the Amplicon class for PCR simulation. This class is not meant to be use directly but is used by the amplify module

class pydna.amplicon.Amplicon(record, *args, template=None, forward_primer=None, reverse_primer=None, **kwargs)[source]

Bases: Dseqrecord

The Amplicon class holds information about a PCR reaction involving two primers and one template. This class is used by the Anneal class and is not meant to be instantiated directly.

Parameters

forward_primerSeqRecord(Biopython): SeqRecord object holding the forward (sense) primer
reverse_primerSeqRecord(Biopython): SeqRecord object holding the reverse (antisense) primer
templateDseqrecord: Dseqrecord object holding the template (circular or linear)

dbd_program()[source]

figure()[source]

This method returns a simple figure of the two primers binding to a part of the template.

5tacactcaccgtctatcattatc...cgactgtatcatctgatagcac3
                           ||||||||||||||||||||||
                          3gctgacatagtagactatcgtg5
5tacactcaccgtctatcattatc3
 |||||||||||||||||||||||
3atgtgagtggcagatagtaatag...gctgacatagtagactatcgtg5

Returns

figure:string: A string containing a text representation of the primers annealing on the template (see example above).

classmethod from_SeqRecord(record, *args, path=None, **kwargs)[source]

primers()[source]

program()[source]

rc()

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

reverse_complement()[source]

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

set_forward_primer_footprint(length)[source]

set_reverse_primer_footprint(length)[source]

pydna.amplify module

This module provide the Anneal class and the pcr() function for PCR simulation. The pcr function is simpler to use, but expects only one PCR product. The Anneal class should be used if more flexibility is required.

Primers with 5’ tails as well as inverse PCR on circular templates are handled correctly.

class pydna.amplify.Anneal(**kwargs)[source]

Bases: object

The Anneal class has the following important attributes:

Attributes

forward_primerslist: Description of forward_primers.
reverse_primerslist: Description of reverse_primers.
templateDseqrecord: A copy of the template argument. Primers annealing sites has been added as features that can be visualized in a seqence editor such as ApE.
limitint, optional: The limit of PCR primer annealing, default is 13 bp.

property products

report(): returns a short report describing if or where primer anneal on the template.

pydna.amplify.pcr(*args, **kwargs)[source]

pcr is a convenience function for the Anneal class to simplify its usage, especially from the command line. If more than one or no PCR product is formed, a ValueError is raised.

args is any iterable of Dseqrecords or an iterable of iterables of Dseqrecords. args will be greedily flattened.

Parameters

argsiterable containing sequence objects: Several arguments are also accepted.
limitint = 13, optional: limit length of the annealing part of the primers.

Returns

productAmplicon: An pydna.amplicon.Amplicon object representing the PCR product. The direction of the PCR product will be the same as for the template sequence.

Notes

sequences in args could be of type:

string
Seq
SeqRecord (or subclass)
Dseqrecord (or sublcass)

The last sequence will be assumed to be the template while all preceeding sequences will be assumed to be primers.

This is a powerful function, use with care!

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.readers import read
>>> from pydna.amplify import pcr
>>> from pydna.primer import Primer
>>> template = Dseqrecord("tacactcaccgtctatcattatctactatcgactgtatcatctgatagcac")
>>> from Bio.SeqRecord import SeqRecord
>>> p1 = Primer("tacactcaccgtctatcattatc")
>>> p2 = Primer("cgactgtatcatctgatagcac").reverse_complement()
>>> pcr(p1, p2, template)
Amplicon(51)
>>> pcr([p1, p2], template)
Amplicon(51)
>>> pcr((p1,p2,), template)
Amplicon(51)
>>>

pydna.assembly module

Assembly of sequences by homologous recombination.

Should also be useful for related techniques such as Gibson assembly and fusion PCR. Given a list of sequences (Dseqrecords), all sequences are analyzed for shared homology longer than the set limit.

A graph is constructed where each overlapping region form a node and sequences separating the overlapping regions form edges.

            -- A --
catgatctacgtatcgtgt     -- B --
            atcgtgtactgtcatattc
                        catattcaaagttct



--x--> A --y--> B --z-->   (Graph)

Nodes:

A : atcgtgt
B : catattc

Edges:

x : catgatctacgt
y : actgt
z : aaagttct

The NetworkX package is used to trace linear and circular paths through the graph.

class pydna.assembly.Assembly(**kwargs)[source]

Bases: object

Assembly of a list of linear DNA fragments into linear or circular constructs. The Assembly is meant to replace the Assembly method as it is easier to use. Accepts a list of Dseqrecords (source fragments) to initiate an Assembly object. Several methods are available for analysis of overlapping sequences, graph construction and assembly.

Parameters

fragmentslist: a list of Dseqrecord objects.
limitint, optional: The shortest shared homology to be considered
algorithmfunction, optional: The algorithm used to determine the shared sequences.
max_nodesint: The maximum number of nodes in the graph. This can be tweaked to manage sequences with a high number of shared sub sequences.

Examples

>>> from pydna.assembly import Assembly
>>> from pydna.dseqrecord import Dseqrecord
>>> a = Dseqrecord("acgatgctatactgCCCCCtgtgctgtgctcta")
>>> b = Dseqrecord("tgtgctgtgctctaTTTTTtattctggctgtatc")
>>> c = Dseqrecord("tattctggctgtatcGGGGGtacgatgctatactg")
>>> x = Assembly((a,b,c), limit=14)
>>> x
Assembly
fragments....: 33bp 34bp 35bp
limit(bp)....: 14
G.nodes......: 6
algorithm....: common_sub_strings
>>> x.assemble_circular()
[Contig(o59), Contig(o59)]
>>> x.assemble_circular()[0].seq.watson
'acgatgctatactgCCCCCtgtgctgtgctctaTTTTTtattctggctgtatcGGGGGt'

assemble_circular()[source]

assemble_linear(start=None, end=None, max_nodes=None)[source]

pydna.common_sub_strings module

This module is based on the Py-rstr-max package that was written by Romain Brixtel (rbrixtel_at_gmail_dot_com) (https://brixtel.users.greyc.fr) and is available from https://code.google.com/p/py-rstr-max https://github.com/gip0/py-rstr-max the original code was covered by an MIT licence.

class pydna.common_sub_strings.Rstr_max[source]

Bases: object

add_str(str_unicode)[source]

go()[source]

removeMany(stack, results, m, idxEnd)[source]

step1_sort_suffix()[source]

step2_lcp()[source]

step3_rstr()[source]

pydna.common_sub_strings.common_sub_strings(stringx: str, stringy: str, limit=25)[source]

Finds all common substrings between stringx and stringy longer than limit. This function is case sensitive. The substrings may overlap.

returns a list of tuples describing the substrings The list is sorted longest -> shortest.

Parameters

stringxstr
stringystr
limitint, optional

Returns

list of tuple

[(startx1,starty1,length1),(startx2,starty2,length2), …]

startx1 = startposition in x, where substring 1 starts starty1 = position in y where substring 1 starts length1 = lenght of substring

Examples

>>> from pydna.common_sub_strings import common_sub_strings
>>> common_sub_strings("gatgatttcggtagtta", "gtcagtatgtctatctatcgcg", limit=3)
[(1, 6, 3), (7, 17, 3), (10, 4, 3), (12, 3, 3)]

Overlaps   Symbols
(1, 6,  3)   ---
(7, 17, 3)   +++
(10, 4, 3)   ...
(12, 3, 3)   ===

            ===
gatgatttcggtagtta           stringx
 ---   +++...

    ...
gtcagtatgtctatctatcgcg      stringy
   ===---        +++

pydna.common_sub_strings.direct_kark_sort(s)[source]

pydna.common_sub_strings.kark_sort(s, SA, n, K)[source]

pydna.common_sub_strings.radixpass(a, b, r, s, n, k)[source]

pydna.common_sub_strings.terminal_overlap(stringx: str, stringy: str, limit=15)[source]

Finds the the flanking common substrings between stringx and stringy longer than limit. This means that the results only contains substrings that starts or ends at the the ends of stringx and stringy.

This function is case sensitive.

returns a list of tuples describing the substrings The list is sorted longest -> shortest.

Parameters

stringxstr
stringystr
limitint, optional

Returns

list of tuple

[(startx1,starty1,length1),(startx2,starty2,length2), …]

startx1 = startposition in x, where substring 1 starts starty1 = position in y where substring 1 starts length1 = lenght of substring

Examples

>>> from pydna.common_sub_strings import terminal_overlap
>>> terminal_overlap("agctatgtatcttgcatcgta", "gcatcgtagtctatttgcttac", limit=8)
[(13, 0, 8)]

             <-- 8 ->
<---- 13 --->
agctatgtatcttgcatcgta                    stringx
             gcatcgtagtctatttgcttac      stringy
             0

pydna.contig module

class pydna.contig.Contig(record, *args, graph=None, nodemap=None, **kwargs)[source]

Bases: Dseqrecord

This class holds information about a DNA assembly. This class is instantiated by the Assembly class and is not meant to be used directly.

detailed_figure()[source]

Returns a text representation of the assembled fragments.

Linear:

acgatgctatactgCCCCCtgtgctgtgctcta
                   TGTGCTGTGCTCTA
                   tgtgctgtgctctaTTTTTtattctggctgtatc

Circular:

||||||||||||||
acgatgctatactgCCCCCtgtgctgtgctcta
                   TGTGCTGTGCTCTA
                   tgtgctgtgctctaTTTTTtattctggctgtatc
                                      TATTCTGGCTGTATC
                                      tattctggctgtatcGGGGGtacgatgctatactg
                                                           ACGATGCTATACTG

figure()[source]

Compact ascii representation of the assembled fragments.

Each fragment is represented by:

Size of common 5' substring|Name and size of DNA fragment|
Size of common 5' substring

Linear:

frag20| 6
       \\/
       /\\
        6|frag23| 6
                 \\/
                 /\\
                  6|frag14

Circular:

 -|2577|61
|       \\/
|       /\\
|       61|5681|98
|               \\/
|               /\\
|               98|2389|557
|                       \\/
|                       /\\
|                       557-
|                          |
 --------------------------

classmethod from_SeqRecord(record, *args, graph=None, nodemap=None, **kwargs)[source]

classmethod from_string(record: str = '', *args, graph=None, nodemap=None, **kwargs)[source]: docstring.

rc()

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

reverse_complement()[source]

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

pydna.design module

This module contain functions for primer design for various purposes.

:func:primer_design for designing primers for a sequence or a matching primer for an existing primer. Returns an Amplicon object (same as the amplify module returns).
:func:assembly_fragments Adds tails to primers for a linear assembly through homologous recombination or Gibson assembly.
:func:circular_assembly_fragments Adds tails to primers for a circular assembly through homologous recombination or Gibson assembly.

pydna.design.assembly_fragments(f, overlap=35, maxlink=40)[source]

This function return a list of pydna.amplicon.Amplicon objects where primers have been modified with tails so that the fragments can be fused in the order they appear in the list by for example Gibson assembly or homologous recombination.

Given that we have two linear pydna.amplicon.Amplicon objects a and b

we can modify the reverse primer of a and forward primer of b with tails to allow fusion by fusion PCR, Gibson assembly or in-vivo homologous recombination. The basic requirements for the primers for the three techniques are the same.

                          <-->

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca         TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
                 <gacgt                          <AGAAA
agcct>                          TTTAT>
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt         AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
     \___________________ c ______________________/

Design tailed primers incorporating a part of the next or previous fragment to be assembled.

agcctatcatcttggtctctgca
|||||||||||||||||||||||
                gagacgtAAATATA

|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||

                ctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by any of the techniques mentioned earlier to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

At least every second sequence object needs to be an Amplicon

This rule exists because if a sequence object is that is not a PCR product is to be fused with another fragment, that other fragment needs to be an Amplicon so that the primer of the other object can be modified to include the whole stretch of sequence homology needed for the fusion. See the example below where a is a non-amplicon (a linear plasmid vector for instance)

 _________ a _________           __________ b ________
/                     \         /                     \
agcctatcatcttggtctctgca   <-->  TTTATATCGCATGACTCTTCTTT
|||||||||||||||||||||||         |||||||||||||||||||||||
tcggatagtagaaccagagacgt                          <AGAAA
                                TTTAT>
                                |||||||||||||||||||||||
                          <-->  AAATATAGCGTACTGAGAAGAAA

     agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
     ||||||||||||||||||||||||||||||||||||||||||||||
     tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA
     \___________________ c ______________________/

In this case only the forward primer of b is fitted with a tail with a part a:

agcctatcatcttggtctctgca
|||||||||||||||||||||||
tcggatagtagaaccagagacgt

                       TTTATATCGCATGACTCTTCTTT
                       |||||||||||||||||||||||
                                        <AGAAA
         tcttggtctctgcaTTTATAT
                       |||||||||||||||||||||||
                       AAATATAGCGTACTGAGAAGAAA

PCR products with flanking sequences are formed in the PCR process.

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATA
                \____________/

                   identical
                   sequences
                 ____________
                /            \
                ctctgcaTTTATATCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

The fragments can be fused by for example Gibson assembly:

agcctatcatcttggtctctgcaTTTATAT
||||||||||||||||||||||||||||||
tcggatagtagaacca

                             TCGCATGACTCTTCTTT
                ||||||||||||||||||||||||||||||
                gagacgtAAATATAGCGTACTGAGAAGAAA

to form c:

agcctatcatcttggtctctgcaTTTATATCGCATGACTCTTCTTT
||||||||||||||||||||||||||||||||||||||||||||||
tcggatagtagaaccagagacgtAAATATAGCGTACTGAGAAGAAA

The first argument of this function is a list of sequence objects containing Amplicons and other similar objects.

The overlap argument controls how many base pairs of overlap required between adjacent sequence fragments. In the junction between Amplicons, tails with the length of about half of this value is added to the two primers closest to the junction.

>       <
Amplicon1
         Amplicon2
         >       <

         ⇣

>       <-
Amplicon1
         Amplicon2
        ->       <

In the case of an Amplicon adjacent to a Dseqrecord object, the tail will be twice as long (1*overlap) since the recombining sequence is present entirely on this primer:

Dseqrecd1
         Amplicon1
         >       <

         ⇣

Dseqrecd1
         Amplicon1
       -->       <

Note that if the sequence of DNA fragments starts or stops with an Amplicon, the very first and very last prinmer will not be modified i.e. assembles are always assumed to be linear. There are simple tricks around that for circular assemblies depicted in the last two examples below.

The maxlink arguments controls the cut off length for sequences that will be synhtesized by adding them to primers for the adjacent fragment(s). The argument list may contain short spacers (such as spacers between fusion proteins).

Example 1: Linear assembly of PCR products (pydna.amplicon.Amplicon class objects) ------

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon4
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <-       ->       <-                      pydna.assembly.Assembly
Amplicon1         Amplicon3
         Amplicon2         Amplicon4     ➤  Amplicon1Amplicon2Amplicon3Amplicon4
        ->       <-       ->       <

Example 2: Linear assembly of alternating Amplicons and other fragments

>       <         >       <
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <--     -->       <--                     pydna.assembly.Assembly
Amplicon1         Amplicon2
         Dseqrecd1         Dseqrecd2     ➤  Amplicon1Dseqrecd1Amplicon2Dseqrecd2

Example 3: Linear assembly of alternating Amplicons and other fragments

Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2
         >       <       -->       <

                     ⇣
             pydna.design.assembly_fragments
                     ⇣
                                                  pydna.assembly.Assembly
Dseqrecd1         Dseqrecd2
         Amplicon1         Amplicon2     ➤  Dseqrecd1Amplicon1Dseqrecd2Amplicon2
       -->       <--     -->       <

Example 4: Circular assembly of alternating Amplicons and other fragments

                 ->       <==
Dseqrecd1         Amplicon2
         Amplicon1         Dseqrecd1
       -->       <-
                     ⇣
                     pydna.design.assembly_fragments
                     ⇣
                                                   pydna.assembly.Assembly
                 ->       <==
Dseqrecd1         Amplicon2                    -Dseqrecd1Amplicon1Amplicon2-
         Amplicon1                       ➤    |                             |
       -->       <-                            -----------------------------

------ Example 5: Circular assembly of Amplicons

>       <         >       <
Amplicon1         Amplicon3
         Amplicon2         Amplicon1
         >       <         >       <

                     ⇣
                     pydna.design.assembly_fragments
                     ⇣

>       <=       ->       <-
Amplicon1         Amplicon3
         Amplicon2         Amplicon1
        ->       <-       +>       <

                     ⇣
             make new Amplicon using the Amplicon1.template and
             the last fwd primer and the first rev primer.
                     ⇣
                                                   pydna.assembly.Assembly
+>       <=       ->       <-
 Amplicon1         Amplicon3                  -Amplicon1Amplicon2Amplicon3-
          Amplicon2                      ➤   |                             |
         ->       <-                          -----------------------------

Parameters

flist of pydna.amplicon.Amplicon and other Dseqrecord like objects: list Amplicon and Dseqrecord object for which fusion primers should be constructed.
overlapint, optional: Length of required overlap between fragments.
maxlinkint, optional: Maximum length of spacer sequences that may be present in f. These will be included in tails for designed primers.

Returns

seqslist of pydna.amplicon.Amplicon and other Dseqrecord like objects pydna.amplicon.Amplicon objects

[Amplicon1,
 Amplicon2, ...]

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.design import primer_design
>>> a=primer_design(Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg"))
>>> b=primer_design(Dseqrecord("ccaaacccaccaggtaccttatgtaagtacttcaagtcgccagaagacttcttggtcaagttgcc"))
>>> c=primer_design(Dseqrecord("tgtactggtgctgaaccttgtatcaagttgggtgttgacgccattgccccaggtggtcgtttcgtt"))
>>> from pydna.design import assembly_fragments
>>> # We would like a circular recombination, so the first sequence has to be repeated
>>> fa1,fb,fc,fa2 = assembly_fragments([a,b,c,a])
>>> # Since all fragments are Amplicons, we need to extract the rp of the 1st and fp of the last fragments.
>>> from pydna.amplify import pcr
>>> fa = pcr(fa2.forward_primer, fa1.reverse_primer, a)
>>> [fa,fb,fc]
[Amplicon(100), Amplicon(101), Amplicon(102)]
>>> fa.name, fb.name, fc.name = "fa fb fc".split()
>>> from pydna.assembly import Assembly
>>> assemblyobj = Assembly([fa,fb,fc])
>>> assemblyobj
Assembly
fragments....: 100bp 101bp 102bp
limit(bp)....: 25
G.nodes......: 6
algorithm....: common_sub_strings
>>> assemblyobj.assemble_linear()
[Contig(-231), Contig(-166), Contig(-36)]
>>> assemblyobj.assemble_circular()[0].cseguid()
'V3Mi8zilejgyoH833UbjJOtDMbc'
>>> (a+b+c).looped().cseguid()
'V3Mi8zilejgyoH833UbjJOtDMbc'
>>> print(assemblyobj.assemble_circular()[0].figure())
 -|fa|36
|     \/
|     /\
|     36|fb|36
|           \/
|           /\
|           36|fc|36
|                 \/
|                 /\
|                 36-
|                    |
 --------------------
>>>

pydna.design.circular_assembly_fragments(f, overlap=35, maxlink=40)[source]

pydna.design.primer_design(template, fp=None, rp=None, limit=13, target_tm=55.0, tm_func=<function tm_default>, **kwargs)[source]

This function designs a forward primer and a reverse primer for PCR amplification of a given template sequence.

The template argument is a Dseqrecord object or equivalent containing the template sequence.

The optional fp and rp arguments can contain an existing primer for the sequence (either the forward or reverse primer). One or the other primers can be specified, not both (since then there is nothing to design!, use the pydna.amplify.pcr function instead).

If one of the primers is given, the other primer is designed to match in terms of Tm. If both primers are designed, they will be designed to target_tm

tm_func is a function that takes an ascii string representing an oligonuceotide as argument and returns a float. Some useful functions can be found in the pydna.tm module, but can be substituted for a custom made function.

The function returns a pydna.amplicon.Amplicon class instance. This object has the object.forward_primer and object.reverse_primer properties which contain the designed primers.

Parameters

templatepydna.dseqrecord.Dseqrecord: a Dseqrecord object. The only required argument.
fp, rppydna.primer.Primer, optional: optional pydna.primer.Primer objects containing one primer each.
target_tmfloat, optional: target tm for the primers, set to 55°C by default.
tm_funcfunction: Function used for tm calculation. This function takes an ascii string representing an oligonuceotide as argument and returns a float. Some useful functions can be found in the pydna.tm module, but can be substituted for a custom made function.

Returns

resultAmplicon

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> t=Dseqrecord("atgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatg")
>>> t
Dseqrecord(-64)
>>> from pydna.design import primer_design
>>> ampl = primer_design(t)
>>> ampl
Amplicon(64)
>>> ampl.forward_primer
f64 17-mer:5'-atgactgctaacccttc-3'
>>> ampl.reverse_primer
r64 19-mer:5'-catcgtaagtttcgaacga-3'
>>> print(ampl.figure())
5atgactgctaacccttc...tcgttcgaaacttacgatg3
                     |||||||||||||||||||
                    3agcaagctttgaatgctac5
5atgactgctaacccttc3
 |||||||||||||||||
3tactgacgattgggaag...agcaagctttgaatgctac5
>>> pf = "GGATCC" + ampl.forward_primer
>>> pr = "GGATCC" + ampl.reverse_primer
>>> pf
f64 23-mer:5'-GGATCCatgactgct..ttc-3'
>>> pr
r64 25-mer:5'-GGATCCcatcgtaag..cga-3'
>>> from pydna.amplify import pcr
>>> pcr_prod = pcr(pf, pr, t)
>>> print(pcr_prod.figure())
      5atgactgctaacccttc...tcgttcgaaacttacgatg3
                           |||||||||||||||||||
                          3agcaagctttgaatgctacCCTAGG5
5GGATCCatgactgctaacccttc3
       |||||||||||||||||
      3tactgacgattgggaag...agcaagctttgaatgctac5
>>> print(pcr_prod.seq)
GGATCCatgactgctaacccttccttggtgttgaacaagatcgacgacatttcgttcgaaacttacgatgGGATCC
>>> from pydna.primer import Primer
>>> pf = Primer("atgactgctaacccttccttggtgttg", id="myprimer")
>>> ampl = primer_design(t, fp = pf)
>>> ampl.forward_primer
myprimer 27-mer:5'-atgactgctaaccct..ttg-3'
>>> ampl.reverse_primer
r64 37-mer:5'-catcgtaagtttcga..gtt-3'

pydna.download module

Provides a function for downloading online text files

pydna.editor module

This module provides a class for opening a sequence using an editor that accepts a file as a command line argument.

ApE - A plasmid Editor 6 is and excellent editor for this purpose.

References

6: http://biologylabs.utah.edu/jorgensen/wayned/ape/

class pydna.editor.Editor(shell_command_for_editor, tmpdir=None)[source]

Bases: object

The Editor class needs to be instantiated before use.

Parameters

shell_command_for_editorstr: String containing the path to the editor
tmpdirstr, optional: String containing path to the temprary directory where sequence files are stored before opening.

Examples

>>> import pydna
>>> #ape = pydna.Editor("tclsh8.6 /home/bjorn/.ApE/apeextractor/ApE.vfs/lib/app-AppMain/AppMain.tcl")
>>> #ape.open("aaa") # This command opens the sequence in the ApE editor

open(seq_to_open)[source]

Open a sequence for editing in an external (DNA) editor.

Parameters

argsSeqRecord or Dseqrecord object

pydna.editor.ape(*args, **kwargs)[source]: docstring.

pydna.gel module

docstring.

pydna.gel.gel(samples=None, gel_length=600, margin=50, interpolator=<scipy.interpolate._cubic.CubicSpline object>)[source]: docstring.

pydna.gel.interpolator(mwstd)[source]: docstring.

pydna.genbank module

This module provides a class for downloading sequences from genbank called Genbank and an function that does the same thing called genbank.

The function can be used if the environmental variable pydna_email has been set to a valid email address. The easiest way to do this permanantly is to edit the pydna.ini file. See the documentation of pydna.open_config_folder()

class pydna.genbank.Genbank(users_email: str, *args, tool='pydna', **kwargs)[source]

Bases: object

Class to facilitate download from genbank. It is easier and quicker to use the pydna.genbank.genbank() function directly.

Parameters

users_emailstring: Has to be a valid email address. You should always tell Genbanks who you are, so that they can contact you.

Examples

>>> from pydna.genbank import Genbank
>>> gb=Genbank("bjornjobb@gmail.com")
>>> rec = gb.nucleotide("LP002422.1")   # <- entry from genbank
>>> print(len(rec))
1

nucleotide(**kwargs)

pydna.genbank.genbank(accession: str = 'CS570233.1', *args, **kwargs)[source]

Download a genbank nuclotide record. This function takes the same paramenters as the :func:pydna.genbank.Genbank.nucleotide method. The email address stored in the pydna_email environment variable is used. The easiest way set this permanantly is to edit the pydna.ini file. See the documentation of pydna.open_config_folder()

if no accession is given, a very short Genbank entry is used as an example (see below). This can be useful for testing the connection to Genbank.

Please note that this result is also cached by default by settings in the pydna.ini file. See the documentation of pydna.open_config_folder()

LOCUS CS570233 14 bp DNA linear PAT 18-MAY-2007 DEFINITION Sequence 6 from Patent WO2007025016. ACCESSION CS570233 VERSION CS570233.1 KEYWORDS . SOURCE synthetic construct

ORGANISM synthetic construct
other sequences; artificial sequences.

REFERENCE 1

AUTHORS Shaw,R.W. and Cottenoir,M. TITLE Inhibition of metallo-beta-lactamase by double-stranded dna JOURNAL Patent: WO 2007025016-A1 6 01-MAR-2007;

Texas Tech University System (US)

FEATURES Location/Qualifiers

source 1..14: /organism=”synthetic construct” /mol_type=”unassigned DNA” /db_xref=”taxon:32630” /note=”This is a 14bp aptamer inhibitor.”

ORIGIN

1 atgttcctac atga

//

pydna.genbankfile module

class pydna.genbankfile.GenbankFile(record, *args, path=None, **kwargs)[source]

Bases: Dseqrecord

classmethod from_SeqRecord(record, *args, path=None, **kwargs)[source]

rc()

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

reverse_complement()[source]

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

pydna.genbankfixer module

This module provides the gbtext_clean() function which can clean up broken Genbank files enough to pass the BioPython Genbank parser

Almost all of this code was lifted from BioJSON (https://github.com/levskaya/BioJSON) by Anselm Levskaya. The original code was not accompanied by any software licence. This parser is based on pyparsing.

There are some modifications to deal with fringe cases.

The parser first produces JSON as an intermediate format which is then formatted back into a string in Genbank format.

The parser is not complete, so some fields do not survive the roundtrip (see below). This should not be a difficult fix. The returned result has two properties, .jseq which is the intermediate JSON produced by the parser and .gbtext which is the formatted genbank string.

pydna.genbankfixer.concat_dict(dlist)[source]: more or less dict(list of string pairs) but merges vals with the same keys so no duplicates occur

pydna.genbankfixer.gbtext_clean(gbtext)[source]

This function takes a string containing one genbank sequence in Genbank format and returns a named tuple containing two fields, the gbtext containing a string with the corrected genbank sequence and jseq which contains the JSON intermediate.

Examples

>>> s = '''LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
... DEFINITION  .
... ACCESSION
... VERSION
... SOURCE      .
...   ORGANISM  .
... COMMENT
... COMMENT     ApEinfo:methylated:1
... ORIGIN
...         1 aaa
... //'''
>>> from pydna.readers import read
>>> read(s)  
/home/bjorn/anaconda3/envs/bjorn36/lib/python3.6/site-packages/Bio/GenBank/Scanner.py:1388: BiopythonParserWarning: Malformed LOCUS line found - is this correct?
:'LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013\n'
  "correct?\n:%r" % line, BiopythonParserWarning)
Traceback (most recent call last):
  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 48, in read
    results = results.pop()
IndexError: pop from empty list

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bjorn/python_packages/pydna/pydna/readers.py", line 50, in read
    raise ValueError("No sequences found in data:\n({})".format(data[:79]))
ValueError: No sequences found in data:
(LOCUS       New_DNA      3 bp    DNA   CIRCULAR SYN        19-JUN-2013
DEFINITI)
>>> from pydna.genbankfixer import gbtext_clean
>>> s2, j2 = gbtext_clean(s)
>>> print(s2)
LOCUS       New_DNA                    3 bp ds-DNA     circular SYN 19-JUN-2013
DEFINITION  .
ACCESSION
VERSION
SOURCE      .
ORGANISM  .
COMMENT
COMMENT     ApEinfo:methylated:1
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//
>>> s3 = read(s2)
>>> s3
Dseqrecord(o3)
>>> print(s3.format())
LOCUS       New_DNA                    3 bp    DNA     circular SYN 19-JUN-2013
DEFINITION  .
ACCESSION   New_DNA
VERSION     New_DNA
KEYWORDS    .
SOURCE
  ORGANISM  .
            .
COMMENT
            ApEinfo:methylated:1
FEATURES             Location/Qualifiers
ORIGIN
        1 aaa
//

pydna.genbankfixer.locstr(locs, strand)[source]: genbank formatted location string, assumes no join’d combo of rev and fwd seqs

pydna.genbankfixer.originstr(sequence)[source]: formats dna sequence as broken, numbered lines ala genbank

pydna.genbankfixer.parseGBLoc(s, l, t)[source]: retwingles parsed genbank location strings, assumes no joins of RC and FWD sequences

pydna.genbankfixer.strip_indent(str)[source]

pydna.genbankfixer.strip_multiline(s, l, t)[source]

pydna.genbankfixer.toGB(jseq)[source]: parses json jseq data and prints out ApE compatible genbank

pydna.genbankfixer.toInt(s, l, t)[source]

pydna.genbankfixer.toJSON(gbkstring)[source]

pydna.genbankfixer.wrapstring(str_, rowstart, rowend, padfirst=True)[source]: wraps the provided string in lines of length rowend-rowstart and padded on the left by rowstart. -> if padfirst is false the first line is not padded

pydna.genbankrecord module

class pydna.genbankrecord.GenbankRecord(record, *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]

Bases: Dseqrecord

biopython_code()[source]: docstring.

classmethod from_SeqRecord(record, *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]

classmethod from_string(record: str = '', *args, item='accession', start=None, stop=None, strand=1, **kwargs)[source]: docstring.

pydna_code()[source]: docstring.

rc()

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

reverse_complement()[source]

Reverse complement.

See also

pydna.dseq.Dseq.reverse_complement

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> a=Dseqrecord("ggaatt")
>>> a
Dseqrecord(-6)
>>> a.seq
Dseq(-6)
ggaatt
ccttaa
>>> a.reverse_complement().seq
Dseq(-6)
aattcc
ttaagg
>>>

pydna.myprimers module

Provides a practical way to access a list of primer sequences in a text file.

The path of a text file can be specified in the pydna.ini file or by the ´pydna_primers´ environment variable.

The file is expected to contain sequences in FASTA, Genbank or EMBL formats or any format readable by the parse_primers function.

The primer list is expected to follow the convension below. The primer name is expected to begin with the number.

can have the format below for example:

>2_third_primer
tgagtagtcgtagtcgtcgtat

>1_second_primer
tgatcgtcatgctgactatactat

>0_first_primer
ctaggatcgtagatctagctg
...

The primerlist funtion returns a list of pydna.primer.Primer objects primerdict returns a dict where the key is the id of the object.

class pydna.myprimers.PrimerList(initlist: ~typing.Iterable = None, path: (<class 'str'>, <class 'pathlib.Path'>) = None, *args, identifier: str = 'p', **kwargs)[source]

Bases: UserList

Read a text file with primers.

The primers can be of any format readable by the parse_primers function. Lines beginning with # are ignored. Path defaults to the path given by the pydna_primers environment variable.

The primer list does not accept new primers. Use the assign_numbers_to_new_primers method and paste the new primers at the top of the list.

The primer list remembers the numbers of accessed primers. The indices of accessed primers are stored in the .accessed property.

property accessed: docstring.

assign_numbers(lst: list)[source]

Find new primers in lst.

Returns a string containing new primers with their assigned numbers. This string can be copied and pasted to the primer text file.

code(lst: list): Pydna code for a list of primer objects.

open_folder()[source]: Open folder where primer file is located.

pydna_code_from_list(lst: list)[source]: Pydna code for a list of primer objects.

pydna.myprimers.check_primer_numbers(pl: Optional[list] = None)[source]: Find primers whose number do not match position in list.

pydna.myprimers.find_duplicate_primers(pl: Optional[list] = None)[source]: Find a list of lists with duplicated primer sequences.

pydna.myprimers.undefined_sequence(pl: Optional[list] = None)[source]: Primers in list with N or n instead of a sequence.

pydna.parsers module

Provides two functions, parse and parse_primers

pydna.parsers.parse(data, ds=True)[source]

Return all DNA sequences found in data.

If no sequences are found, an empty list is returned. This is a greedy function, use carefully.

Parameters

datastring or iterable

The data parameter is a string containing:

an absolute path to a local file. The file will be read in text mode and parsed for EMBL, FASTA and Genbank sequences. Can be a string or a Path object.
a string containing one or more sequences in EMBL, GENBANK, or FASTA format. Mixed formats are allowed.
data can be a list or other iterable where the elements are 1 or 2

dsbool

If True double stranded Dseqrecord objects are returned. If False single stranded Bio.SeqRecord 7 objects are returned.

Returns

list: contains Dseqrecord or SeqRecord objects

See also

read

References

7: http://biopython.org/wiki/SeqRecord

pydna.parsers.parse_primers(data)[source]

pydna.primer module

This module provide the Primer class that is a subclass of the biopython SeqRecord.

class pydna.primer.Primer(record, *args, amplicon=None, position=None, footprint=0, **kwargs)[source]

Bases: SeqRecord

Primer and its position on a template, footprint and tail.

property footprint

reverse_complement(*args, **kwargs)[source]: Return the reverse complement of the sequence.

property tail

pydna.readers module

Provides two functions, read and read_primer.

pydna.readers.read(data, ds=True)[source]

This function is similar the parse() function but expects one and only one sequence or and exception is thrown.

Parameters

datastring: see below
dsbool: Double stranded or single stranded DNA, if True return Dseqrecord objects, else Bio.SeqRecord objects.

Returns

Dseqrecord: contains the first Dseqrecord or SeqRecord object parsed.

See also

parse

Notes

The data parameter is similar to the data parameter for parse().

pydna.readers.read_primer(data)[source]: Use this function to read a primer sequence from a string or a local file. The usage is similar to the parse_primer() function.

pydna.seqrecord module

A subclass of the Biopython SeqRecord class.

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instread of str for a nicer output in the IPython shell.

class pydna.seqrecord.SeqRecord(*args, id='id', name='name', description='description', **kwargs)[source]

Bases: SeqRecord

A subclass of the Biopython SeqRecord class.

Has a number of extra methods and uses the pydna._pretty_str.pretty_str class instread of str for a nicer output in the IPython shell.

property accession: Alias for id property.

add_colors_to_features_for_ape()[source]

Assign colors to features.

compatible with the ApE editor.

add_feature(x=None, y=None, seq=None, type_='misc', strand=1, *args, **kwargs)[source]

Add a feature of type misc to the feature list of the sequence.

Parameters

xint: Indicates start of the feature
yint: Indicates end of the feature

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.features
[]
>>> a.add_feature(2,4)
>>> a.features
[SeqFeature(SimpleLocation(ExactPosition(2),
                           ExactPosition(4),
                           strand=1),
            type='misc',
            qualifiers=...)]

cai(organism='sce')[source]: docstring.

comment(newcomment='')[source]: docstring.

copy()[source]: docstring.

datefunction()[source]: docstring.

property definition: Alias for description property.

dump(filename, protocol=None)[source]: docstring.

express(organism='sce')[source]: docstring.

extract_feature(n)[source]

Extract feature and return a new SeqRecord object.

Parameters

nint
Indicates the feature to extract

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.add_feature(2,4)
>>> b=a.extract_feature(0)
>>> b
SeqRecord(seq=Seq('gt'), id='ft2', name='part_name',
          description='description', dbxrefs=[])

classmethod from_Bio_SeqRecord(sr: SeqRecord)[source]: Creates a pydnaSeqRecord from a Biopython SeqRecord.

gc()[source]: Return GC content.

isorf(table=1)[source]

Detect if sequence is an open reading frame (orf) in the 5’-3’.

direction.

Translation tables are numbers according to the NCBI numbering 8.

Parameters

tableint: Sets the translation table, default is 1 (standard code)

Returns

bool: True if sequence is an orf, False otherwise.

References

8: http://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=c

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.isorf()
True
>>> b=SeqRecord("atgaaa")
>>> b.isorf()
False
>>> c=SeqRecord("atttaa")
>>> c.isorf()
False

lcs(other, *args, limit=25, **kwargs)[source]

Return the longest common substring between the sequence.

and another sequence (other). The other sequence can be a string, Seq, SeqRecord, Dseq or DseqRecord. The method returns a SeqFeature with type “read” as this method is mostly used to map sequence reads to the sequence. This can be changed by passing a type as keyword with some other string value.

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a = SeqRecord("GGATCC")
>>> a.lcs("GGATCC", limit=6)
SeqFeature(SimpleLocation(ExactPosition(0),
                          ExactPosition(6), strand=1),
                          type='read',
                          qualifiers=...)
>>> a.lcs("GATC", limit=4)
SeqFeature(SimpleLocation(ExactPosition(1),
                          ExactPosition(5), strand=1),
                          type='read',
                          qualifiers=...)
>>> a = SeqRecord("CCCCC")
>>> a.lcs("GGATCC", limit=6)
SeqFeature(None)

list_features()[source]

Print ASCII table with all features.

Examples

>>> from pydna.seq import Seq
>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord(Seq("atgtaa"))
>>> a.add_feature(2,4)
>>> print(a.list_features())
+-----+---------------+-----+-----+-----+-----+------+------+
| Ft# | Label or Note | Dir | Sta | End | Len | type | orf? |
+-----+---------------+-----+-----+-----+-----+------+------+
|   0 | L:ft2         | --> | 2   | 4   |   2 | misc |  no  |
+-----+---------------+-----+-----+-----+-----+------+------+

property locus: Alias for name property.

rarecodons(organism='sce')[source]: docstring.

reverse_complement(*args, **kwargs)[source]: Return the reverse complement of the sequence.

sorted_features()[source]

Return a list of the features sorted by start position.

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("atgtaa")
>>> a.add_feature(3,4)
>>> a.add_feature(2,4)
>>> print(a.features)
[SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...)]
>>> print(a.sorted_features())
[SeqFeature(SimpleLocation(ExactPosition(2), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...),
 SeqFeature(SimpleLocation(ExactPosition(3), ExactPosition(4),
                           strand=1),
            type='misc', qualifiers=...)]

stamp(algorithm, now=<function SeqRecord.datefunction>, tool='pydna', separator=' ', comment='')[source]

Add a uSEGUID or cSEGUID checksum.

The checksum is stored in object.annotations[“comment”]. This shows in the COMMENTS section of a formatted genbank file.

For blunt linear sequences:

SEGUID <seguid>

For circular sequences:

cSEGUID <seguid>

Fore linear sequences which are not blunt:

lSEGUID <seguid>

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a = SeqRecord("aa")
>>> a.stamp("uSEGUID")
'gBw0Jp907Tg_yX3jNgS4qQWttjU'
>>> a.annotations["comment"][:41]
'pydna uSEGUID gBw0Jp907Tg_yX3jNgS4qQWttjU'

startcodon(organism='sce')[source]: docstring.

stopcodon(organism='sce')[source]: docstring.

useguid()[source]

Return the url safe SEGUID 9 for the sequence.

This checksum is the same as seguid but with base64.urlsafe encoding instead of the normal base 64. This means that the characters + and / are replaced with - and _ so that the checksum can be a part of and URL or a filename.

References

9: http://wiki.christophchamp.com/index.php/SEGUID

Examples

>>> from pydna.seqrecord import SeqRecord
>>> a=SeqRecord("aaaaaaa")
>>> a.useguid() # original seguid is +bKGnebMkia5kNg/gF7IORXMnIU
'-bKGnebMkia5kNg_gF7IORXMnIU'

pydna.tm module

This module provide functions for melting temperature calculations.

pydna.tm.Q5(primer: str, *args, **kwargs)[source]: For Q5 Ta they take the lower of the two Tms and add 1C (up to 72C). For Phusion they take the lower of the two and add 3C (up to 72C).

pydna.tm.dbd_program(amplicon, tm=<function tm_dbd>, ta=<function ta_dbd>)[source]

Text representation of a suggested PCR program.

Using a polymerase with a DNA binding domain such as Pfu-Sso7d.

|98°C|98°C               |    |tmf:53.8
|____|_____          72°C|72°C|tmr:54.8
|30s |10s  \ 57.0°C _____|____|15s/kb
|    |      \______/ 0:15|5min|GC 51%
|    |       10s         |    |1051bp

|98°C|98°C      |    |tmf:82.5
|____|____      |    |tmr:84.4
|30s |10s \ 72°C|72°C|15s/kb
|    |     \____|____|GC 52%
|    |      3:45|5min|15058bp

pydna.tm.pfu_sso7d_program(amplicon, tm=<function tm_dbd>, ta=<function ta_dbd>)

Text representation of a suggested PCR program.

Using a polymerase with a DNA binding domain such as Pfu-Sso7d.

|98°C|98°C               |    |tmf:53.8
|____|_____          72°C|72°C|tmr:54.8
|30s |10s  \ 57.0°C _____|____|15s/kb
|    |      \______/ 0:15|5min|GC 51%
|    |       10s         |    |1051bp

|98°C|98°C      |    |tmf:82.5
|____|____      |    |tmr:84.4
|30s |10s \ 72°C|72°C|15s/kb
|    |     \____|____|GC 52%
|    |      3:45|5min|15058bp

pydna.tm.program(amplicon, tm=<function tm_default>, ta=<function ta_default>)[source]

Returns a string containing a text representation of a suggested PCR program using Taq or similar polymerase.

|95°C|95°C               |    |tmf:59.5
|____|_____          72°C|72°C|tmr:59.7
|3min|30s  \ 59.1°C _____|____|60s/kb
|    |      \______/ 0:32|5min|GC 51%
|    |       30s         |    |1051bp

pydna.tm.ta_dbd(fp, rp, seq, tm=<function tm_dbd>, tm_product=None)[source]

pydna.tm.ta_default(fp: str, rp: str, seq: str, tm=<function tm_default>, tm_product=<function tm_product>)[source]

Ta calculation.

according to:

Rychlik, Spencer, and Rhoads, 1990, Optimization of the anneal ing temperature for DNA amplification in vitro http://www.ncbi.nlm.nih.gov/pubmed/2243783

The formula described uses the length and GC content of the product and salt concentration (monovalent cations)

pydna.tm.taq_program(amplicon, tm=<function tm_default>, ta=<function ta_default>)

Returns a string containing a text representation of a suggested PCR program using Taq or similar polymerase.

|95°C|95°C               |    |tmf:59.5
|____|_____          72°C|72°C|tmr:59.7
|3min|30s  \ 59.1°C _____|____|60s/kb
|    |      \______/ 0:32|5min|GC 51%
|    |       30s         |    |1051bp

pydna.tm.tm_dbd(seq, check=True, strict=True, c_seq=None, shift=0, nn_table={'AA/TT': (-7.9, -22.2), 'AT/TA': (-7.2, -20.4), 'CA/GT': (-8.5, -22.7), 'CG/GC': (-10.6, -27.2), 'CT/GA': (-7.8, -21.0), 'GA/CT': (-8.2, -22.2), 'GC/CG': (-9.8, -24.4), 'GG/CC': (-8.0, -19.9), 'GT/CA': (-8.4, -22.4), 'TA/AT': (-7.2, -21.3), 'init': (0, 0), 'init_5T/A': (0, 0), 'init_A/T': (2.3, 4.1), 'init_G/C': (0.1, -2.8), 'init_allA/T': (0, 0), 'init_oneG/C': (0, 0), 'sym': (0, -1.4)}, tmm_table=None, imm_table=None, de_table=None, dnac1=250, dnac2=250, selfcomp=False, Na=50, K=0, Tris=0, Mg=1.5, dNTPs=0.8, saltcorr=1, func=<function Tm_NN>)[source]

pydna.tm.tm_default(seq, check=True, strict=True, c_seq=None, shift=0, nn_table={'AA/TT': (-7.6, -21.3), 'AT/TA': (-7.2, -20.4), 'CA/GT': (-8.5, -22.7), 'CG/GC': (-10.6, -27.2), 'CT/GA': (-7.8, -21.0), 'GA/CT': (-8.2, -22.2), 'GC/CG': (-9.8, -24.4), 'GG/CC': (-8.0, -19.0), 'GT/CA': (-8.4, -22.4), 'TA/AT': (-7.2, -20.4), 'init': (0.2, -5.7), 'init_5T/A': (0, 0), 'init_A/T': (2.2, 6.9), 'init_G/C': (0, 0), 'init_allA/T': (0, 0), 'init_oneG/C': (0, 0), 'sym': (0, -1.4)}, tmm_table=None, imm_table=None, de_table=None, dnac1=250.0, dnac2=250.0, selfcomp=False, Na=40, K=0, Tris=75.0, Mg=1.5, dNTPs=0.8, saltcorr=7, func=<function Tm_NN>)[source]

pydna.tm.tm_product(seq: str, K=0.05)[source]

Tm calculation for the amplicon.

according to:

Rychlik, Spencer, and Rhoads, 1990, Optimization of the anneal ing temperature for DNA amplification in vitro http://www.ncbi.nlm.nih.gov/pubmed/2243783

pydna.tm.tmbresluc(primer: str, *args, primerc=500.0, saltc=50, **kwargs)[source]

Returns the tm for a primer using a formula adapted to polymerases with a DNA binding domain, such as the Phusion polymerase.

Parameters

primerstring: primer sequence 5’-3’
primercfloat: primer concentration in nM), set to 500.0 nm by default.
saltcfloat, optional: Monovalent cation concentration in mM, set to 50.0 mM by default.
thermodynamicsbool, optional: prints details of the thermodynamic data to stdout. For debugging only.

Returns

tmfloat: the tm of the primer

pydna.utils module

This module provides miscellaneous functions.

pydna.utils.SmallestRotation(s)[source]

Find the rotation of s that is smallest in lexicographic order.

Algorithm according to Duval 1983:

Pierre Duval, Jean. 1983. “Factorizing Words over an Ordered Alphabet.” Journal of Algorithms & Computational Technology* 4 (4) (December 1): 363–381.

Algorithms on strings and sequences based on Lyndon words. David Eppstein, October 2011.

https://gist.github.com/dvberkel/1950267

pydna.utils.cai(seq: str, organism: str = 'sce')[source]: docstring.

pydna.utils.complement(sequence: str)[source]: returns the complement of sequence (string) accepts mixed DNA/RNA

pydna.utils.cseguid(seq: str) → pretty_str[source]

Url safe cSEGUID for a string representing a circular double stranded DNA molecule.

The cSEGUID is the SEGUID checksum calculated for the lexicographically minimal string rotation of a DNA sequence. Only defined for circular sequences.

Examples

>>> from pydna.utils import cseguid
>>> cseguid("attt")
'oopV-6158nHJqedi8lsshIfcqYA'
>>> cseguid("ttta")
'oopV-6158nHJqedi8lsshIfcqYA'

pydna.utils.eq(*args, **kwargs)[source]

Compares two or more DNA sequences for equality i.e. they represent the same DNA molecule. Comparisons are case insensitive.

Parameters

argsiterable: iterable containing sequences args can be strings, Biopython Seq or SeqRecord, Dseqrecord or dsDNA objects.
circularbool, optional: Consider all molecules circular or linear
linearbool, optional: Consider all molecules circular or linear

Returns

eqbool: Returns True or False

Notes

Compares two or more DNA sequences for equality i.e. if they represent the same DNA molecule.

Two linear sequences are considiered equal if either:

They have the same sequence (case insensitive)
One sequence is the reverse complement of the other (case insensitive)

Two circular sequences are considered equal if they are circular permutations:

They have the same lengt, AND
One sequence or can be found in the concatenation of the other sequence with it , OR
The reverse complement can be found in the concatenation of the other sequence with itself.

The topology for the comparison can be set using one of the keywords linear or circular to True or False.

If circular or linear is not set, it will be deduced from the topology of each sequence for sequences that have a linear or circular attribute (like Dseq and Dseqrecord).

Examples

>>> from pydna.dseqrecord import Dseqrecord
>>> from pydna.utils import eq
>>> eq("aaa","AAA")
True
>>> eq("aaa","AAA","TTT")
True
>>> eq("aaa","AAA","TTT","tTt")
True
>>> eq("aaa","AAA","TTT","tTt", linear=True)
True
>>> eq("Taaa","aTaa", linear = True)
False
>>> eq("Taaa","aTaa", circular = True)
True
>>> a=Dseqrecord("Taaa")
>>> b=Dseqrecord("aTaa")
>>> eq(a,b)
False
>>> eq(a,b,circular=True)
True
>>> a=a.looped()
>>> b=b.looped()
>>> eq(a,b)
True
>>> eq(a,b,circular=False)
False
>>> eq(a,b,linear=True)
False
>>> eq(a,b,linear=False)
True
>>> eq("ggatcc","GGATCC")
True
>>> eq("ggatcca","GGATCCa")
True
>>> eq("ggatcca","tGGATCC")
True

pydna.utils.expandtolist(content)[source]

pydna.utils.express(seq: str, organism='sce')[source]: docstring.

pydna.utils.flatten(*args)[source]

Flattens an iterable of iterables.

Down to str, bytes, bytearray or any of the pydna or Biopython seq objects

pydna.utils.identifier_from_string(s: str) → str[source]

Return a valid python identifier.

based on the argument s or an empty string

pydna.utils.join_list_to_table(rawlist)[source]

pydna.utils.lseguid_blunt(seq: str) → pretty_str[source]

lSEGUID checksum.

for a string representing a blunt double stranded DNA molecule.

Examples

>>> from pydna.utils import lseguid_blunt
>>> lseguid_blunt("ttt")
'YG7G6b2Kj_KtFOX63j8mRHHoIlE'
>>> lseguid_blunt("aaa")
'YG7G6b2Kj_KtFOX63j8mRHHoIlE'

pydna.utils.lseguid_sticky(watson: str, crick: str, overhang: int) → pretty_str[source]

Linear SEGUID (lSEGUID) checksum.

Calculates the lSEGUID checksum for a double stranded DNA sequence described by two strings (watson and crick) representing the two complementary DNA strands and an integer describing the stagger between the two strands in the 5’ end.

The overhang is defined as the amount of 3’ overhang in the 5’ side of the molecule. A molecule with 5’ overhang has a negative value.

dsDNA ovhg

nnn… 2

nnnnn…

nnnn… 1

nnnnn…

nnnnn… 0 nnnnn…

nnnnn… -1
nnnn…

nnnnn… -2
nnn…

pydna.utils.memorize(filename)[source]: Decorator for caching fucntions and classes, see pydna.download and pydna.Assembly for use

pydna.utils.open_folder(pth)[source]: docstring.

pydna.utils.parse_text_table(rawtable, tabs=4)[source]

pydna.utils.randomDNA(length, maxlength=None)[source]: string!

pydna.utils.randomORF(length, maxlength=None)[source]

pydna.utils.randomRNA(length, maxlength=None)[source]

pydna.utils.randomprot(length, maxlength=None)[source]

pydna.utils.rarecodons(seq: str, organism='sce')[source]: docstring.

pydna.utils.rc(sequence: str)[source]: returns the reverse complement of sequence (string) accepts mixed DNA/RNA

pydna.utils.seguid(seq: str) → pretty_str[source]

SEGUID checksum for a string representing a biological sequence.

This is the SEGUID checksum with the standard Base64 encoding that can contain ‘+’ and ‘/’ as originally defined by Babnigg and Giometti:

Babnigg, Giometti 2006. “A Database of Unique Protein Sequence Identifiers for Proteome Studies.” Proteomics 6 (16) (August): 4514–4522.

Examples

>>> from pydna.utils import seguid
>>> seguid("aaa")
'YG7G6b2Kj/KtFOX63j8mRHHoIlE'

pydna.utils.seq31(seq)[source]

Turn a three letter code protein sequence into one with one letter code.

The single input argument ‘seq’ should be a protein sequence using single letter codes, as a python string.

This function returns the amino acid sequence as a string using the one letter amino acid codes. Output follows the IUPAC standard (including ambiguous characters B for “Asx”, J for “Xle” and X for “Xaa”, and also U for “Sel” and O for “Pyl”) plus “Ter” for a terminator given as an asterisk.

Any unknown character (including possible gap characters), is changed into ‘Xaa’.

Examples

>>> from Bio.SeqUtils import seq3
>>> seq3("MAIVMGRWKGAR*")
'MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer'
>>> from pydna.utils import seq31
>>> seq31('MetAlaIleValMetGlyArgTrpLysGlyAlaArgTer')
'M  A  I  V  M  G  R  W  K  G  A  R  *'

pydna.utils.useguid(seq: str) → pretty_str[source]

Returns the url safe SEGUID checksum for the sequence. This is the SEGUID checksum with the ‘+’ and ‘/’ characters of standard Base64 encoding are respectively replaced by ‘-’ and ‘_’.

Examples

>>> from pydna.utils import useguid
>>> useguid("aaa")
'YG7G6b2Kj_KtFOX63j8mRHHoIlE'

Welcome to Pydna’s documentation!

Module contents

pydna

pydna package layout

How to use the documentation

pydna source code

How to get more help

Examples of pydna in use

pydna.dseq module

pydna.dseqrecord module

pydna.amplicon module

pydna.amplify module

pydna.assembly module

pydna.common_sub_strings module

pydna.contig module

pydna.design module

pydna.download module

pydna.editor module

References

pydna.gel module

pydna.genbank module

pydna.genbankfile module

pydna.genbankfixer module

pydna.genbankrecord module

pydna.myprimers module

pydna.parsers module

pydna.primer module

pydna.readers module

pydna.seqrecord module

pydna.tm module

pydna.utils module

Indices and tables