This lecture is about how to efficiently combine Numpy and Cython to write fast numerical code.

We will focus on the problem of computing the *standard deviation* of a list of floating point numbers. Let $x_1, \ldots, x_n$ be a list of $n$ real numbers, and let $$\mu = \frac{1}{n}\sum_{i=1}^n x_i$$ be their mean. We define their standard deviation to be $$\sqrt{\frac{1}{n}\sum_{i=1}^n (x_i - \mu)^2}.$$

**Note**: In statistics it is common to divide by $n-1$ instead of $n$ when computing the standard deviation of a sample and using it to estimate the standard deviation of a population. We will not do this, since our goal today is illustrating programming techniques, not learning techniques of statistics.

**Running Example:** Compute the standard deviation of a list of 64-bit floating point numbers. Our data set is computed using the random.random method, which generates numbers **uniformly** between 0 and 1.

First we write a naive straightforward implementation of computation of the standard deviation.

{{{id=9| def my_std(z): mean = sum(z)/len(z) return sqrt(sum((x-mean)^2 for x in z)/len(z)) /// }}} {{{id=10| time my_std(v) /// 0.28871143425255896 Time: CPU 0.06 s, Wall: 0.06 s }}} {{{id=11| timeit('my_std(v)', number=10) /// 10 loops, best of 3: 64.8 ms per loop }}}Next we try the std function in Sage, which was implemented by UW undergrad Andrew Hou as part of paid work he did on Sage after he took Math 480.

{{{id=1| time std(v, bias=True) /// 0.28871143425255896 Time: CPU 0.03 s, Wall: 0.03 s }}} {{{id=12| timeit('std(v, bias=True)') /// 25 loops, best of 3: 26.4 ms per loop }}}Next we try Numpy, which is much faster than the above.

{{{id=7| import numpy v_numpy = numpy.array(v, dtype=numpy.float64) /// }}} {{{id=14| v_numpy.dtype /// dtype('float64') }}} {{{id=21| v_numpy.std() /// 0.28871143425255896 }}} {{{id=6| timeit('v_numpy.std()') /// 625 loops, best of 3: 1.25 ms per loop }}} {{{id=76| 22.5/1.25 /// 18.0000000000000 }}}Sage also has code for working with TimeSeries, which happens to have a method for computing the standard deviation. It is a couple of times faster than Numpy.

{{{id=16| v_stats = stats.TimeSeries(v) /// }}} {{{id=85| v_stats.variance?? /// }}} {{{id=20| v_stats.standard_deviation(bias=True) /// 0.28871143425255896 }}} {{{id=15| timeit('v_stats.standard_deviation(bias=True)') /// 625 loops, best of 3: 240 µs per loop }}}The TimeSeries code is nearly optimal. A TimeSeries is represented by a contiguous array of double's, and the code for computing standard deviation is very straightforward Cython that maps directly to C. (I wrote it, by the way.)

{{{id=17| 1.25/.236 /// 5.29661016949153 }}}**Goal: **Write a function that computes the standard deviation of a numpy array as quickly as stats.TimeSeries does, hence is faster than Numpy itself.

First approach: Use numpy "vectorized operations". This doesn't help at all (and is also wasteful of memory, by the way).

{{{id=86| def std_numpy1_oneline(v): return math.sqrt(((v - v.mean())**2).mean()) /// }}} {{{id=23| def std_numpy1(v): m = v.mean() # mean of entries w = v - m # subtracts m from each entry: "broadcasting" w2 = w**2 # squares each entry componentwise. return math.sqrt(w2.mean()) /// }}} {{{id=88| get_memory_usage() /// 864.90625 }}} {{{id=89| w = v_numpy**2 /// }}} {{{id=90| get_memory_usage() /// 865.671875 }}} {{{id=19| std_numpy1(v_numpy) /// 0.28871143425255896 }}} {{{id=87| std_numpy1_oneline(v_numpy) /// 0.28871143425255896 }}} {{{id=18| timeit('std_numpy1(v_numpy)') /// 625 loops, best of 3: 1.25 ms per loop }}}Let's see how the time gets spent between each step. It turns out to be about equally spent among each line.

{{{id=34| m = v_numpy.mean() timeit('v_numpy.mean()') /// 625 loops, best of 3: 140 µs per loop }}} {{{id=37| w = v_numpy - m timeit('v_numpy - m') /// 625 loops, best of 3: 241 µs per loop }}} {{{id=38| w2 = w**2 timeit('w**2') /// 625 loops, best of 3: 157 µs per loop }}} {{{id=36| m2 = w2.mean() timeit('math.sqrt(w2.mean())') /// 625 loops, best of 3: 143 µs per loop }}} {{{id=91| sqrt(2) /// sqrt(2) }}} {{{id=92| math.sqrt(2) /// 1.4142135623730951 }}} {{{id=93| a = float(2) timeit('sqrt(a)', number=10^5) /// 100000 loops, best of 3: 589 ns per loop }}} {{{id=94| timeit('math.sqrt(a)', number=10^5) /// 100000 loops, best of 3: 216 ns per loop }}} {{{id=39| /// }}}Next try Cython with no special type declarations. Not surprisingly, this does not help in the least bit.

{{{id=28| %cython import math def std_numpy2(v): m = v.mean() # mean of entries w = v - m # subtracts m from each entry: "broadcasting" w2 = w**2 # squares each entry componentwise. return math.sqrt(w2.mean()) /// }}} {{{id=25| std_numpy2(v_numpy) /// 0.28871143425255896 }}} {{{id=24| timeit('std_numpy2(v_numpy)') /// 625 loops, best of 3: 1.3 ms per loop }}}Next try Cython with special support for Numpy. This gets powerful... as we will see.

{{{id=30| %cython from numpy cimport ndarray import math def std_numpy3(ndarray v not None): m = v.mean() # mean of entries w = v - m # subtracts m from each entry: "broadcasting" w2 = w**2 # squares each entry componentwise. return math.sqrt(w2.mean()) /// }}} {{{id=96| std_numpy3(None) /// Traceback (most recent call last): File "Look at Cython + Numpy documentation (by Googling "cython numpy"), and we learn that if we declare v a little more precisely, then we get fast direct access to the underlying elements in v.

{{{id=46| %cython cimport numpy as alex import math def std_numpy4a(alex.ndarray[alex.float64_t, ndim=1] v not None): cdef Py_ssize_t i cdef Py_ssize_t n = v.shape[0] # how many entries # Compute the mean cdef double m # = 0 for i in range(n): m += v[i] m /= n # just doing the mean for now... return m /// }}} {{{id=45| std_numpy4a(v_numpy) /// 0.49896857465357608 }}}Timing looks good...

{{{id=44| timeit('std_numpy4a(v_numpy)') /// 625 loops, best of 3: 376 µs per loop }}} {{{id=79| /// }}}Let's finish it the function and see how it compares.

{{{id=56| %cython cimport numpy as np cdef extern: double sqrt(double) def std_numpy4b(np.ndarray[np.float64_t, ndim=1] v): cdef Py_ssize_t i cdef Py_ssize_t n = v.shape[0] # how many entries # Compute the mean cdef double m = 0 for i in range(n): m += v[i] m /= n # Compute variance cdef double s = 0 for i in range(n): s += (v[i] - m)**2 return sqrt(s/n) /// }}} {{{id=55| std_numpy4b(v_numpy) /// 0.28871143425255896 }}} {{{id=63| timeit('std_numpy4b(v_numpy)') /// 625 loops, best of 3: 274 µs per loop }}} {{{id=54| timeit('v_stats.standard_deviation(bias=True)') /// 625 loops, best of 3: 238 µs per loop }}} {{{id=58| timeit('v_numpy.std()') /// 625 loops, best of 3: 1.27 ms per loop }}}Very nice!!

{{{id=60| /// }}}Finally, we try again, after disabling bounds checking. This is even better; almost as good as stats.TimeSeries.

{{{id=50| %cython cimport numpy as np cdef extern: double sqrt(double) # turn of bounds-checking for entire function cimport cython @cython.boundscheck(False) def std_numpy5a(np.ndarray[np.float64_t, ndim=1] v): cdef Py_ssize_t i cdef Py_ssize_t n = v.shape[0] # how many entries # Compute the mean cdef double m = 0 for i in range(n): m += v[i] m /= n # Compute variance cdef double s = 0 for i in range(n): s += (v[i] - m)**2 return sqrt(s/n) /// }}} {{{id=49| timeit('std_numpy5a(v_numpy)') /// 625 loops, best of 3: 227 µs per loop }}} {{{id=43| timeit('v_stats.standard_deviation(bias=True)') /// 625 loops, best of 3: 240 µs per loop }}}For smaller input, interestingly we get a massive win over Numpy. If you were, e.g., computing a sliding window of standard deviations (say) for a time series, this would be important.

{{{id=65| a = numpy.array([1,2,3,4], dtype=float); a /// array([ 1., 2., 3., 4.]) }}} {{{id=67| timeit('std_numpy5a(a)') /// 625 loops, best of 3: 483 ns per loop }}} {{{id=68| timeit('a.std()') /// 625 loops, best of 3: 24.4 µs per loop }}} {{{id=69| b = stats.TimeSeries(a) timeit('b.standard_deviation(bias=True)') /// 625 loops, best of 3: 534 ns per loop }}} \end{verbatim} \chapter{Resources for Solving Problems Using Sage} \section{The Sage Library} You can do a Google search on all of the Sage documentation, web pages and discussion groups all in one go by visiting the webpage \url{http://sagemath.org/search.html} and typing in your search, then waiting as the page dynamically updates. Of course you can find links to the standard Sage documentation, including the tutorial, constructions guide, FAQ, developer's guide, and reference manual at \url{http://sagemath.org/help.html}. There are also links to videos and many other helpful materials there. There are numerous quick reference cards at \url{http://wiki.sagemath.org/quickref} which list numerous Sage commands on a single page in specific areas such as Calculus and Linear Algebra. Much work went into creating these cards, and they are an excellent resource to print out. If you want to search the documentation of the functions defined in the Sage library, use the \verb|search_doc| command. This just does a straight search through all the docstrings of the functions in the HTML version of the Sage documentation, without any prebuilt index. It is written in Python and uses regularly expressions on the source code to extract docstrings out and find your search terms. The \verb|search_doc| command works on both the command line and in the notebook. On the command line it displays one line from each HTML document, so is tedious to actually use. In the notebook, it displays the relevant html documents, which links to each. If you click on a link, you'll go to an interactive version of the relevant section of the documentation, where you can search that page for relevant text. Watch out, since stupidly you then have to use the back button to get back to your worksheet -- it would be better if the html output of \verb|search_doc| used \verb||; until this is changed, you may want to right click and select "open in new tab". \begin{lstlisting} sage: search_doc('eigenvalue') ... \end{lstlisting} The HTML documentation for Sage is far from complete; there is lots of code in the Sage library that isn't documented at all in the HTML documentation of Sage, for whatever reason. You can easily search through all of this code by typing \verb|search_src(...)| on either the command line or in the notebook. \begin{lstlisting} sage: search_src('eigenvalue') ... \end{lstlisting} On the command line you'll get a list of each line in each file that contains the given search term. In the notebook, you will get a list of all files in the Sage library that contain the search term, along with links to the files. The same caveats regarding clicking on the links applies as with \verb|search_doc| (see above). When you click on a file, it will look funny for a moment (a bug, in my opinion), then suddenly refresh and display as a very nicely formated and syntax highlighted page. You should then search this page for your term, in order to see it in context. At the top of the page there is also a link called ``browse directory'', which lets you browse to any file in the Sage library and similarly view it. To search the definitions of function, use \verb|search_def|. This works just like \verb|search_src| but restricts the search to the definition lines of functions. \begin{lstlisting} sage: search_def('other_graph') ... \end{lstlisting} \section{Question and Answer Sites} The Sage project hosts their own question and answer site devoted to Sage at \url{http://ask.sagemath.org}. You can instantly sign in using OpenID and ask a question, or answer one. Specific answerable questions are best. You can also easily search all the questions, and see if anybody has asked a similar question before (and what the answers were). The answers are ranked based on user voting, and the questions are sorted by tags. People are motivated to give good answers, since they get ``karma points'' for posting useful answers. One of the first big question/answer sites is \url{http://stackoverflow.com/}, which has a huge number of questions and answers about all things related to coding. One of the top most popular tags is ``python'', with over 50000 question. There are also a few dozen questions taged ``sage'' (some about the Sage math software, and some about the unrelated Sage accounting software). If you run into Python programming questions, this can be an excellent site on which to search for answers or ask questions. \section{Documentation for components of Sage} There are many components of Sage that offer vast amounts of functionality, and have excellent documentation, but you'll find almost nothing about them in any of the standard Sage documentation. For example, for numerical computing numpy, scipy, and cvxopt are all included in Sage, and often many, many capabilities that are well documented in their respected documentation. Thus it is quite useful to know that you can do the following: %listing \begin{lstlisting} sage: import scipy.special sage: scipy.special.The goal of this lecture is to give you a deeper understanding of some of the fundamental and unique architectural issues involved in Sage.

I built Sage partly from other complete mathematical software systems because I wanted to finish Sage (for my use!) in at most 5 years.

Some of the major components included in Sage are:

- PARI/GP - number theory
- GAP - group theory
- Singular - commutative algebra
- Maxima - symbolic calculus
- R - statistics

Each of the above is a full standalone project with its own custom programming language, history, culture, etc. And each has unique, powerful, debugged code that I don't want to have to rewrite from scratch, since it would take too long, and writing code from scratch is incredibly difficult and frustrating.

I also wanted to make it easy to call the following systems from Sage for the purposes of benchmarking, porting, migration of users and code, optional functionality, etc.:

- Magma
- Maple
- Mathematica
- Mupad
- Matlab
- Axiom, Octave, REDUCE, Macaulay2, Scilab, Kash, Lisp

**The Big Problem:** How can we make use of the above systems from Python?

This question is difficult partly because there are so many answers, each with pros and cons, and I had to choose (then be criticized for my choices).

sage: number_of_partitions(10) 42 sage: list(Partitions(10)) [[10], [9, 1], [8, 2], [8, 1, 1], [7, 3], [7, 2, 1], [7, 1, 1, 1], [6, 4], [6, 3, 1], [6, 2, 2], [6, 2, 1, 1], [6, 1, 1, 1, 1], [5, 5], [5, 4, 1], [5, 3, 2], [5, 3, 1, 1], [5, 2, 2, 1], [5, 2, 1, 1, 1], [5, 1, 1, 1, 1, 1], [4, 4, 2], [4, 4, 1, 1], [4, 3, 3], [4, 3, 2, 1], [4, 3, 1, 1, 1], [4, 2, 2, 2], [4, 2, 2, 1, 1], [4, 2, 1, 1, 1, 1], [4, 1, 1, 1, 1, 1, 1], [3, 3, 3, 1], [3, 3, 2, 2], [3, 3, 2, 1, 1], [3, 3, 1, 1, 1, 1], [3, 2, 2, 2, 1], [3, 2, 2, 1, 1, 1], [3, 2, 1, 1, 1, 1, 1], [3, 1, 1, 1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [2, 2, 2, 2, 1, 1], [2, 2, 2, 1, 1, 1, 1], [2, 2, 1, 1, 1, 1, 1, 1], [2, 1, 1, 1, 1, 1, 1, 1, 1], [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]] sage: time number_of_partitions(10^7) 92027175502604546685596278166825605430729405281023979395328576351741298526232350197882291654710333933219876431112892669442996519201446933718057425885425510196566971369272243936886123704944390011846626724222935883880949646021554674211449712293631879438242092222979701858787035045131791561718499094276679781015502944193307504577212918898104161448934354538420643899518683659226259312517022343012768006249066347774384224200200491423135789948628712467862610060006610227873354093344771970346402912468019502617741296485750068965727678736574879683519236357061319134860914524427627076446580477740857594944050855144756641881148963046419111504530928013165254773178279374714115048498031936743061146399094602347281946619566715867818368113040887581799683872172944577575391666322871295451048112070492385908727524159239222366587691028630013147462129464573569940173628469758175515190001641345140899367093190859080267185611792170429465197857968117435299141079155705662249617336911859509285578584413441733816964914269258918743530174261522145886491433067881470395832647408578195566042566460494284912372612048505127243987254597660690323804252237148083121685300106473794690689801747504661946299472005981442948094926764563853171727831538610914056742150737497538427502124024964209080033657278768694168219946346309906686670220404292685540282106667661584147201155740243521818020629234011924167214100674819982674459329845176126920689181532303574746823885977198276651255478126323692431978852519785180336170541394768609366389114772406136683349412746708709224596935940110215929918473383014389065632764805267657590865513593978244244311680854217735536592823219001199569770431572718888108452817614295777255622715362408122265800848932276196267545354426556723347685735897551768373108977390874033016418618266068861412680046926417299574367992118065543824471964711585682146618651730116890773279227432137954488457335074000412015820488115033194548391813526724286716271750346470118187739995800431441608729698516332428256345862484406623639956273335472545741134762782834725659303362718667415197368798040219223371637134331256426629887430519590196027703231191899072357523708159719377437782971846471928181434133277347743728346062720786947891744911516248998265758156411629802345973249527007535170095624359891373084013345497111826508558570667664488643633183476426421938705419176750248730025527079093923132020478303851642054200888831310378483811088425548758508373324093873479730273437693679342081086702018520971877967092726990888474755041255077282275352308667980953624552474645295774554823817830271752015905175487656200680986962732503598211618667111264793583287147032328796244469851598884862190118775508811358918545701650113647140734902415801399931326365622784208807398682277421689531141855076998289838068746126554377321750652288128533274971246389513818051037817849704769088911818484659276794986806465634286299339344423718490667472287384981129496643999747556606122301402027411096367172771385437211572122448922846957076711812454645572143320128971630906382167345381586565940316748316082706877584540948496188398233399350019423833682423300814059493159013417591894979385650634324308841947277121816559279335938926911510683549043290690287102733571315222761846482615431786061813454633634415974179413942024706129986725734471552346167738613509470760758338637870579921007168514417341548481513953296373455058614174692678013759737246724696931125240457406888289154055030387593548942880549262383621259594080699698643245355453826567378500963781681659096276126857969078217677288980 Time: CPU 0.32 s, Wall: 0.32 s

**Problem 1:** Availability of a specific known version of a third party software package.

Even if we solve the big problem above, a "vendor" will often just release a new version of their software with numerous changes that break our solution. This happens *constantly*. And the actual versions of software that people have installed (under OS X, various Linuxes, Solaris, etc.) will be widely distributed over versions of software from the last decade.

**Solution: **For the free open systems that (1) we really need, and (2) we can build from source easily enough, we ship and build a very specific version as part of Sage. This completely solves problem 1, at the expense of a lot of (misplaced) criticism from people who don't understand the problem; at the same time, this accidentally creates a solution to a different problem (easy-to-install distribution of a bunch of useful math software), which many people greatly appreciate.

For the non-free systems or the free systems that are hard to build, the problem just doesn't get solved. And indeed our interfaces and code that relies on those systems is sadly fairly brittle; often a new version of Magma just breaks with Sage. Fortunately, of course very little functionality in Sage depends on such systems.

**Problem 2**: Make a specific version of some mathematics software (call it M) usable from Python.

Here are some of the many potential approaches to this problem:

**Naive Subprocess.**Start up M, tell it to read in a file, and save the results in a file, and terminate.*This doesn't preserve state between calls, and startup time can easily be seconds, so this is not viable.***Create network protocols.**Define an openmath/XML based protocol for well-defined communication of "arbitrary mathematics" (whatever that is) between software, e.g., between M and Python. Design and implement client and server protocols. The SCIEnce project, started in 2006, and costing many millions of dollars, is an example.*This seems like the right approach, but it it is slow in certain relevant benchmarks and too challenging (for me) to develop. It's probably very useful for something (e.g., writing research papers), but is massively too complicated for what we need for Sage, which is focused on what is practical now.***Pseudo-tty's (ptty) = pexpect.**Create a simulated command line prompt that is controled by Python. Absolutely anything that one can do at the command line with M immediately becomes usable from a Python program. This is relatively easy to implement and extremely flexible -- one can create a useful interface to any math software system out there in a day. It is slow in some sense, but still much better than (1) and (2).*This approach has been used in Sage for a long time.***C/C++ library interfaces.**Create a C/C++ library interface and link the other program into Python itself, using Cython. This is extremely difficult, because none of the M's (except PARI) are designed to be used this way. However, it is extremely fast. For basic arithmetic, it can be several hundred times faster than (3) above.

As of now, people have written fairly polished versions of both (3) and (4) for all of: PARI, GAP, Singular, Maxima, and R. In case of (4), these are all hard work, and aren't necessarily used much in Sage yet, or even included in Sage, but they exist, and are on the way in.

The rest of this worksheet is about how to use (3) above: the pexpect based interfaces. This is *well worth learning*, because these interface all work in almost exactly the same way, and there are interfaces to pretty much every math software system out there. Sage is the only software in existence that can talk to so many other math software systems.

Here are the basic points, which we'll follow with several examples illustrating them. Suppose m is one of math software systems, e.g., r or singular or maxima:

**x = m.eval(s):**sets x equal to the string obtained by typing the string s into M. The string s can be several lines long. (This is how many % modes in the noteboook are implemented.)**x = m(s):**creates a new Python object that "wraps" the result of evaluating s in M. It's like typing "x.name() = s" into M. You can then do arithmetic with x, and call functions on it, e.g.,**x.function(...)**or**m.function(..., x, ...)**.

And that is pretty much it.

**WARNING:** There is latency. **Any** time you call any function involving a pexpect interface, expect it to take on the order of at least **1ms (=one millisecond), **even if the actual operation in the system M takes almost no time. For comparison, adding or multiplying most simple objects in Python/Sage takes about 1 microsecond (i.e., 1/1000 the time of a call involving pexpect), and adding/multiping objects in Cython can take only a few nanoseconds (1/1000000 the time of a pexpect call).

Another note: the very first time you do m.eval(...) it may take surprisingly long, since another program is starting up.

We use Maxima to illustrate evaluation of a simple string:

sage: s = maxima.eval("2 + 3") sage: type(s)There is now a separate Maxima subprocess running. Each process has an id number associated to it:

sage: maxima.pid() # the "pin id" of the subprocess 9259Next will illustrate creating a Python object that wraps an expression in Maxima.

sage: s = maxima('sin(x^3) * tan(y)') sage: type(s)The name of the object in the corresponding Maxima session:

sage: s.name() 'sage2656'The object prints nicely:

sage: s sin(x^3)*tan(y)Latex output happens to be supported:

sage: show(s)\newcommand{\Bold}[1]{\mathbf{#1}}\sin x^3\,\tan y

sage: maxima.eval('sage2656 + 1')
'sin(x^3)*tan(y)+1'
You can call functions on objects in a Pythonic way.

sage: s.integrate('y') sin(x^3)*log(sec(y))Or use maxima.function(...)

sage: maxima.integrate(s, 'y') sin(x^3)*log(sec(y))The result is another Python object (which wraps another object defined in Maxima). We can call functions on that object as well.

sage: z = s.integrate('y') sage: type(z)**Conclusion:** If you understand the above, you are in extremely good shape. All the other interfaces work the same way. The examples below are just to illustrate some subtle points and show how interfaces are useful.

It is possible in some systems to seriously mess things up and get things "out of sync". This is nearly impossible with Maxima, since we use it so heavily and have debugged the heck out of it. However, with other systems (like Magma) this can happen. If it does, do, e.g., **maxima.quit()**. This completely kills the subprocess, invalides any Python objects that wrap variables in that session, and starts a brand new fresh session.

Here is an example with each of the five big systems included in Sage:

sage: maxima('2+3') # maxima 5 sage: gp('2+3') # pari/gp 5 sage: singular('2+3') 5 sage: gap('2+3') 5 sage: r('2 + 3') [1] 5 sage: z_sage._maxima_init_() '(log(sec(y)))*(sin((x)^(3)))'You can follow standard R tutorials and have the computations (except graphics at present) to all definitely "just work". (Unlike the potentially confusing rpy2.)

sage: x = r('c(1,3,2,10,5)'); y = r('1:5') sage: print x sage: print y [1] 1 3 2 10 5 [1] 1 2 3 4 5 sage: x + y [1] 2 5 5 14 10 sage: x/y [1] 1.0000000 1.5000000 0.6666667 2.5000000 1.0000000 sage: x.length() [1] 5 sage: x > 3 [1] FALSE FALSE FALSE TRUE TRUE sage: x[x > 3] [1] 10 5There is also an interface to Octave, which is very similar to Matlab (but free).

sage: A = octave('rand(3)'); A 0.401446 0.286955 0.396858 0.606625 0.371021 0.515619 0.96863 0.683554 0.837288 sage: A*A 0.719642 0.492938 0.639562 0.968042 0.664185 0.863772 1.61454 1.1039 1.43791 sage: A.rref() 1 0 0 0 1 0 0 0 1**Bonus:** There is even a pexpect interface to Sage itself. (Trivia: this is used in the implementation of the Sage notebook.)

Let's get crazy: a pexpect interface inside a pexpect interface. And of course, this code is going from the notebook to Sage via yet another interface.

sage: sage0.eval('sage0 = Sage()') sage: z = sage0('sage0("3+5")') sage: type(z)Sage has plotting support that covers:

- most 2d plotting that Mathematica has (with a similar interface)
- 3d plotting, somewhat like Mathematica
- most 2d plotting that Matlab has (with a similar interface)

Sage uses the Python library Matplotlib (http://matplotlib.sourceforge.net/) is used under the hood to render 2d graphics; for 3d graphics, Sage can use a Java applet (jmol), an HTML5 canvas renderer, or a raytracer.

In this worksheet, we'll explain how to use the "mathematica-style" 2d plotting capabilities of Sage.

First, we'll discuss a simple but very powerful plotting command in Sage called "line". It takes as input a list of points, and draws a sequence of line segments connecting them. The points are given as 2-tuples (x,y), which are the x and y coordinates of a point in the plane. The output of calling the line command is a line object.

sage: L = line([(-2,-2), (3,8), (5, 5)]) sage: print L Graphics object consisting of 1 graphics primitiveTo * see* the actual plot of L, just put L by itself on a line or type show(L) or L.show():

Incidentally, there are many, many options that you can pass to the show command. The three most useful are:

- frame=True: Make it so the x-y axis are replaced by a frame, which is much better when looking at certain types of plots
- gridlines=True: Adds a background grid, which makes it easier to understand the plot in some cases.
- figsize=[w,h]: Allows you to adjust the size of the output. Think of w and h as the width and height in "inches".

You can combine these options. For example:

sage: L.show(frame=True, gridlines=True, figsize=[8,2])In the notebook you can just click and download the default plots displayed above, since they are png images. However, if you want to include images in a paper you're writing, or use an image in an editor such as Inkscape, it's much better to save the images in other formats. Thus a critically useful command is L.save('filename.ext'), which enables you to save a graphics object to a file. The extension of the filename determines the type of the file. For example, below we save L to pdf, eps, and svg formats. Note that the svg image just appears embedded in your web browser, and you can pan around. In any case, you can always (right or control) click on the link or image to save it as a file on your computer.

sage: L.save('image.pdf') sage: L.save('image.eps') sage: L.save('image.svg')Lines (and all other graphics objects) have numerous properties that you can adjust, which you find in the documentation. The most important properties of lines are:

- color=...: where for the color you can give a string, e.g., 'red'; or an html color, e.g., '#042a99', or an rgb triple.
- thickness=4: the thickness of the line
- linestyle='--': the style of the line: '--', '-.', '-', ':'

Let's have some fun:

sage: line([(random(), random()) for _ in range(100)], color='purple')Arithmetic: a key unusual idea in Sage graphics is that you combine together different graphics using +, as illustrated below:

sage: L1 = line([(0,0), (1,1), (2,0)], color='green', thickness=7) sage: L2 = line([(1,0), (2,5), (3,0)], color='purple', thickness=10, alpha=.7) # alpha = transparency sage: L1 + L2There are numerous other important plotting commands in Sage, including point, circle, polygon, arrow, and text, as illustrated below:

sage: G = point((1,1), pointsize=200) + circle((1,1), .5) sage: # zorder makes sure that triangle is on top sage: G += polygon([(0,0), (1,.6), (2,0)], color='purple', zorder=5) sage: G += arrow((1,1), (2,1.2), color='green') sage: # You can use TeX formulas: sage: G += text(r"$\sqrt{\sin(\pi x^2)}$",(1.8,1.35),color='black',fontsize=20) sage: G.show(aspect_ratio = 1)There are also a function just called "plot" that makes a plot of a wide range of Sage objects. It is very useful especially for plotting functions of one variable. It is probably the most used Sage plotting function. The result is a graphics object, which you can use just like any graphics object discussed above.

sage: plot(x*sin(1/x), (x, -1, 5), color='green', thickness=2)matrix_plot is another similar plotting function, which allows you to visualize a matrix.

sage: A = random_matrix(RDF,100); sage: matrix_plot(A) sage: matrix_plot(A^2)Finally, there is a graphics_array function that lets you assemble several independent plots into a single big plot.

sage: graphics_array([[matrix_plot(A), matrix_plot(A^2)], [plot(sin), plot(cos,color='red')]])Bonus -- you can animate graphics. Given any list of graphics objects, the animate command will make a single animated GIF file out of them. For example:

sage: v = [plot(sin(a*x), (x,0,10)) for a in [0,0.2,..,pi]] sage: z = animate(v, xmin=0,xmax=10,ymin=-1,ymax=1) sage: z.show(delay=10) \end{verbatim} \section{3d Plots} \begin{verbatim}In Sage, just as with 2d graphics, you make 3d graphics by creating various primitives and combining them together using addition to create a 3d scene.

There are many 3d graphics primitives in Sage. For example, you can draw platonic solids using **tetrahedron, cube, octahedron, dodecahedron, icosahedron**. You can plot round objects using **sphere** and **point3d**. You can plot 1d arrows, lines, and curves in space using **arrow3d, bezier3d, line3d**. There are also numerous powerful tools for plotting data, functions, and surfaces in 3d, including** cylindrical_plot3d, implicit_plot3d, list_plot3d, parametric_plot3d, plot3d, plot_vector_field3d, polygon3d, revolution_plot3d, spherical_plot3d**, and **implicit_plot3d**. Finally, you can place text in 3d using the **text3d** function.

All 3d graphics objects have **translate** and **rotate** (and **rotateX, rotateY, rotateZ**) methods, which allow you to position the object or collection of objects anywhere you want.

Also, you can set the color and opacity of any 3d object when you create it, as an optional parameter.

Finally, you can display a 3d scene G using either jmol (java) via **G.show()**, the Tachyon raytracer via **G.show(viewer='tachyon')**, or HTML5 canvas via **G.show(viewer='canvas3d')**. The show command also takes an aspect_ratio option; e.g., sometimes **aspect_ratio=1** is useful, in order to make sphere round, etc.

There are also some rudimentary 3d plotting capabilities in matplotlib. I had once announced an intention to improve those for Sage, but upon closer inspection one finds that matplotlib is very 2d oriented and the 3d stuff just doesn't feel right at all.

**History:** William Stein included Tachyon in Sage, then Tom Boothby, Josh Kantor and William Stein wrote some very preliminary 3d plotting functionality that relied entirely on tachyon, and could plot functions $z=f(x,y)$. A year later, during Christmas break, William stumbled on the jmol Java viewer (that only uses Java's 2d render!) for molecules and he and Robert Bradshaw snuck off and figured out how to make jmol show more general mathematical graphics, and also wrote most of the 3d plotting library on top of this, motivated by the upcoming joint math meetings in San Diego (Jan 2008). David Joyner then submitted many examples to the documentation. Next, William Cauchois (as a UW freshman project) and Carl Witty added an implicit_plot3d function, and Cauchois also added HTML5 canvas rendering. Other people added plotting of vector fields, cylindrical plotting, etc., driven by Calculus teaching needs.

**Note: **The 3d plotting in Sage is mainly oriented toward mathematical visualization, rather than visualizing large 3d datasets that come up in Scientific computing. Scientsts are rumored to make great use of other Python-friendly options, none of which are included with Sage or are easy to install in Sage at present, though all are free, open source, and can be installed if one is *"sufficiently motivated"*: MyaVI (which uses the VTK C++ library), ScientificPython, ...?

**Shortcoming:** The biggest shortcomings are that (1) realtime interaction with 3d graphics is not supported in any way, (2) there is no easy way to make high quality movie animations of 3d scenes (it is possible, but requires optional tools), and (3) your browser can run out of Java memory if you display too many jmol java-based 3d plots at once, and refuse to display anymore -- this has been fixed in a patch that has gone into Sage yet.

The rest of this worksheet illustrates with examples how to create 3d images using Sage.

**Problem**: Draw all of the platonic solids next to each other in different colors in a single plot.

**Problem: **Plot 40 semi-transparent random spheres. Similarly, plot a few hundred random points.

Draw a 3d random walk.

sage: v = [(0,0,0)] sage: for i in range(300): ... v.append([a+random()-.5 for a in v[-1]]) ... sage: line3d(v, color='black') sage: line3d(v, color='red', thickness=3)**Problem:** Draw some text in 3d.

**Problem**: Plot a function $z=f(x,y)$.

**Problem: **Plot an implicit 3d surface defined by an equation $f(x,y,z)=0$.

**Problem:** Plot Yoda.

**Solution:** use a standard mesh one finds online as follows, which describes a model of Yoda that has over 50,000 triangles. Here we use the scipy module "io" to load the model, then use the IndexFaceSet 3d primitive to construct the 3d image from the triangulation data.

Though Sage provides its own functions (e.g,. plot, line, point, text, circle, etc.) for drawing 2d graphics, they are all very oriented toward visualizing the sorts of mathematical objects that come up in more pure mathematics (so more like Mathematica). For the sort of scientific visualizing that comes up in applications, the matplotlib library provides functionality that is very similar to Matlab for plotting. Also, matplotlib can be used on any major operating system without using Sage; it only depends on numpy, and has a very open license (BSD-compatible).

Also, if you're drawing an image that involves a huge amount of data points, directly using matplotlib can be *more efficient* than using Sage's plotting, since Sage's plotting is built on top of matplotlib -- using matplotlib directly gets you closer to the metal.

There are** two absolutely critical** things to remember when using matplotlib from Sage:

- Instead of
**plt.show()**use**plt.savefig('a.png'). Memorize this now.**This will make a nice smooth antialised png image of the plot appear in the Sage notebook. Using plt.show() may just do nothing in Sage, depending on your setup (it might also popup a window). You can also do**plt.savefig('a.pdf')**and**plt.savefig('a.svg')**. - You might have to put your input in a
**%python**cell or turn off the preparser (by typing**preparser(False)**).

With these two hints, you should be able to to try out the examples at http://matplotlib.sourceforge.net/gallery.html.

In fact, try it now [in class, go to the above website, scroll, and let students choose an example]:

- Click on the thumbnail image.
- Click source code in the upper left
- Paste the code into a notebook cell.
- Put
**%python**as the first line of the cell. - Change any .show() to .savefig('a.png')

Note: There are some images in the gallery that require some external data file (e.g, the brain image), so those won't work.

For example, if students choose the first example, we get:

sage: %hide sage: %python sage: """ sage: Show examples of matplotlib artists sage: http://matplotlib.sourceforge.net/api/artist_api.html sage: Several examples of standard matplotlib graphics primitives (artists) sage: are drawn using matplotlib API. Full list of artists and the sage: documentation is available at sage: http://matplotlib.sourceforge.net/api/artist_api.html sage: Copyright (c) 2010, Bartosz Telenczuk sage: License: This work is licensed under the BSD. A copy should be sage: included with this source code, and is also available at sage: http://www.opensource.org/licenses/bsd-license.php sage: """ sage: import numpy as np sage: import matplotlib.pyplot as plt sage: import matplotlib sage: from matplotlib.collections import PatchCollection sage: import matplotlib.path as mpath sage: import matplotlib.patches as mpatches sage: import matplotlib.lines as mlines sage: font = "sans-serif" sage: fig = plt.figure(figsize=(5,5)) sage: ax = plt.axes([0,0,1,1]) sage: # create 3x3 grid to plot the artists sage: pos = np.mgrid[0.2:0.8:3j, 0.2:0.8:3j].reshape(2, -1) sage: patches = [] sage: # add a circle sage: art = mpatches.Circle(pos[:,0], 0.1,ec="none") sage: patches.append(art) sage: plt.text(pos[0,0], pos[1,0]-0.15, "Circle", ha="center", ... family=font, size=14) ... sage: # add a rectangle sage: art = mpatches.Rectangle(pos[:,1] - np.array([0.025, 0.05]), 0.05, 0.1, ... ec="none") ... sage: patches.append(art) sage: plt.text(pos[0,1], pos[1,1]-0.15, "Rectangle", ha="center", ... family=font, size=14) ... sage: # add a wedge sage: wedge = mpatches.Wedge(pos[:,2], 0.1, 30, 270, ec="none") sage: patches.append(wedge) sage: plt.text(pos[0,2], pos[1,2]-0.15, "Wedge", ha="center", ... family=font, size=14) ... sage: # add a Polygon sage: polygon = mpatches.RegularPolygon(pos[:,3], 5, 0.1) sage: patches.append(polygon) sage: plt.text(pos[0,3], pos[1,3]-0.15, "Polygon", ha="center", ... family=font, size=14) ... sage: #add an ellipse sage: ellipse = mpatches.Ellipse(pos[:,4], 0.2, 0.1) sage: patches.append(ellipse) sage: plt.text(pos[0,4], pos[1,4]-0.15, "Ellipse", ha="center", ... family=font, size=14) ... sage: #add an arrow sage: arrow = mpatches.Arrow(pos[0,5]-0.05, pos[1,5]-0.05, 0.1, 0.1, width=0.1) sage: patches.append(arrow) sage: plt.text(pos[0,5], pos[1,5]-0.15, "Arrow", ha="center", ... family=font, size=14) ... sage: # add a path patch sage: Path = mpath.Path sage: verts = np.array([ ... (0.158, -0.257), ... (0.035, -0.11), ... (-0.175, 0.20), ... (0.0375, 0.20), ... (0.085, 0.115), ... (0.22, 0.32), ... (0.3, 0.005), ... (0.20, -0.05), ... (0.158, -0.257), ... ]) ... sage: verts = verts-verts.mean(0) sage: codes = [Path.MOVETO, ... Path.CURVE4, Path.CURVE4, Path.CURVE4, Path.LINETO, ... Path.CURVE4, Path.CURVE4, Path.CURVE4, Path.CLOSEPOLY] ... sage: path = mpath.Path(verts/2.5+pos[:,6], codes) sage: patch = mpatches.PathPatch(path) sage: patches.append(patch) sage: plt.text(pos[0,6], pos[1,6]-0.15, "PathPatch", ha="center", ... family=font, size=14) ... sage: # add a fancy box sage: fancybox = mpatches.FancyBboxPatch( ... pos[:,7]-np.array([0.025, 0.05]), 0.05, 0.1, ... boxstyle=mpatches.BoxStyle("Round", pad=0.02)) ... sage: patches.append(fancybox) sage: plt.text(pos[0,7], pos[1,7]-0.15, "FancyBoxPatch", ha="center", ... family=font, size=14) ... sage: # add a line sage: x,y = np.array([[-0.06, 0.0, 0.1], [0.05,-0.05, 0.05]]) sage: line = mlines.Line2D(x+pos[0,8], y+pos[1,8], lw=5., ... alpha=0.4) ... sage: plt.text(pos[0,8], pos[1,8]-0.15, "Line2D", ha="center", ... family=font, size=14) ... sage: colors = 100*np.random.rand(len(patches)) sage: collection = PatchCollection(patches, cmap=matplotlib.cm.jet, alpha=0.4) sage: collection.set_array(np.array(colors)) sage: ax.add_collection(collection) sage: ax.add_line(line) sage: ax.set_xticks([]) sage: ax.set_yticks([]) sage: plt.savefig('a.png')Matplotlib has an interface that works much like Matlab. This will be very helpful if you know Matlab, and of some value otherwise since there is a lot of Matlab code and documentation out there. This mode is called "pyplot", and there is a now tutorial for it at http://matplotlib.sourceforge.net/users/pyplot_tutorial.html.

Below we replicate several examples from this tutorial in Sage, and you should read this tutorial. The main point you should realize when looking at these examples is how easily one can express scientific data visualization in terms of this interface; many equivalent plots are possibly directly with Sage's plotting commands, but they are less natural.

sage: import matplotlib.pyplot as plt sage: plt.clf() sage: plt.plot([1,2,3,4]) sage: plt.ylabel('some numbers') sage: plt.savefig('a.png', dpi=70) sage: plt.clf() sage: plt.plot([1,2,3,4], [1,4,9,16]) sage: plt.savefig('a.png', dpi=70) sage: plt.clf() sage: # 'ro' = red circles, like in MATLAB; 'bx' = blue crosses. sage: plt.plot([1,2,3,4], [1,4,9,16], 'ro', [5,5.5], [2,2], 'bx') sage: plt.axis([0, 6, 0, 20]) sage: plt.savefig('a.png', dpi=70)Multiple figures and axis all at once:

sage: import numpy as np sage: import matplotlib.pyplot as plt sage: def f(t): ... return np.exp(-t) * np.cos(2*np.pi*t) ... sage: t1 = np.arange(0.0, 5.0, 0.1) sage: t2 = np.arange(0.0, 5.0, 0.02) sage: plt.clf() sage: plt.figure(1) sage: plt.subplot(121) sage: plt.plot(t1, f(t1), 'bo', t2, f(t2), 'k') sage: plt.subplot(122) sage: plt.plot(t2, np.cos(2*np.pi*t2), 'r--') sage: plt.savefig('a.png')An example involving text

sage: import numpy as np sage: import matplotlib.pyplot as plt sage: plt.clf() sage: mu, sigma = 100, 15 sage: x = mu + sigma * np.random.randn(10000) sage: # the histogram of the data sage: n, bins, patches = plt.hist(x, 50, normed=1, facecolor='g', alpha=0.75) sage: plt.xlabel('Smarts') sage: plt.ylabel('Probability') # bug -- gets chopped out below :-( sage: plt.title('Histogram of IQ') sage: plt.text(60, .025, r'$\mu=100,\ \sigma=15$') sage: plt.axis([40, 160, 0, 0.03]) sage: plt.grid(True) sage: plt.savefig('a.png', dpi=70)Incidentally, you can of course combine matplotlib graphics with @interact

sage: import numpy as np sage: import matplotlib.pyplot as plt sage: plt.clf() sage: mu, sigma = 100, 15 sage: x = mu + sigma * np.random.randn(10000) sage: @interact sage: def f(bins=(5..150)): ... plt.clf() ... n, bins, patches = plt.hist(x, bins, normed=1, facecolor='g', alpha=0.75) ... plt.xlabel('Smarts', fontsize=18, color='red') ... plt.ylabel('Probability') # bug -- gets chopped out below ... plt.title('Histogram of IQ') ... plt.text(60, .025, r'$\mu=100,\ \sigma=15$') # latex! ... plt.axis([40, 160, 0, 0.03]) ... plt.grid(True) ... plt.savefig('a.png', dpi=70)Annotation Example

sage: import numpy as np sage: import matplotlib.pyplot as plt sage: plt.clf() sage: ax = plt.subplot(111) sage: t = np.arange(0.0, 5.0, 0.01) sage: s = np.cos(2*np.pi*t) sage: line, = plt.plot(t, s, lw=2) sage: plt.annotate('local max', xy=(2, 1), xytext=(3, 1.5), ... arrowprops=dict(facecolor='black', shrink=0.07)) ... sage: plt.ylim(-2,2) sage: plt.savefig('a.png')There are tons of other examples of pyplot at the matplotlib website here: http://matplotlib.sourceforge.net/examples/pylab_examples/

For example we have the following economics example:

sage: """ sage: make a scatter plot with varying color and size arguments sage: """ sage: import matplotlib sage: import numpy as np sage: import matplotlib.pyplot as plt sage: import matplotlib.mlab as mlab sage: import matplotlib.cbook as cbook sage: # load a numpy record array from yahoo csv data with fields date, sage: # open, close, volume, adj_close from the mpl-data/example directory. sage: # The record array stores python datetime.date as an object array in sage: # the date column sage: datafile = cbook.get_sample_data('goog.npy') sage: r = np.load(datafile).view(np.recarray) sage: r = r[-250:] # get the most recent 250 trading days sage: delta1 = np.diff(r.adj_close)/r.adj_close[:-1] sage: # size in points ^2 sage: volume = (15*r.volume[:-2]/r.volume[0])**2 sage: close = 0.003*r.close[:-2]/0.003*r.open[:-2] sage: fig = plt.figure() sage: ax = fig.add_subplot(111) sage: ax.scatter(delta1[:-1], delta1[1:], c=close, s=volume, alpha=0.75) sage: #ticks = arange(-0.06, 0.061, 0.02) sage: #xticks(ticks) sage: #yticks(ticks) sage: ax.set_xlabel(r'$\Delta_i$', fontsize=20) sage: ax.set_ylabel(r'$\Delta_{i+1}$', fontsize=20) sage: ax.set_title('Volume and percent change') sage: ax.grid(True) sage: plt.savefig('a.png')There is more to matplotlib than just a Matlab like interface. Matplotlib has its own library interface, primitives, etc., which are documented here: http://matplotlib.sourceforge.net/contents.html. Also, there is an strong community with much momentum behind matplotlib development. It is the *de facto* standard for 2d plotting in Python, and it keeps getting better.

This is basically this example: http://matplotlib.sourceforge.net/examples/mplot3d/surface3d_demo.html

sage: %python sage: from mpl_toolkits.mplot3d import Axes3D sage: from matplotlib import cm sage: from matplotlib.ticker import LinearLocator, FixedLocator, FormatStrFormatter sage: import matplotlib.pyplot as plt sage: import numpy as np sage: fig = plt.figure() sage: ax = fig.gca(projection='3d') sage: X = np.arange(-7, 7, 0.25) sage: Y = np.arange(-7, 7, 0.25) sage: X, Y = np.meshgrid(X, Y) sage: R = np.sqrt(X**2 + Y**2) sage: Z = np.sin(R) sage: surf = ax.plot_surface(X, Y, Z, rstride=1, cstride=1, cmap=cm.jet, ... linewidth=0, antialiased=False) ... sage: ax.set_zlim3d(-1.01, 1.01) sage: ax.w_zaxis.set_major_locator(LinearLocator(10)) sage: ax.w_zaxis.set_major_formatter(FormatStrFormatter('%.03f')) sage: fig.colorbar(surf, shrink=0.5, aspect=5) sage: plt.savefig('a.png') \end{verbatim} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \chapter{Number Theory} \section{Prime Numbers and the Riemann Hypothesis} \subsection{Primes} An integer $p\geq 2$ is {\em prime} if its only divisors are $1$ and $p$. For example, the first few primes are $$ 2, 3, 5, 7, 11, 13, 17, 19, \ldots. $$ You can find primes in Sage using the \verb|prime_range| command: \begin{lstlisting} sage: prime_range(10) [2, 3, 5, 7] sage: prime_range(7, 23) [7, 11, 13, 17, 19] sage: range(7, 23) [7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22] sage: prime_range(100) [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97] \end{lstlisting} Note that \verb|prime_range| works like the range command, in that it doesn't include its upper endpoint. There is also an iterator over the prime numbers called \verb|primes|: \begin{lstlisting} sage: P = primes(10^100); PNaive modular exponentiation is not good.

sage: 7^11 1977326743 sage: (7^11) % 13 2But Sage implements a vastly better algorithm.

sage: a = Mod(18, 11); a 7 sage: type(a)Here is a bigger example.

sage: p = next_prime(10^100); p 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000267 sage: g = Mod(2, p) sage: a = ZZ.random_element(p); a 3899462984078586138445766211121799052200774540320148812825084038333387229965957683578348930338929160 sage: g^a 7947388754511516576098430442932357966776257289140614732345836374084514192683933294095474801360397543 sage: timeit('g^a') 625 loops, best of 3: 71.1 µs per loopWe illustrate the Diffie-Hellman key exchange.

sage: @interact sage: def _(bits=(5..1024), g=2, seed=(0..100)): ... t = cputime() ... set_random_seed(seed) ... p = next_prime(2^(bits-1)) ... print "" ... print "p = %s"%p ... a = ZZ.random_element(p) ... b = ZZ.random_element(p) ... print "a = %s"%a ... print "b = %s"%b ... g = Mod(g, p) ... print "g^a (mod p) = %s"%(g^a) ... print "g^b (mod p) = %s"%(g^b) ... print "secret = %s = %s"%((g^a)^b, (g^b)^a) ... print "total time = %s seconds"%cputime(t) ... print "" sage: time next_probable_prime (2^(1024-1)) 89884656743115795386465259539451236680898848947115328636715040578866337902750481566354238661203768010560056939935696678829394884407208311246423715319737062188883946712432742638151109800623047059726541476042502884419075341171231440736956555270413618581675255342293149119973622969239858152417678164812112069763 Time: CPU 0.21 s, Wall: 0.21 sReferences:

- For math -- see http://wstein.org/ent (chapter 3).
- More on cryptography using Sage -- see the book by David Kohel that is listed here.
- There is a library called PyCrypto that is included with Sage.

See http://rpy.sourceforge.net/rpy2/doc-2.0/html/introduction.html.

sage: %auto sage: import rpy2.robjects as robjects sage: R = robjects.rWe get pi from the R namespace.

sage: v = R['pi']; vNote that we have to explicitly use print to see a nice representation:

sage: print v [1] 3.141593There is a pexpect interface to r called "r" by default when you start Sage. This tutorial is not about that interface, but instead about the C library interface called rpy2, which is much faster and more robust.

sage: r R Interpreter sage: r('2 + 3') # the pexpect interface [1] 5 sage: import rpy2.robjects as robjects sage: R = robjects.r sage: print R('2 + 3') # the rpy2 cython interface (note the import!) [1] 5 sage: R(""" sage: a = 5 sage: b = 7 sage: c = a + b""") sage: print R("c") [1] 12 sage: timeit("r('2+3')") 625 loops, best of 3: 1.44 ms per loop sage: timeit("R('2+3')") 625 loops, best of 3: 650 µs per loop sage: timeit("pari('2+3')") 625 loops, best of 3: 5.72 µs per loop(frankly, I'm shocked at how slow the rpy2 interface actually is...!)

This is how to get started with rpy2:

Beware the preparser:

sage: v = R['pi']; vAnd note again that v is a vector not a number.

sage: print v + int(3) [1] 3.141593 3.000000 sage: print v[0] + int(3) 6.14159265359WARNING: Python indexing starts at 0 and R indexing starts at 1.

sage: print R('c(5,2,-3)[1]') [1] 5 sage: timeit('R("f <- function(r) { 2 * pi * r }")') 625 loops, best of 3: 460 µs per loopDefine a function in R:

sage: R("f <- function(r) { 2 * pi * r }")Now call the function:

sage: print R("f(3)") [1] 18.84956The function is now defined in the global R namespace:

sage: r_f = R['f'] sage: print r_f(int(3)) [1] 18.84956 sage: timeit('r_f(int(3))') 625 loops, best of 3: 41.4 µs per loop sage: print R("f") function(r) { 2 * pi * r }Most R objects have a string representation that can be directly parsed by R, which can be handy.

sage: letters = R['letters'] sage: print letters.r_repr() c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z")Here is an example of how we might use this:

sage: rcode = 'paste(%s, collapse="-")' %(letters.r_repr()) sage: print R(rcode) [1] "a-b-c-d-e-f-g-h-i-j-k-l-m-n-o-p-q-r-s-t-u-v-w-x-y-z" sage: timeit('robjects.IntVector(range(10))') 625 loops, best of 3: 9.65 µs per loop sage: time w = robjects.IntVector(range(10^6)) Time: CPU 0.74 s, Wall: 0.74 s sage: time print R['mean'](w) [1] 499999.5 Time: CPU 0.16 s, Wall: 0.17 s sage: time print R['sd'](w) [1] 288675.3 Time: CPU 0.18 s, Wall: 0.18 s sage: time w = r(range(10^3)) Time: CPU 1.12 s, Wall: 2.56 s sage: time z = pari(range(10^6)) Traceback (most recent call last): ... KeyboardInterrupt: evaluating PARI string __SAGE__Vectors are an important basic data structure in R:

sage: print robjects.StrVector(['abc', 'def']) [1] "abc" "def" sage: print robjects.IntVector([1, 2, 3]) [1] 1 2 3 sage: print robjects.FloatVector([1.1, 2.2, 3.3]) [1] 1.1 2.2 3.3You can also create R matrices, which are R vectors with a dim attribute:

sage: v = robjects.FloatVector([1.1, 2.2, 3.3, 4.4, 5.5, 6.6]) sage: m = R['matrix'](v, nrow = int(2)) sage: print m [,1] [,2] [,3] [1,] 1.1 3.3 5.5 [2,] 2.2 4.4 6.6The above illustrates how to call an R function. You get it from the R namespace, then call it in the standard way. Here's another example:

sage: v = robjects.IntVector([1..10]) sage: print R['sum'] function (..., na.rm = FALSE) .Primitive("sum") sage: ans = R['sum'](v) sage: ansYou can also pass in keywords:

sage: rsort = R['sort'] sage: print rsort(v, decreasing=True) [1] 10 9 8 7 6 5 4 3 2 1GOTCHA: In R variable names with dots in them are allowed, but in Python they are not. The example below illustrates how to deal with this (use **kwds). In this example, we make an R vector with a "NA" in it, which means we don't know that entry; the na.rm option to R's sum command controls how it behaves on lists with NA's in them.

sage: v = R('c(1,NA,2,3)') sage: print v [1] 1 NA 2 3 sage: print R['sum'] function (..., na.rm = FALSE) .Primitive("sum") sage: rsum = R['sum'] sage: print rsum(v) [1] NADirectly in R, we would just type na.rm=TRUE. In Python this does not make sense.

sage: print R('sum( c(1,NA,2,3), na.rm=TRUE )') [1] 6 sage: print rsum(v, na.rm=True) # boom! Traceback (most recent call last): ... SyntaxError: keyword can't be an expression sage: f(*[5,,7]) 33So we use **kwds, which works fine:

sage: a = {'na.rm':True} sage: print R['sum'](v, **a) [1] 6 sage: def f(a, b, c): ... return a + 2*b + 3*c ... sage: args = (5,) sage: kwds = {'b':7, 'c':13} sage: f(*args, **kwds) 58 sage: def g(*scott, **alex): ... print scott, alex ... return f(*scott, **alex) sage: g(1,2,c=3) (1, 2) {'c': 3} 14 sage: f( *(3, 8), **{'c':2}) 25 sage: f( 2, *(5,), **{'c':1}) 15- Call the R.png function to tell R where the output image should be saved (and what size it should be).
- Draw plots on the canvas until done.
- Tell R to turn the plotting device off, which causes the output file to be written.

IMPORTANT: This must all happen in the same notebook cell. Otherwise the output file gets written in temp directory for a cell that was already evaluated, and the plot may not appear.

sage: x = robjects.IntVector(range(50)) sage: y = R.rnorm(len(x)) # normal random numbers sage: # 300r = "raw Python int" (no preparser) sage: R.png('sage.png', width=600r, height=300r) sage: R.plot(x, y, xlab="x", ylab="rnorm", col="red") sage: _ = R['dev.off']() # "_ =" to suppress printingInteract works, of course.

sage: @interact sage: def _(points=(10..1000)): ... x = robjects.IntVector(range(points)); y = R.rnorm(int(points)) ... R.png('sage.png', width=600r, height=300r) ... R.plot(x, y, xlab="x", ylab="rnorm", col="blue") ... R['dev.off']()**Warning again -- Do NOT do this:** call dev.off in a separate cell!

This is how we would do this directly in R, which we can use from Sage by using the "%r" mode in the notebook (or r.eval("""...""")):

sage: %r sage: ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) sage: trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) sage: group <- gl(2, 10, 20, labels = c("Ctl","Trt")) sage: weight <- c(ctl, trt) sage: anova(lm.D9 <- lm(weight ~ group)) sage: summary(lm.D90 <- lm(weight ~ group - 1))# omitting intercept Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16Next, we do the same computation, but via rpy2 (which is unfortunately more complicated):

sage: ctl = robjects.FloatVector([4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14]) sage: trt = robjects.FloatVector([4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69]) sage: group = R.gl(2r, 10r, 20r, labels = ["Ctl","Trt"]) sage: weight = ctl + trt sage: robjects.globalEnv["weight"] = weight sage: robjects.globalEnv["group"] = group sage: lm_D9 = R.lm("weight ~ group") sage: print(R.anova(lm_D9)) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 sage: lm_D90 = R.lm("weight ~ group - 1") sage: v = R.summary(lm_D90) sage: print(v) Call: function (formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...) { ret.x <- x ret.y <- y cl <- match.call() mf <- match.call(expand.dots = FALSE) m <- match(c("formula", "data", "subset", "weights", "na.action", "offset"), names(mf), 0L) mf <- mf[c(1L, m)] mf$drop.unused.levels <- TRUE mf[[1L]] <- as.name("model.frame") mf <- eval(mf, parent.frame()) if (method == "model.frame") return(mf) else if (method != "qr") warning(gettextf("method = '%s' is not supported. Using 'qr'", method), domain = NA) mt <- attr(mf, "terms") y <- model.response(mf, "numeric") w <- as.vector(model.weights(mf)) if (!is.null(w) && !is.numeric(w)) stop("'weights' must be a numeric vector") offset <- as.vector(model.offset(mf)) if (!is.null(offset)) { if (length(offset) != NROW(y)) stop(gettextf("number of offsets is %d, should equal %d (number of observations)", length(offset), NROW(y)), domain = NA) } if (is.empty.model(mt)) { x <- NULL z <- list(coefficients = if (is.matrix(y)) matrix(, 0, 3) else numeric(0L), residuals = y, fitted.values = 0 * y, weights = w, rank = 0L, df.residual = if (is.matrix(y)) nrow(y) else length(y)) if (!is.null(offset)) { z$fitted.values <- offset z$residuals <- y - offset } } else { x <- model.matrix(mt, mf, contrasts) z <- if (is.null(w)) lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok, ...) } class(z) <- c(if (is.matrix(y)) "mlm", "lm") z$na.action <- attr(mf, "na.action") z$offset <- offset z$contrasts <- attr(x, "contrasts") z$xlevels <- .getXlevels(mt, mf) z$call <- cl z$terms <- mt if (model) z$model <- mf if (ret.x) z$x <- x if (ret.y) z$y <- y if (!qr) z$qr <- NULL z }(formula = "weight ~ group - 1") Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16 sage: print(lm_D9.names) [1] "coefficients" "residuals" "effects" "rank" [5] "fitted.values" "assign" "qr" "df.residual" [9] "contrasts" "xlevels" "call" "terms" [13] "model" sage: print(lm_D9.r['coefficients']) $coefficients (Intercept) groupTrt 5.032 -0.371You could also use rpy2 as follows to do this computation:

sage: R(""" sage: ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14) sage: trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69) sage: group <- gl(2, 10, 20, labels = c("Ctl","Trt")) sage: weight <- c(ctl, trt) sage: print(anova(lm.D9 <- lm(weight ~ group))) sage: print(summary(lm.D90 <- lm(weight ~ group - 1))) sage: """) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) group 1 0.6882 0.68820 1.4191 0.249 Residuals 18 8.7293 0.48496 Call: lm(formula = weight ~ group - 1) Residuals: Min 1Q Median 3Q Max -1.0710 -0.4938 0.0685 0.2462 1.3690 Coefficients: Estimate Std. Error t value Pr(>|t|) groupCtl 5.0320 0.2202 22.85 9.55e-15 *** groupTrt 4.6610 0.2202 21.16 3.62e-14 *** --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 0.6964 on 18 degrees of freedom Multiple R-squared: 0.9818, Adjusted R-squared: 0.9798 F-statistic: 485.1 on 2 and 18 DF, p-value: < 2.2e-16In R a "data frame" is an array of values with labeled rows and columns (like part of a spreadsheet). Typically one thinks of a data frame as a table where the rows are observations and the columns are variables.

You can create a data frame using the data.frame R function:

sage: d = {'value': robjects.IntVector((24,25,26)), ... 'letter': robjects.StrVector(('x', 'y', 'z'))} ... sage: dataf = R['data.frame'](**d) sage: print(dataf) letter value 1 x 24 2 y 25 3 z 26 sage: type(dataf)Get each column:

sage: print dataf.r['letter'] letter 1 x 2 y 3 z sage: print dataf.r['value'] value 1 24 2 25 3 26Labels for the rows:

sage: print dataf.rownames() [1] "1" "2" "3"Labels for the columns:

sage: print dataf.colnames() [1] "letter" "value"If you are using rpy2 and Sage together to deal with large real-world data sets, then it is critical that you can quickly move data back and forth. If you're working with big data in Sage, you're probably using numpy arrays. Fortunately, there is a way to very quickly convert a big numpy array to an R vector and conversely, as we illustrate below.

NOTE: The rpy2 documentation suggests doing "import rpy2.robjects.numpy2ri" but this is broken (at least with the versions of R, rpy2, and numpy in Sage), and gives totally wrong results. So just explicitly use FloatVector, etc., as illustrated below.

sage: import numpy sage: a = numpy.array([[1,2],[3,4]], dtype=float) sage: v = numpy.arange(5) sage: print R(v) Traceback (most recent call last): ... ValueError: Nothing can be done for the type... CRAP, this seems to be just totally broken in rpy2. Maybe it is fixed in a newer version. Sorry folks.

\end{verbatim} \chapter{Abstract Algebra} \section{Groups, Rings and Fields} \begin{verbatim}The first page of "abstract mathematics" that I ever saw, accidentally misfiled in a the computer book section of Bookman's in Flagstaff. (Burton W. Jones's "An Introduction to Modern Algebra", 1975.)

A group is a set $G$ equipped with a binary operation $G \times G \to G$ that we write as a dot below that has three properties:

**Associativity**: $(a\cdot b)\cdot c = a\cdot(b\cdot c)$**Existence of identity**: There is $1\in G$ such that $1\cdot a = a\cdot 1 = a$ for all $a \in G$.**Existence of inverse**: For each $a\in G$ there is $a^{-1} \in G$ such that $a^{-1} \cdot a = a\cdot a^{-1} = 1$.

We construct objects in Sage that have a binary operation satisfying the above properties.

If it is 7am, what time will it be 10 hours from now? Answer: 5pm.

sage: G(3) + G(10) 1 sage: G.addition_table() + a b c d e f g h i j k l +------------------------ a| a b c d e f g h i j k l b| b c d e f g h i j k l a c| c d e f g h i j k l a b d| d e f g h i j k l a b c e| e f g h i j k l a b c d f| f g h i j k l a b c d e g| g h i j k l a b c d e f h| h i j k l a b c d e f g i| i j k l a b c d e f g h j| j k l a b c d e f g h i k| k l a b c d e f g h i j l| l a b c d e f g h i j kAdvanced Functionality...

sage: show(G.character_table())\newcommand{\Bold}[1]{\mathbf{#1}}\left(\begin{array}{rrrrr}
1 & 1 & 1 & 1 & 1 \\
3 & -1 & 0 & \zeta_{5}^{3} + \zeta_{5}^{2} + 1 & -\zeta_{5}^{3} - \zeta_{5}^{2} \\
3 & -1 & 0 & -\zeta_{5}^{3} - \zeta_{5}^{2} & \zeta_{5}^{3} + \zeta_{5}^{2} + 1 \\
4 & 0 & 1 & -1 & -1 \\
5 & 1 & -1 & 0 & 0
\end{array}\right)

sage: G.derived_series()
[Permutation Group with generators [(3,4,5), (1,2,3,4,5)]]
sage: G.is_solvable()
False
sage: G.upper_central_series()
[Permutation Group with generators [()]]
sage: var('x,a,b')
sage: show(solve(x^3+a*x+b==0,x)[0])
\newcommand{\Bold}[1]{\mathbf{#1}}x = \frac{{\left(-i \, \sqrt{3} + 1\right)} a}{6 \, {\left(\frac{1}{18} \, \sqrt{4 \, a^{3} + 27 \, b^{2}} \sqrt{3} - \frac{1}{2} \, b\right)}^{\left(\frac{1}{3}\right)}} - \frac{1}{2} \, {\left(i \, \sqrt{3} + 1\right)} {\left(\frac{1}{18} \, \sqrt{4 \, a^{3} + 27 \, b^{2}} \sqrt{3} - \frac{1}{2} \, b\right)}^{\left(\frac{1}{3}\right)}

sage: C = G.cayley_graph()
sage: G.cayley_graph().plot3d(engine='tachyon').show()
See the Sage docs and Wikipedia. See also my complaint.

sage: RubiksCube().plot3d().show(viewer='tachyon', figsize=2, zoom=.9) sage: G = CubeGroup(); G The PermutationGroup of all legal moves of the Rubik's cube. sage: G.gens() ['(33,35,40,38)(34,37,39,36)( 3, 9,46,32)( 2,12,47,29)( 1,14,48,27)', '(41,43,48,46)(42,45,47,44)(14,22,30,38)(15,23,31,39)(16,24,32,40)', '(17,19,24,22)(18,21,23,20)( 6,25,43,16)( 7,28,42,13)( 8,30,41,11)', '( 9,11,16,14)(10,13,15,12)( 1,17,41,40)( 4,20,44,37)( 6,22,46,35)', '(25,27,32,30)(26,29,31,28)( 3,38,43,19)( 5,36,45,21)( 8,33,48,24)', '( 1, 3, 8, 6)( 2, 5, 7, 4)( 9,33,25,17)(10,34,26,18)(11,35,27,19)'] sage: GG = PermutationGroup(G.gens()) sage: c = GG.cardinality(); c 43252003274489856000 sage: factor(c) 2^27 * 3^14 * 5^3 * 7^2 * 11An **abelian group** is a group $G$ where for every $a,b \in G$ we have $a\cdot b = b\cdot a$.

An** monoid** is the same as a group, except we do not require the existence of inverses.

A **ring** $R$ is a set with two binary operations, $+$ and $\cdot$ such that:

- $(R,+)$ is an abelian group,
- $(R^*,\cdot)$ is an abelian monoid, where $R^*$ is the set of nonzero elements of $R$,
- For all $a,b,c \in R$ we have $a\cdot (b+c) = a\cdot b + a\cdot c$.

A **field** $K$ is a ring such that $(R^*, \cdot)$ is a group.

Like with groups, Sage (and mathematics!) comes loaded with numerous rings and fields.

sage: ZZ Integer Ring sage: RR Real Field with 53 bits of precision sage: CC Complex Field with 53 bits of precision sage: RealField(200) Real Field with 200 bits of precision sage: AA Algebraic Real Field sage: Integers(12) Ring of integers modulo 12 sage: GF(17) Finite Field of size 17 sage: GF(9,'a') Finite Field in a of size 3^2 sage: ZZ['x'] Univariate Polynomial Ring in x over Integer Ring sage: QQ['x,y,z'] Multivariate Polynomial Ring in x, y, z over Rational Field sage: ZZ[sqrt(-5)] Order in Number Field in a with defining polynomial x^2 + 5 sage: QQ[['q']] Power Series Ring in q over Rational FieldJust as for groups, there is much advanced functionality available for rings (e.g., Groebner basis), but this is another story...

\end{verbatim} \section{Exact Linear Algebra} Linear algebra is the study of matrices, vectors, solving linear systems of equations, vector spaces, and linear transformation. It is a topic that is loaded with interesting algorithms, and Sage is good at it. In this section, we will focus on {\em exact linear algebra}, in which all matrices and vectors that we consider have exact entries (e.g., rational numbers, numbers modulo $p$, polynomials over the rationals, etc.), as opposed to numerical linear algebra with floating point entries; thus, for this section, roundoff error and general numerical analysis are not directly relevant. \subsection{Documentation for Linear Algebra in Sage} \begin{itemize} \item {\bf Quick Reference Card:} There is a linear algebra quick reference card available at \url{http://wiki.sagemath.org/quickref}. \item {\bf Sage reference manual:} The following chapters are particularly relevant: \begin{itemize} \item Matrices: \url{http://sagemath.org/doc/reference/matrices.html} \item Modules: \url{http://sagemath.org/doc/reference/modules.html} \end{itemize} \item {\bf Robert Beezer's book:} This is a free open source Undergraduate Linear Algebra Book, which is available here: \url{http://linear.ups.edu/} \end{itemize} \subsection{Underlying Technology} The implementation of exact linear algebra in Sage is a combination of a large amount of code written in Cython from scratch with some C/C++ libraries. The Linbox C++ library \url{http://www.linalg.org/} is used for some matrix multiplication and characteristic and minimal polynomial computations, especially for very big matrices with entries in the rational numbers or a finite field. The IML library (see \url{http://www.cs.uwaterloo.ca/~astorjoh/iml.html}) is used behind the scenes for solving systems of linear equations over the rational numbers. The M4RI library \url{} is used for linear algebra over the field with two elements. Numpy is used in a few places, but only for numerical linear algebra. Most everything relies at some on the ATLAS basic linear algebra system (BLAS) at some level (see \url{http://math-atlas.sourceforge.net/}). Yes, even multiplying two matrices over the rational numbers is eventually done by multiplying matrices with floating point entries (via a block decomposition and reduction modulo primes)! \subsection{Matrices and Vectors} First we illustrate arithmetic with matrices \begin{lstlisting} sage: A = matrix(QQ, 3, 4, [1..12]); B = matrix(QQ, 4,2, [1..8]) sage: A * B [ 50 60] [114 140] [178 220] \end{lstlisting} The following arithmetic produces errors, as it should, since mathematically it makes no sense: \begin{lstlisting} sage: A + B Traceback (most recent call last): ... TypeError: unsupported operand parent(s) for '+': 'Full MatrixSpace of 3 by 4 dense matrices over Rational Field' and 'Full MatrixSpace of 4 by 2 dense matrices over Rational Field' sage: B * A Traceback (most recent call last): ... TypeError: unsupported operand parent(s) for '*': 'Full MatrixSpace of 4 by 2 dense matrices over Rational Field' and 'Full MatrixSpace of 3 by 4 dense matrices over Rational Field' \end{lstlisting} Sage does let you add a scalar to a square matrix, which adds that scalar to each entry along the diagonal: \begin{lstlisting} sage: A = matrix(QQ, 3, [1..9]) sage: A + 2/3 [ 5/3 2 3] [ 4 17/3 6] [ 7 8 29/3] \end{lstlisting} Next we consider the problem of solving linear systems. We can encode a linear system of equations as a matrix equation $Ax = v$, where the problem is to solve for the unknown $x$ given $A$ and $v$. In Sage, $v$ can be either a vector or a matrix (and $x$ will correspondingly be a vector or matrix). If there are infinitely many solutions for $x$, Sage returns exactly one. \begin{lstlisting} sage: set_random_seed(1) sage: A = random_matrix(QQ, 5, num_bound=100, den_bound=100); A [ 59/78 13/14 -11/49 -47/75 -52/15] [ 27/56 -40/51 10/53 -89/12 -3/16] [ 82/61 -55/7 -74/45 -11/46 5/52] [-43/32 79/37 -57/29 -48/29 43/15] [ 67/47 12/23 -25/24 13/16 46/63] sage: A.det() -33309120911318572378640943486889/31089394772345027072747520000 sage: v = random_matrix(QQ, 5, 1, num_bound=100); v [-76] [ 98] [-82] [ 27] [ 51] sage: x = A.solve_right(v); x [1423743250326764132356431158406816/33309120911318572378640943486889] [ 403480176009266931788705978326932/33309120911318572378640943486889] [1021661231928866958567656117461050/33309120911318572378640943486889] [-393424222265393565078003995300100/33309120911318572378640943486889] [1153927117568938940697661220942640/33309120911318572378640943486889] sage: A*x == v True \end{lstlisting}%link You can also use the Matlab-style backslash notation for ``solve right'': %link \begin{lstlisting} sage: A \ v [1423743250326764132356431158406816/33309120911318572378640943486889] [ 403480176009266931788705978326932/33309120911318572378640943486889] [1021661231928866958567656117461050/33309120911318572378640943486889] [-393424222265393565078003995300100/33309120911318572378640943486889] [1153927117568938940697661220942640/33309120911318572378640943486889] \end{lstlisting}%link We can also use the \verb|solve_left| method to solve $xA = v$: %link \begin{lstlisting} sage: v = random_matrix(QQ, 1, 5, num_bound=10^10); v sage: x = A.solve_left(v) sage: x*A == v True \end{lstlisting} You can also solve linear sytems symbolically by using the {\tt solve} command, as illustrated below. This is fine for relatively small systems (especially when you do not want to have to think about which field the coefficients lie in), but is dramatically less powerful for large systems. \begin{lstlisting} sage: var('x1, x2, x3') sage: e = [2*x1 + 3*x2 + 5*x3 == 1, -x1 + x2 + 15*x3 == 5, x1 + x2 + x3 == 1] sage: S = solve(e, [x1, x2, x3]); S [[x1 == (18/5), x2 == (-17/5), x3 == (4/5)]] \end{lstlisting}%link Here is how to ``get at'' the solution: %link \begin{lstlisting} sage: S[0][0] x1 == (18/5) sage: S[0][0].lhs(), S[0][0].rhs() (x1, 18/5) \end{lstlisting} Using matrices and exact linear algebra in Sage, we can solve the same system as follows: \begin{lstlisting} sage: A = matrix(QQ, 3, [2,3,5, -1,1,15, 1,1,1]) sage: v = matrix(QQ, 3, 1, [1, 5, 1]) sage: x = A \ v; x [ 18/5] [-17/5] [ 4/5] sage: A*x == v True \end{lstlisting} Solving over the rational numbers using Sage matrices is quite powerful. For example: \begin{lstlisting} sage: set_random_seed(1) sage: A = random_matrix(QQ, 100, num_bound=10^10, den_bound=100) sage: v = random_matrix(QQ, 100, 1, num_bound=10^10, den_bound=100) sage: A[0] # just the first row (-9594630370/11, -2724596772/25, 1863701863/28, ... 164457253/5) sage: x = A.solve_right(v) sage: A*x == v True sage: len(x.str()) 789999 \end{lstlisting} On my 64-bit OS X dual core i7 2.7GHZ laptop, the timing to solve $Ax=v$ for exactly the above matrix in various software is as follows: \begin{itemize} \item Sage-4.6.2 (which uses the IML library): 0.45 seconds \item Magma 2.17-4: 1.39 seconds \item Mathematica 7.0: 10.5 seconds \item Maple 14: 18.2 seconds \end{itemize} The characteristic polynomial of a square matrix $A$ is $f(x) = \det(A-x)$; it has the property that $f(A) = 0$. \begin{lstlisting} sage: A = matrix(QQ, 5, [1..25]); A [ 1 2 3 4 5] [ 6 7 8 9 10] [11 12 13 14 15] [16 17 18 19 20] [21 22 23 24 25] sage: f = A.characteristic_polynomial(); f x^5 - 65*x^4 - 250*x^3 sage: f.factor() x^3 * (x^2 - 65*x - 250) sage: f(A) [0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0] [0 0 0 0 0] sage: R.Compared to:

sage: print pickle.dumps(A) csage.matrix.matrix0 unpickle p0 (csage.matrix.matrix_integer_dense Matrix_integer_dense p1 csage.matrix.matrix_space MatrixSpace p2 (csage.rings.integer_ring IntegerRing p3 (tRp4 I4 I20 I00 tp5 Rp6 csage.structure.mutability Mutability p7 (I00 tp8 Rp9 (dp10 S'1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t u v 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f 1g 1h 1i 1j 1k 1l 1m 1n 1o 1p 1q 1r 1s 1t 1u 1v 20 21 22 23 24 25 26 27 28 29 2a 2b 2c 2d 2e 2f 2g' p11 I0 tp12 Rp13 . \end{lstlisting} \begin{verbatim}loads can parse both the compressed and uncompressed pickles (it figures out which is right by assuming compressed, getting an error, then trying uncompressed).

sage: loads(dumps(A)) [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40] [41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60] [61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80] sage: loads(pickle.dumps(A)) [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40] [41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60] [61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80]Compression has a performance penalty:

sage: timeit('loads(dumps(A))') 625 loops, best of 3: 192 µs per loop sage: timeit('loads(dumps(A,compress=False), compress=False)') 625 loops, best of 3: 130 µs per loopWe can save a pickle to a file and load it from a file:

sage: save(A, 'A.sobj') sage: save(A, '/tmp/A.sobj') 34 sage: load('/tmp/A.sobj') [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40] [41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60] [61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80] sage: os.unlink('/tmp/A.sobj') # clean upWe can load a pickle from a webpage too, which is pretty cool:

sage: X = load('http://wiki.wstein.org/11/480a/5-25?action=AttachFile&do=get&target=A.sobj') sage: X Attempting to load remote file: http://wiki.wstein.org/11/480a/5-25?action=AttachFile&do=get&target=A.sobj Loading: [.] [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40] [41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60] [61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80] sage: X = load('http://wiki.wstein.org/11/480a/5-25?action=AttachFile&do=get&target=A.sobj', verbose=False); X [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20] [21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40] [41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60] [61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80]**Conclusion: **

- Understanding object serialization is useful if you do some research computations, and want to record the results in a way that you can later easily recover. As long as later isn't "too late".
- It requires very little thought to use.
**save(obj, 'filename.sobj')**and**load('filename.sobj')** - You could make a simple "database" that anybody can easily use over the web by: (1) putting a bunch of sobj's on a webpage, and (2) writing a Python function that uses Sage's load command to remotely grab them off that webpage when requested. Very simple.

If you want to store a plain string to disk, and load it later, it is critical to master the Python **open** command. This is very similar to the C library open command, hence to the open command in most programming languages. You can use this one builtin Python command to both read and write files, and also to iterate through the lines of a file, seek to given positions, etc.

One can do a lot with a file, or a bunch of files in a directory. Don't use a sophisticated database just because you don't understand or know how to use files. Now you do. In some cases, they are a great solution.

Here's a nice decorator (written by Tom Boothby) that combines files with pickling.

sage: disk_cached_function?**File:** /sagenb/flask/sage-4.6.2/local/lib/python2.6/site-packages/sage/misc/cachefunc.py

**Type:** <type ‘classobj’>

**Definition:** disk_cached_function(f)

**Docstring:**

Decorator for

DiskCachedFunction.EXAMPLES:

sage: dir = tmp_dir() sage: @disk_cached_function(dir) ... def foo(x): return next_prime(2^x)%x sage: x = foo(200);x 11 sage: @disk_cached_function(dir) ... def foo(x): return 1/x sage: foo(200) 11 sage: foo.clear_cache() sage: foo(200) 1/200

Clean our mess:

sage: import shutil sage: shutil.rmtree('/tmp/factor_cache')**Summary**:

**save/load:**If you remember nothing else from today's lecture, remember these two commands, which allow you to very easily store and load most any object.**open**: It is easy to open and write to and read from files in Python.**disk_cached_function:**provides a function decorator that makes a function only ever have to evaluate a given input once, and in the future it just remembers the inputs automatically. It combines pickling with using open to read and write data to the filesystem. You can manage the cached inputs to the function outside of Sage, just by adding files to the cache directory (e.g., if you computed values of a disk_cached_function on different computers, you could just dump all the directories of files that result into a single big directory and have the combined cache).

- SQLite: a
*relational database*that is included in Sage. This is not at all Python specific, but it has excellent support for using it from Python. It's an extremely popular database -- according to their website it is*the most widely deployed database*there is. For example, iPhone apps all use it track their data... - (Maybe) SQLalchemy: an
*object relational mapper*that is included in Sage. SQLalchemy provides much more Python-friendly support on top of some relational database, but built on SQLite or MySQL or PostgreSQL.

**Using SQLite in Sage**

Check out the SQLite website. Some key points:

- SQLite is surely the most widely deployed database in the world, in some sense.
- SQLite is vastly simpler to use and administer than pretty much all other databases.
- SQLite is extremely fast (if used correctly).
- SQLite is
**public domain.**You can do absolutely anything you want with the source code. - Every copy of Sage comes with SQLite.
- Learning about SQLite may server you well in non-Sage related projects, since it can be used on its own.

Here's a complete example of using SQLite to make a database of integer factorizations.

sage: # sqlite3 is a standard Python module sage: import sqlite3 sage: # Make sure the database file isn't left over from a previous demo... sage: file = '/tmp/sqlite0' sage: if os.path.exists(file): ... os.unlink(file) sage: # open the database file -- zero configuration! sage: db = sqlite3.connect(file) sage: # get a "cursor" sage: cursor = db.cursor() sage: # start executing SQL commands sage: cursor.execute("""CREATE TABLE factorizations ... (number INTEGER, factorization TEXT, UNIQUE(number))""") ... sage: cursor.execute("CREATE INDEX factorizations_idx ON factorizations(number)") sage: # commit our changes -- SQL uses transactions sage: db.commit() sage: t = ('6', '[(2,1),(3,1)]') sage: cursor.execute('INSERT INTO factorizations VALUES(?,?)', t)We can look at our new database on the command line, completely independently of Sage/Python:

```
boxen:~ wstein\$ sage -sh
(sage subshell)\$ sqlite3 /tmp/sqlite1
SQLite version 3.4.2
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .schema
CREATE TABLE factorizations
(number TEXT, factorization TEXT, UNIQUE(number));
CREATE INDEX factorizations_idx ON factorizations(number);
sqlite> select * from factorizations;
6|[(2,1),(3,1)]
```

By the way, the UNIQUE above makes it so you can't enter another factorization of the same number.

sage: t = ('6', '[(2,1),(3,1)]') sage: cursor.execute('INSERT INTO factorizations VALUES(?,?)', t) Traceback (most recent call last): ... sqlite3.IntegrityError: column number is not unique sage: %time sage: for n in range(1,10000): ... f = str(list(factor(n))).replace(' ','') ... try: ... t = (str(n), f) ... z = cursor.execute('INSERT INTO factorizations VALUES(?,?)', t) ... except: ... print "Unable to insert factorization of %s"%n Unable to insert factorization of 6 CPU time: 0.63 s, Wall time: 0.63 s sage: time db.commit() Time: CPU 0.01 s, Wall: 0.00 s sage: a = cursor.execute('SELECT * FROM factorizations ORDER BY number;') sage: i = 0 sage: for x in a: ... print x ... i += 1 ... if i>10: break (1, u'[]') (2, u'[(2,1)]') (3, u'[(3,1)]') (4, u'[(2,2)]') (5, u'[(5,1)]') (6, u'[(2,1),(3,1)]') (7, u'[(7,1)]') (8, u'[(2,3)]') (9, u'[(3,2)]') (10, u'[(2,1),(5,1)]') (11, u'[(11,1)]')We use the command line again (we ** do not** have to exit or reload!) and find:

sqlite> SELECT * FROM factorizations where number<10; 1|[] 2|[(2,1)] 3|[(3,1)] 4|[(2,2)] 5|[(5,1)] 6|[(2,1),(3,1)] 7|[(7,1)] 8|[(2,3)] 9|[(3,2)]

Obviously, to use SQLite effectively, it helps enormously to know the SQL language. Fortunately, you don't need to know very much, there are tons of examples on the web, many tutorials, books, and SQL isn't hard to learn.

Python documentation for the sqlite3 module: http://docs.python.org/library/sqlite3.html

Next we'll spend a few moments on SQLAlchemy, which is a Python package included standard with Sage, which can also be installed easily into any Python install.

- SQLAlchemy is the
"object relational database mapper" for Python.*canonical* - SQLAlchemy abstracts away the database backend, so the same code/application can work with SQLite, PostgreSQL, Oracle, MySQL, etc.
- SQLAlchemy has a large test suite, good documentation, and is a high quality polished product.
- SQLAlchemy is MIT licensed (so very open source)

**WARNING:** As of this writing (May 27, 2011) the version of SQLAlchemy in the newest Sage (which is Sage-4.7) is the "ancient" 0.5.8 version. So make sure to look at the right version of the SQLAlchemy docs here: http://www.sqlalchemy.org/docs/05/

We will use the file /tmp/sqlite1 for our demo. Make sure it is deleted.

sage: file = '/tmp/sqlite1' sage: if os.path.exists(file): ... os.unlink(file)Create a SQLite engine, which SQLalchemy will use. This is the only place below that SQLite is explicitly mentioned.

sage: from sqlalchemy import create_engine sage: engine = create_engine('sqlite:///%s'%file) #, echo=True)Use SQLalchemy to declare a new Python class, which will get mapped to a table in the above SQLlite database.

sage: from sqlalchemy.ext.declarative import declarative_base sage: from sqlalchemy import Column sage: Base = declarative_base() sage: class IntFac(Base): ... ... __tablename__ = 'factorizations' ... number = Column(sqlalchemy.Integer, primary_key=True) ... factorization = Column(sqlalchemy.String) ... ... def __init__(self, number): ... self.number = int(number) ... self.factorization = str(list(factor(number))).replace(' ','') ... ... def __repr__(self): ... return '%s: %s'%(self.number, self.factorization)Make a particular session that connects to the database.

sage: from sqlalchemy.orm import sessionmaker sage: session = sessionmaker(bind=engine)()Create the tables. In this case, there is exactly one, which corresponds to the IntFac class above.

sage: Base.metadata.create_all(engine)Now create an integer factorization object.

sage: f = IntFac(6); f 6: [(2,1),(3,1)]And add it to our session, so it will get tracked by the database.

sage: session.add(f)Commit everything we have done so far. After this commit, the database exists separately on a disk on file, and we can inspect it using the sqlite3 command line program.

sage: session.commit()wstein@boxen:/tmp\$ ls -lh /tmp/sqlite1 -rw-r--r-- 1 sagenbflask sagenbflask 2.0K 2011-05-27 13:46 /tmp/sqlite1 wstein@boxen:/tmp\$ sqlite3 /tmp/sqlite1 SQLite version 3.4.2 Enter ".help" for instructions sqlite> .schema CREATE TABLE factorizations ( number INTEGER NOT NULL, factorization VARCHAR, PRIMARY KEY (number) ); sqlite> select * from factorizations; 6|[(2,1),(3,1)]

We try a query on the session:

sage: session.query(IntFac).first() 6: [(2,1),(3,1)]We try adding the factorization of 6 again. This should give an error because number is the primary key, hence must be unique.

sage: session.add(IntFac(6)) sage: session.commit() Traceback (most recent call last): ... sqlalchemy.orm.exc.FlushError: New instanceOnce an error occurs the only option is to rollback the whole transaction.

sage: session.rollback()Let's make a few thousand factorization (like we did above) and include them all in one transaction in the database.

sage: time v = [IntFac(n) for n in [1..5] + [7..10000]] Time: CPU 1.98 s, Wall: 1.98 sUsing add_all should be more efficient than calling add many times.

sage: time session.add_all(v) Time: CPU 0.35 s, Wall: 0.36 s sage: time session.commit() Time: CPU 6.59 s, Wall: 6.59 sNow we have factorizations of all integers up to 10000. We can do a query like above.

sage: for X in session.query(IntFac).filter('number<10'): ... print X 1: [] 2: [(2,1)] 3: [(3,1)] 4: [(2,2)] 5: [(5,1)] 6: [(2,1),(3,1)] 7: [(7,1)] 8: [(2,3)] 9: [(3,2)]And, we can do the same on the command line:

sqlite> select * from factorizations where number<10; 1|[] 2|[(2,1)] 3|[(3,1)] 4|[(2,2)] 5|[(5,1)] 6|[(2,1),(3,1)] 7|[(7,1)] 8|[(2,3)] 9|[(3,2)]\end{verbatim} [[TODO: Add something about storing BLOBS = pickled objects in a database, e.g., my key:value store demo from 580d.]] \bibliography{biblio} \end{document}