\documentclass[11pt]{article}
\newcommand{\thisdocument}{Project Description}
\input{macros}
\usepackage{pdfsync}
\usepackage{graphicx}
\usepackage{fancybox}
\usepackage{wrapfig}
\usepackage{amsmath}
\usepackage{url}
\usepackage{hyperref}
\usepackage{amssymb}
\newenvironment{bulletlist}
{
\begin{list}
{$\bullet$}
% {$\cdot$}
{
\setlength{\itemsep}{.5ex}
\setlength{\parsep}{0ex}
\setlength{\leftmargin}{2.2em}
\setlength{\parskip}{0ex}
\setlength{\topsep}{0.5ex}
}
}
{
\end{list}
}
\bibliographystyle{amsplain}
\begin{document}
\begin{center}
\LARGE\bf
S2I2 Exploratory Workshop:\\
\Large\bf
Open Source Software as a Foundation for Scientific Research
\end{center}
%\begin{itemize}
%\item TODO: Maybe cite ``Sustainable open source'' from \url{http://www.oss-watch.ac.uk/resources/sustainableopensource.xml}
%\item Start with an anecdote about my personal motivation for Sage.
%It could be describes as a strong concern about longterm
%``sustainability'' of math research foundations, as I describe
%here: \url{http://sagemath.blogspot.com/2009/12/mathematical-software-and-me-very.html}
%\end{itemize}
\section{Objectives}
The goal of this workshop is to discuss the viability and
potential impact of a Scientific Software Innovation
Institute (S2I2) supporting the use of open source
software for scientific research.
The PIs will draw on their experience as
leaders of the Sage and SciPy projects to
organize the workshop and to write a report addressing
the central questions facing such an institute.
%Our investigation will build on the extensive work that the PI's have
%coordinated during the last several years on community projects, which has
%resulted in the development of high quality free open source software:
Stein founded the NSF-funded Sage software project in 2005
(\url{http://www.sagemath.org}). The main goal of the Sage project
is to create a viable free open source alternative to Magma, Maple,
Mathematica, and MATLAB. An important inspiration for
Sage was a short workshop at NSF in 2003,
run by Brian Conrey, on the future of computers in
mathematical research.
Millman is active in the scientific Python community. He
serves on the steering committees of both NumPy (\url{http://numpy.org})
and SciPy (\url{http://scipy.org}), the two
fundamental libraries
for numerical and scientific computing in Python. In addition to organizing
numerous workshops and sprints, he has organized the last three SciPy
conferences in the US as well as the first two SciPy conferences in India.
He is also one of the founders and developers of the Neuroimaging in Python
(NIPY) project (\url{http://nipy.org}).
\begin{quote}
``I think we need a symbolic standard to make computer manipulations
easier to document and verify. And with all due respect to the free
market, perhaps we should not be dependent on commercial software
here. An open source project could, perhaps, find better answers to
the obvious problems such as availability, bugs, backward
compatibility, platform independence, standard libraries, etc. One
can learn from the success of \TeX{} and more specialized software like
Macaulay2. I do hope that funding agencies are looking into this.''
-- Andrei Okounkov (see \cite{threefields}).
\end{quote}
\subsection{Intellectual Merit and Broader Impact}
The {\bf intellectual merit} of the workshop report is that it will
provide a snapshot of the core issues
of sustainability, peer review, and reproducibility, which are
becoming vital concerns in all areas of the mathematical sciences.
The report will offer a perspective grounded in the PIs' extensive
experience leading community software projects.
In the long run, this workshop could lead to the creation of an
institute that would develop free open source
software infrastructure, peer review models, and reproducible research
methodologies. Such an institute has the potential to dramatically
change the tools used by students, researchers, scientists, and
engineers. Thus this one workshop could have a {\bf broad
impact} on nearly everyone who works in the mathematical
sciences and engineering.
In the short term, the main {\bf broader impact} would be that
workshop participants become more aware of sustainability and
reproducibility issues in computational mathematics. In addition, several
graduate students will be involved in creating the report and the
preliminary surveys, which will broaden their understanding of the role of
computation in mathematics. Finally, the workshop report and data
will be made widely available, which could raise awareness
of these issues throughout the mathematical sciences community.
\subsection{The Deliverable}
We will write a report that addresses the creation of an S2I2,
identifies tools and techniques that have been
successful in community software projects, and explores the
challenges of using supercomputing
infrastructure to tackle research problems.
Work on this report would begin immediately with background
data gathering carried out by a few small working groups,
followed by a two-day meeting in late July. In
August, the results of the meeting and the data would be refined and
organized, and a final report would be presented to the NSF in
September 2010.
\subsection{Identifying Target Communities and Foci}
\label{targets}
The workshop will include discussions about which communities and foci
provide the best opportunities for an S2I2.
As background, before the workshop we
will form working groups that will create a public wiki listing
areas of mathematics (based on the AMS subject classification)
and areas of scientific computing. For each area, the corresponding
working group will summarize the status of relevant software,
algorithms, and communities.
\begin{example}
For computation with function fields, we might find that only the
proprietary computer algebra system Magma implements
many of the important algorithms, and that
there is a vibrant group of
mathematicians working on algorithms in this area, motivated
by challenge problems in arithmetic geometry and cryptography.
\end{example}
\begin{example}For machine learning, we might find that there are
several open source packages, such as Weka and Orange, that
cater to different communities and offer slightly different
features.
\end{example}
The Sage project has already created surveys for several areas
of mathematics, and this workshop would provide an opportunity
to extend them to more areas. A rough draft of the wiki
will be completed before the workshop; one workshop activity
will be to discuss this draft and form
recommendations about the most useful mathematical and scientific
areas on which to focus our efforts.
%The members of the working groups will be recruited from graduate
%students and researchers.
\subsubsection{Survey Methodology}
We will carry out surveys during June and early July 2010 and
analyze the responses at the workshop. The survey will consist of a
list of key open-ended questions, which may include:
\begin{bulletlist}
\item {\em What workshop topics would compel you to
attend?} Example responses: ``A workshop on practical algorithms
and software for computing normal forms of integer matrices in
the context of algebraic topology, group theory, and number theory'' or
``A workshop on efficient algorithms and software for signal and
image processing in the context of the biological sciences.''
\item {\em Which algorithms are crucial to your research and why?}
Example responses: ``Hess's algorithm for computing
Riemann--Roch spaces, which is critical to understanding the
arithmetic of curves and constructing optimal error-correcting
codes'' or
``the F4 algorithm for computing
Gr\"obner bases, which plays a key role in analyzing
block ciphers in cryptography.''
\item {\em Who are the current and potential leaders in practical
applications and development of software in your research area
(please include graduate students)?}
\end{bulletlist}
\subsection{Identifying Tools, Techniques, and Processes}
\label{tools}
Another goal of this workshop is to identify and describe
concrete tools and techniques that can provide a foundation for
sustainable, high-quality software infrastructure. Many
tools, such as source code management systems, code review systems,
and automated testing systems, have been developed to support
software engineering. For the kind of software
we hope to build, support, and sustain, some of these tools are more
appropriate and effective than others.
We will look at existing projects, including Sage, NumPy,
Python, OpenOffice, Firefox, R, Macaulay2, Scilab, etc.,
and identify the role of:
\begin{bulletlist}
\item Programming languages,
\item Code peer review processes and systems,
\item Automated regression and unit testing software,
\item Specific bug trackers, and
\item Mailing lists, discussion groups, and chat rooms.
\end{bulletlist}
Background work
identifying these tools will be carried out by working groups
before the workshop, and the results will be discussed at
the workshop.
The report will contain tables showing which software
engineering techniques are and are not
used by mathematical and scientific software communities.
These tables may follow the format and methodology that
Steve Easterbrook recently introduced in studying software
engineering in the climate modeling
community.
\subsection{Personnel}
An S2I2 would host a mix of short-term and long-term
visitors as well as technical and administrative personnel.
During the workshop we will explore possible organizational
structures. Finding the right personnel will be essential
to providing continuity to the software projects supported
by the institute. Among the questions we will raise are:
\begin{question}
{\em What kind of personnel should an S2I2 have?} Example responses:
``It is essential for the S2I2 to have a few full-time technical software
developers employed for many years'' or ``It is more important to
hire short-term personnel (e.g., graduate students during the summer,
faculty on sabbatical) than full-time employees.''
\end{question}
\begin{question}
{\em Should the S2I2 software engineers be drawn from industry
or academia?} Example response: ``It is better to hire scientists
and mathematicians with experience in software
development than professional software engineers with some
background in science and engineering.''
\end{question}
\subsection{Sustainability}
We will also discuss the long-term sustainability of
open source software and of the communities supported by an S2I2.
If the institute itself is necessary for these projects to succeed
in the long term, how will the institute sustain itself?
\begin{question}
{\em How should the institute seek long-term sustainability?}
\end{question}
\begin{question}
{\em What are the trade-offs between a not-for-profit institute and a
for-profit company?}
\end{question}
\begin{question}
{\em Should the institute sell books and training?}
\end{question}
\begin{question}
{\em Should an institute pursue fundraising (e.g., donations)?}
\end{question}
\begin{question}
{\em What are the positive and negative aspects of other revenue streams?}
For example, one source of income could be maintaining
Sage Notebook servers for hundreds of universities and schools.
\end{question}
% ,
% which could result in a funding model similar to WeBWork (see
% \url{http://webwork.maa.org/moodle/}).
% \item TODO: lawyer at our workshop?
% \item TODO: science commons (lawyer)
% \item TODO: van Linsbergh PSF legal council.
% \item TODO: sponsoring institutions. (like msri)
% \item TODO: simath example.
\subsection{Leveraging Cyberinfrastructure}\label{cyber}
In what ways could an institute produce software that best leverages
national and international cyberinfrastructure? For example, one of the PIs
(Stein) recently ran a tutorial workshop on Sage at the Scientific
Software Day at the Texas Advanced Computing Center (see
\url{http://www.tacc.utexas.edu/softwareday/}), where he learned
about some of the challenges research scientists face when
using resources such as the NSF-funded TeraGrid. We
will invite members of this community to help identify and
understand the problems they face, such as:
\begin{bulletlist}
\item Running massively parallel computations on supercomputers.
\item Combining code in different languages,
including Fortran, MATLAB, C/C++, and Python.
\item Dealing with software licensing and copyright issues in the
context of supercomputing, where traditional commercial
licensing may not be affordable.
\item Managing and documenting evolving software that encodes
sophisticated models.
\item Creating user interfaces for exploring large data sets
easily and efficiently.
\item Overcoming the reluctance of some mathematicians and
scientists to use the TeraGrid
resources available to them to attack grand challenge problems in
their research areas. Is this mainly an issue of awareness, or of a
lack of training, technical skills, or appropriate software?
\end{bulletlist}
%The deliverable for this part of the workshop will be a report on the
%challenges and problems identified above.
%It will also help inform
%NSF about some of the issues confronting working computational
%scientists that the NSF's S2I2 program could then address head on.
\section{Transformative Impacts}
In this section, we discuss some potential transformative
impacts of this workshop, both on mathematical research and on
computational research in science and engineering.
\subsection{Impacts on Mathematical Research}
Questions of correctness, reproducibility, and scientific value arise when
building mathematical research on top of proprietary software. There are
published refereed papers containing results that rely on computations
performed in Magma, Maple, or Mathematica, including several by one of the PIs
(Stein). In some cases, a specific version of Magma is the only software
that can carry out the computation.
\begin{question}
What can an S2I2 do to ensure that computational mathematical
results can be shared and reproduced \cite{joyner-stein:notices}?
\end{question}
Thousands of papers rely on results computed by commercial
software using unpublished algorithms available only in that
software. For example,
Magma, which has a relatively small user base,
notes that ``We are currently aware of
approximately 3,000 publications about research in which Magma or
Cayley have played a role'' (see
\url{http://magma.maths.usyd.edu.au/magma/citations/}).
To maintain a competitive advantage, vendors of proprietary
mathematical software do not publish some of their algorithms.
Some vendors go so far as to argue that
the internals of their software are irrelevant to users:
\begin{quote} ``Indeed, in almost all practical uses of Mathematica, issues about how Mathematica works inside turn out to be largely irrelevant. You might think that knowing how Mathematica works inside would be necessary [...]'' (See \cite{mathematica:internals}.) \end{quote}
At the workshop, we will explore to what extent the current dominant
use of proprietary software to support research
is viable and sustainable.
Concerns may include:
\begin{bulletlist}
\item Tens of millions of lines of source code are kept secret in
proprietary mathematical software. Should we be concerned
that mathematical research is being built on top of this foundation?
Or are mathematical algorithms described well enough in the
literature that independent implementations are practical, so that the
use of proprietary software is unlikely ever to lead to a crisis in the
foundations of mathematical research?
\item Older versions of software are often unavailable,
so it may be impossible to rerun the computations described in a
paper. What problems does this present to readers?
% Even if a user owns an older version of the software, they
% might not be able to run it because they do not have access to the
% computer they were using when they bought the software, since all of
% the proprietary mathematical software systems use copy protection to
% lock the software to a particular computer.
\item It can be difficult for researchers at different institutions
to collaborate if one institution has access only to Maple and the
other only to Mathematica. What
limitations does the use of proprietary software impose on
scientific research?
\end{bulletlist}
Answers to these questions will undoubtedly vary by subject area
and researcher. To obtain a well-rounded perspective on them,
we will make sure to invite heavy users of proprietary software. For
example, one of the PIs (Stein) recently had a discussion with David Farmer,
a frequent Mathematica user, about Farmer's computations with Maass forms.
Several researchers have developed completely independent, closed
code bases in different languages for computing with Maass forms. Because
of the subtle numerical issues involved in these computations, Farmer explains
that this has been good for research on Maass forms: if several different
codes produce the same answer, then one has greater confidence in the result.
We finish this section with two anecdotes.
\subsubsection{MuPAD-Combinat}
The MuPAD-Combinat project, which was started by Florent Hivert and
Nicolas M. Thi\'ery in 2000, built the world's preeminent system for
algebraic combinatorics on top of MuPAD. In 2008, MuPAD was purchased
by MathWorks (makers of MATLAB), so MuPAD is no longer available as a
separate product; access to it now costs \$3000 (commercial) or \$700
(academic).
As a result, the MuPAD-Combinat group has spent several years
reimplementing their code as part of Sage.
The MuPAD-Combinat
group was not taken by surprise by the end of MuPAD as an independent
product; it had been concerned from the beginning about the inherent risk
of building its research program on top of a commercial platform. In fact,
the group decided to switch to Sage two months before the bad news hit,
and has since made tremendous progress porting its code.
This work has received substantial funding from an NSF Focused
Research Group (FRG) grant (DMS-0652641, DMS-0652652, DMS-0652668, DMS-0652648):
\begin{quote}
``The FRG paid in particular for my 18 months of sabbatical at Davis,
which was critical in the switch from MuPAD to Sage. [...]
It has been such a relief during the last two years not to have
this Damocles sword on our head!''
\hspace{5em} -- Nicolas M. Thi\'ery, personal communication.
\end{quote}
\subsubsection{F4: Fast Computation of Gr\"obner Bases}
Consider the F4 algorithm for computing Gr\"obner bases. Magma and
Maple each have their own closed ``secret'' implementations of this
algorithm, and these implementations perform extremely well on practical
problems.
Over the last ten years, many people have tried valiantly to
produce open source implementations that are competitive in general,
but nobody has yet succeeded.
There are a number of other similar algorithms whose implementation
takes substantial focused effort and research, since the
algorithms actually used are not published. Could an institute
with several-month-long programs for creating sustainable, first-rate
implementations of these algorithms lay the
foundations for future generations of researchers and also
educate more people about how these algorithms work?
%Perhaps everybody at the program would be involved in implementing the
%algorithm, instead of just one person who works full-time at a
%proprietary software company.
%Our report will address whether this is likely to work, and why.
\subsubsection{Transformation?}
Browsing a listing of new papers at \url{http://arxiv.org}, one is struck
by how many rely on mathematical software; this reliance
is only likely to grow. Could an institute that fosters open
source mathematical software
transform how mathematical research is conducted? Could it
refine, popularize, and teach the tools, structures, and development
processes needed so that we, our graduate students, and their
students, can create sustainable software infrastructure to support
mathematical research? Or is such education and transformation better
done in other ways?
%It is difficult to imagine how this transformation will happen without
%the support of an institute. Could such an institute address the
%extreme difficulty involved in implementing certain algorithms and
%tools?
\subsection{Impacts on Science and Engineering}
During the workshop, we will discuss the extent to which
an S2I2 would
affect the use of computational methods and
software in science and engineering. Over the last several years,
the PIs have witnessed a growing trend among research scientists
and engineers to do exploratory, interactive work in high-level
interpreted environments such as Python or MATLAB. The institute
could provide a coordinating facility to ensure that the latest algorithms
and methods are available in a system that everyone can use and
benefit from.
Statisticians have done something similar by adopting the open
source statistical package
R as the de facto standard for publishing statistical algorithms and
methods.
%The institute also has the potential to training scientists in
%programming best practices.
%An institute has the potential to promote reproducible
%research.
%During the workshop we will explore how an institute
%could achieve these goals.
\begin{question}
What is the best way to train scientists in software engineering practices
and methodologies? What kind of training activities should an institute
promote to ensure the long-term viability of community-developed software?
\end{question}
\begin{question}
What role should an S2I2 play in promoting reproducible research? What
kind of software tools or methods have the greatest impact and promise?
\end{question}
In summary, as scientific research grows increasingly dependent on computing,
it may be critical that our computational resources are developed
with the same rigor, open review, and access as the results they
support. An S2I2 has the potential to help promote:
\begin{bulletlist}
\item Sharing of scientific software, data, and knowledge necessary for
reproducible research,
\item Unrestricted access to research outcomes and educational tools,
\item Open source software,
\item Academic recognition of computational work on equal footing
with the publication of results,
\item Tested, validated, and documented software as the basis for
reliable scientific outcomes, and
\item High standards of computational literacy in the education of
mathematicians, scientists, and engineers.
\end{bulletlist}
This workshop will gather feedback
on the goals above, along with others suggested by
participants. For
example, which of these goals matters most
to researchers, and why?
\section{Statement of Need}
There is strong interest in high-quality, open
source, community-developed software for mathematical and scientific
research. Computation plays a significant role
in many areas of mathematical research, science, and
engineering, and the long-term vitality of this research
may be enhanced by open source, community-developed software. An institute
would provide a home base for some of these projects, and
this workshop will provide a way to assess such needs.
\section{Related Events}
In this section, we list Sage and SciPy workshops held during the last few
years. The PIs have been heavily involved in these workshops
and will draw on the resulting connections in recruiting participants for the
proposed S2I2 workshop.
\subsection{Sage Days Workshops}
Sage Days workshops are gatherings in which a few dozen
undergraduates, graduate students, postdocs, professors, and others
come together for about 5 days and passionately design and code
algorithms that improve Sage. These events are highly relevant to the
present proposal, because they provide a tried and tested model for
the sort of workshops that could take place as part of future
S2I2-funded activities. The workshops involve
notable research and educational activities. For example, at Sage Days 18 at
the Clay Mathematics Institute in Dec. 2009 (see \url{http://wiki.sagemath.org/dayscambridge2}), Jennifer Balakrishnan completed
the first ever verification of Kolyvagin's conjecture
for a rank 3 elliptic curve, while Barry Mazur and Kiran Kedlaya
made some initial forays into anabelian geometry using Sage, and Karl-Dieter Crisman organized
a day of talks devoted to the use of Sage in college teaching.
The Sage community has grown dramatically over the last four years as
a result of over two dozen workshops in which the development and use
of Sage has played a central role. The workshops listed below have
primarily involved topics in number theory, arithmetic geometry,
computer algebra, large-scale fixing of bugs, and algebraic
combinatorics. Stein organized or co-organized most of these
workshops. The frequency of these workshops has grown: {\em we
anticipate that there will be at least 12 such workshops during
2010.} Since the number of Sage Days workshops per year is nearly
half the number of workshops held by an institute such as AIM or MSRI, the Sage Days series may be thought of as
a {\bf virtual institute} that has grown up around the Sage project.
{\begin{bulletlist}
\item {\em Sage Days 1:} Feb. 2006 at UC San Diego.
\item {\em Summer Graduate Workshop on Computing with Modular Forms:} July 2006 at MSRI.
\item {\em Sage Days 2:} Oct. 2006 in Seattle, WA.
\item {\em Interactive Parallel Computation in Support of Research in Algebra, Geometry and Number Theory:} Feb. 2007, MSRI.
\item {\em Sage Days 3:} Feb. 2007 at IPAM (UCLA).
\item {\em Sage Days 4:} June 2007 in Seattle, WA.
\item {\em Sage Days 5: Computational Arithmetic Geometry,} Oct. 2007, CMI,
Boston.
\item {\em Sage Days 6:} Nov. 2007, Heilbronn Institute, Bristol, UK.
\item {\em Sage Days 7:} Feb. 2008, IPAM (UCLA).
\item {\em Sage Days 8: Number Theory and High Performance Computation}, Mar. 2008,
Austin.
\item {\em Sage Days 8.5: Developer Coding Days}, June 2008 in Seattle, WA.
%\item {\em FRG on $L$-functions Summer School and Coding Sprint:} June 2008, UW.
%\item {\em FRG Workshop on $L$-functions and Modular Forms:} June 2008, UW.
\item {\em Sage Days 9: Mathematical graphics and visualization}, Aug. 2008, SFU,
Vancouver.
\item {\em Sage Days 11: Special functions and computational number theory meet scientific computing}, Nov. 2008 in Austin, TX.
\item {\em Sage Days 12: Bug Smash}, Jan. 2009 in San Diego, CA.
\item {\em Sage Days 13: Quadratic Forms and Lattices}, Mar. 2009 in Athens, Georgia.
\item {\em Sage Days 14: Sage and Macaulay2 for Algebraic Geometry}, Mar. 2009, MSRI.
%\item {\em Arizona Winter School: Quadratic Forms}, March 2009.
\item {\em Sage Days 15: Developer Days}, May 2009 in Seattle, WA.
\item {\em Sage Days 16: Computational Number Theory}, June 2009 in Barcelona, Spain.
\item {\em Sage Days 17: Computing with Modular forms and $L$-functions}, Sep. 2009, Lopez Island.
\item {\em Sage Days 18: Computations related to the BSD Conjecture}, Dec. 2009,
CMI, Boston.
\item {\em Sage Days 19: Second Sage Bug Smash}, Jan. 2010 in Seattle, WA.
\item {\em Sage Days 20: Combinatorics}, Feb. 2010, Marseille, France.
\item {\em Sage Days 20.5: Algebraic Combinatorics, Rep. Theory}, May 2010, Fields Institute.
\item {\em Sage Days 21: Function fields}, May 2010 in Seattle, WA.
\item {\em Sage Days 22: MSRI Graduate Student Workshop on Elliptic Curves}, June 2010, MSRI.
\item {\em Sage Days 23: Number theory and computer algebra}, July 2010,
Leiden, Netherlands.
\item {\em Sage Days 23.5: Singular and Sage}, July 2010, Kaiserslautern, Germany.
\item {\em Sage Days 24: Symbolic computation}, July 2010 at RISC in Linz, Austria.
\item {\em Sage Days 25: Numerical computation}, Aug. 2010 in Bombay, India.
\end{bulletlist}}
\subsection{SciPy Events}
The SciPy community has had numerous conferences, coding sprints, and developer
meetings over the last several years. There are annual conferences
in the US, Europe, and India, which bring together
scientists, mathematicians, and programmers from
academia and industry. Three years ago, Millman founded and
edited the peer-reviewed conference proceedings for the US conference.
While the conferences typically have 100--200 attendees, the sprints and
developer meetings typically have 10--20 participants. The sprints
are similar to Sage Days (described above) but usually focus on the
NumPy and SciPy libraries. The developer meetings are rarer, but are useful
when a major strategic change to the code must be decided. Millman has
organized or co-organized most of the SciPy events:
\begin{bulletlist}
\item {\em 2002 SciPy Conference}, August 2002 at Caltech in Pasadena, CA
\item {\em 2003 SciPy Conference}, August 2003 at Caltech in Pasadena, CA
\item {\em 2004 SciPy Conference}, August 2004 at Caltech in Pasadena, CA
\item {\em Future directions for SciPy meeting}, March 2005 at UC Berkeley
\item {\em 2005 SciPy Conference}, August 2005 at Caltech in Pasadena, CA
\item {\em 2006 SciPy Conference}, August 2006 at Caltech in Pasadena, CA
\item {\em 2007 SciPy Conference}, August 2007 at Caltech in Pasadena, CA
\item {\em 2007 SciPy Sprint}, August 2007 at Caltech in Pasadena, CA
\item {\em 2007 European SciPy Conference}
\item {\em SciPy Sprint}, December 2007 at UC Berkeley
\item {\em SciPy Sprint}, March 2008 in Austin, TX (joint with Sage Days 8)
\item {\em SciPy Sprint}, March 2008 in Paris, France (joint with NIPY and IPython sprint)
\item {\em SciPy Sprint}, April 2008 at UC Berkeley
\item {\em SciPy Sprint}, July 2008 in Austin, TX (focus on MayaVi)
\item {\em 2008 SciPy Conference}, August 2008 at Caltech in Pasadena, CA
\item {\em 2008 European SciPy Conference}
\item {\em 2009 SciPy Conference}, August 2009 at Caltech in Pasadena, CA
\item {\em 2009 European SciPy Conference}
\item {\em 2009 SciPy India Conference}, December 2009 in Kerala, India
\item {\em 2010 SciPy Conference}, June 2010 in Austin, TX
\item {\em 2010 European SciPy Conference}, July 2010 in Paris, France
\item {\em 2010 SciPy India Conference}, December 2010 in Hyderabad, India
\end{bulletlist}
\section{Organizers}
In this section, we list the chairperson and members of
the organizing committee, together with their affiliations.
\begin{enumerate}
\item {\bf William Stein (chair)}, Department of Mathematics, University of Washington
\item {\bf Fernando P\'erez}, Neuroscience Institute, University of California, Berkeley
\item {\bf Jarrod Millman}, Neuroscience Institute, University of California, Berkeley
\item {\bf Victoria Stodden}, Statistics, Stanford University
\end{enumerate}
\section{Location and Announcements}
\subsection{Location}
The workshop will take place on Friday, July 30, and Saturday, July 31, 2010.
Given these dates, it would be best to hold the workshop in Berkeley or
Seattle; if it must be held in the Washington, DC area, we will hold it
there.
%here is the hotel we used for the previous
%NSF workshop (2007 CRCNS Data Sharing workshop):
%
%The Inn \& Conference Center, University of Maryland University
%College. Details and directions are available on their website, at
%\url{http://umucmarriott.com/}.
\subsection{Announcements}
The meeting will be announced on the main Sage mailing list, on several
SciPy-related mailing lists, on the R mailing list, and on other relevant lists.
We will also announce the workshop (and solicit feedback on the proposed questions)
at several conferences the PIs are already organizing this summer, including:
Sage Days 22: Computing with Elliptic Curves at MSRI in Berkeley; the
9th annual SciPy Conference in Austin; Sage Days 23: Number Theory in Leiden,
the Netherlands; the 3rd annual Euro SciPy Conference in Paris, France; Sage
Days 23.5: Singular and Sage in Kaiserslautern, Germany (July 14--16, 2010); and
Sage Days 24: Differential Algebra, Special Functions in Linz, Austria (July
17--22, 2010).
\section{Organizational Plan}
The meeting will be held over two days. The first day will focus
mostly on brainstorming and will be structured around a series of
short presentations, each followed by discussion. The second day will
focus directly on drafting the workshop report. We expect all
attendees to participate fully on the first day. A smaller set of
attendees will be asked to participate in the morning session of the
second day, and only the organizers will be expected to stay through
that afternoon, though everyone will be encouraged to stay for the
entire workshop.
\subsection{Day 1: Morning Session}
The meeting will begin with an hour of introductory remarks from Stein
and Millman setting out the general objectives of the workshop, followed
immediately by an hour of discussion of these objectives. After a short
break, four speakers will each give a brief 10-minute presentation,
followed immediately by related discussion.
These presentations will be solicited from our invited attendees prior
to the meeting and will focus on {\em Tools, Techniques, and
Processes} (see Sections~\ref{tools} and~\ref{cyber}).
\subsection{Day 1: Afternoon Session}
The afternoon session will start with four speakers, each giving a
brief 10-minute presentation followed immediately by discussion. These
presentations will also be solicited from our invited attendees prior to
the meeting and will focus on {\em Reproducible Research} and on the
target communities for which it is most important (see Section~\ref{targets}).
After a short break, we will present and discuss the results of our
survey. The day will end with a general discussion and critique led by
Stein and Millman.
\subsection{Day 1: Evening}
We expect the discussion to continue over dinner. Afterward, the workshop
organizers will synthesize the day's discussion and
prepare a presentation for the next morning's session.
\subsection{Day 2: Morning Session}
The second day will begin with an hour-long presentation and discussion, led by
Stein and Millman, summarizing the previous day's meeting. Participants will then
break into small writing groups to draft parts of the workshop report. One hour
before lunch, we will reconvene to present and briefly discuss the rough drafts.
\subsection{Day 2: Afternoon Session}
After lunch, most attendees will leave. Anyone who remains can join the workshop
organizers in working on the report.
\subsection{Post Workshop}
After the workshop, the organizers will expand the rough drafts into a
complete report. The draft will be kept on a collaborative wiki that any
participant can edit, and all workshop participants will be encouraged
to provide feedback. A final typeset version of the report will be
delivered to the NSF by September 2010.
\section{Recruitment Plan}
\subsection{Academia}
In addition to soliciting applications through several mailing lists,
we will contact developers directly. We will target developers of
the 85 components of Sage and of related open source mathematical software
projects, including PARI, Singular, GAP, Macaulay2, CVXOPT, and
Maxima. Millman will contact developers involved in
scientific computing in Python, including those of NumPy, SciPy, matplotlib,
PyMVPA, MDP, FiPy, and PyDSTool. People interested in participating
will apply to the organizers, and we will select participants based on
available funding and each applicant's potential to contribute.
\subsection{Government and Industry}
The PIs have extensive contacts in industry and will recruit participants from
there, including contacts at Google, Microsoft, Boeing, and Enthought,
all of which have funded work on mathematical research software, much of it in
conjunction with the PIs. The PIs also have contacts in the financial
sector (e.g., Lisa Goldberg at MSCI Barra, \url{http://www.mscibarra.com/})
and in government, and will solicit their input on the workshop and the
institute.
\subsection{Underrepresented Minorities, Women, and Persons with
Disabilities}
We will ensure that members of underrepresented minorities and persons
with disabilities are invited to participate in the workshop, and we
will ask them to suggest other people to invite.
The PIs have long-term collaborations that they can draw upon
to ensure this.
One of the Sage developers recently compiled a list of women who
have contributed code to Sage; we will contact some of them, encourage them
to apply to participate in the workshop, and ask them to suggest other people
to invite. Moreover, Stein obtained funding from Kristin Lauter (head of
cryptography at Microsoft Research) to run an upcoming ``Women in
Sage Days'' workshop, and will encourage its attendees to apply to this workshop.
\subsection{Students for Pre-workshop Data Gathering}
After making a list of areas to cover, we will recruit students by first
emailing our local departments and then the Sage and SciPy/NumPy mailing
lists. We will also contact S.~Easterbrook at the University of Toronto to
ask whether he can recommend a student to help us, since some of our
background projects are inspired by his research.
\section{Budget}
We are requesting \$48K. The majority of these funds (about \$42K) will
cover travel (\$950), accommodation (\$350), and food (\$150) per attendee
for 30 workshop attendees. Most attendees will stay two nights at the
hotel (about \$175/night). We will also provide a catered working
breakfast, lunch, and dinner on the first day of the workshop, and a
working breakfast and lunch on the second day.
We also request six stipends (\$1K each) for students in the
working groups to create the surveys described elsewhere in this
proposal.
\input{fakebib.tex}
\end{document}