Guidelines for paper preparation

Much of our job as scientists consists of writing papers. I am not saying this because a strong publication list is good for our careers (though it definitely is). I am saying this because in my experience 90% of the real work on the project happens while "writing the paper".

It is during paper preparation that technical results are checked and cross-checked and triple-checked, unveiling bugs and misconceptions. It is during paper preparation that students are forced to assign physical significance to their results in that dreaded "Discussion" section. It is during paper preparation that students are forced to go back to the literature and understand the background and motivation for their projects in order to write the even more dreaded "Introduction" section.

Beyond a certain point in the project there is almost no point in mulling over the results without a draft: only by writing and editing the discussion of your results can you gain further understanding of them. Writing gives you clarity of thought. This is why one of my colleagues says to his students: "I speak only pdf".

Everybody learns to write papers differently. However, no matter what the learning process is, it all begins with the first draft. In order to constitute meaningful progress, the first draft needs to contain a minimal volume of material. In my experience there is no point in editing the draft until the volume of material is at least 75% of the volume of the final paper. Until the minimal volume is achieved, don't edit sentences -- add sentences.

Don't know what the "final" volume of your paper will be? Check out the best papers that you read recently in your subject. Pick the median paper. Do a page count, a figure count, a word count if that helps. That's your "final paper" size estimate.

Setup for the first draft

Create your paper.tex file from a template. Use one of the standard professional formats (e.g., aastex6) from the very beginning. If you don't already have a template, they are available from the aastex6 website, or any source file for a properly formatted pdf from arXiv will do. Compile your file often. It should always compile without errors or warnings. Fix them as they arise. If you still need a template, a really bare-bones one is available here.

Structure: many of our papers are formulaic, and there are good reasons for it, so there is usually no point in re-inventing the wheel by thinking up something new and clever. Section 1 is "Introduction". Section 2 is either "Observations and data reduction" or "Model setup" (depending on whether your paper is on observations or theory). Section 3 -- "Results of the analysis" or a meaningful title -- will have most of the science plots. Section 4 is "Discussion". Section 5 is "Conclusions". Sometimes for a longer paper you end up having two sections with science plots instead of one, so six sections in total. But if you have more than that, you are probably trying to squeeze too much into one paper (or should reorganize your sections).

Place to start: the easiest place to start is usually sections 2 and 3, because that's what you have spent most of your time working on as you are starting to write the paper.

Organization: having created the standard five sections, split them into subsections, give them meaningful titles and throw a few ideas to cover into each. Avoid sub-sub-sections. Introduction should not have subsections.
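
As an illustration only, the five-section layout described above might start out as the skeleton below. This is a sketch under assumptions: it uses the generic article class as a stand-in so that it compiles anywhere, and the title, author and subsection names are hypothetical placeholders. In a real draft you would swap in the class and front-matter commands from your journal template.

```latex
% Stand-in class (an assumption): replace with the class from your
% journal template, e.g. the one shipped with aastex6.
\documentclass{article}

\title{Placeholder title}
\author{Placeholder Author}

\begin{document}
\maketitle

\begin{abstract}
Big picture. Questions asked. What was done. Main results (to be revised).
\end{abstract}

\section{Introduction}
% No subsections here.

\section{Observations and data reduction}  % or "Model setup" for a theory paper
\subsection{Survey description}            % hypothetical subsection titles
\subsection{Sample selection}

\section{Results of the analysis}          % or a more meaningful title
\subsection{First key measurement}
\subsection{Second key measurement}

\section{Discussion}

\section{Conclusions}

\section*{Acknowledgments}

% Add the \bibliographystyle and \bibliography lines required by your
% template once your .bib file exists.

\end{document}
```

Even an empty skeleton like this compiles, which makes it easy to keep the file free of errors and warnings from the first day.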

Figures: a lot of papers are organized around figures. Sometimes it is helpful to make a list of key figures (already available or yet to be made) and arrange them in order of appearance. Figures convey most of the results to most of the readers, so the minimal number to get the point across AND the maximal quality are highly advisable. A figure checklist is given here, although it may be more suitable for one of the later drafts, not the first one.

References and formatting: I find it easiest to start writing papers by assembling and properly formatting the references (bibtex, .bib bibliography file and aastex6 obligatory, adsbibdesk or other library managing software optional). It's boring and tedious, but it keeps me busy and enables me to start thinking about the structure of the paper as I am putting different references I have used in my analysis into different sections that I created previously.

Other necessary and easy to write material includes the description of the tools / data you are using. If you are writing a collaboration paper, write the required 1-2 paragraph description of the survey, put in all the required technical references right away and be done (look at other survey papers; see what information they present; rephrase -- never copy). Put the standard SDSS acknowledgment and be done. If you are writing about simulations and are using somebody else's code, write the required 1-2 paragraph description of the code and the acknowledgments and be done. The descriptions go in section 2, the acknowledgments after the conclusion.

Don't delete any text. I cannot tell you how many times I deleted an unnecessary paragraph only to regret it 5 minutes later and spend half an hour reconstructing it. To avoid this time sink, create another tex file in the same directory (mine are called addons.tex) and move superfluous text into that file. This way, if you need it later, it's there.

Summaries: very few people will read your paper from cover to cover. The abstract is of course known to be critically important as it appears on the arXiv mailings, but the importance of the conclusions is often under-appreciated. Do not short-cut the conclusions: your entire workflow and results should be understandable from reading the conclusions alone. Introductions are also very important: write a clear introduction, and beginning graduate students in your field will gratefully cite you for years.

Even though it's sometimes extremely difficult, create drafts of every narrative section right away, including abstract and conclusions. Students often say: "but I don't yet know what the conclusions are going to be, so how can I write anything for the abstract or the conclusions?". You absolutely can and should: unless there is existing text in place, it cannot be edited and therefore no progress will be made on these sections until the first draft is created. The sooner you create the first draft of these sections, the faster the progress on the paper.

So ask yourself: what is your paper about? What is the big science picture? What are the questions you are asking in your paper? The answers to these questions will constitute at least a third of your abstract and your conclusions. So write that. Then think about your process: what did you do to answer these questions? That's another important part of abstract and conclusions. Write that.

The last parts of abstract and conclusions focus on the main results. What do you think are the main results? Even if you don't know a specific conclusion yet, what were you trying to measure? Put down as many of the goals or conclusions as you can. This will later be revised, but until there is text to be revised, no progress can be made, so putting some text down is extremely valuable.

Introduction. Here is a skeleton of a generic five-paragraph introduction: (1) What is a big unresolved question in your field that your paper is relevant for? (2) What is some of the previous relevant work? What did they find? (3) What is some other previous relevant work that might use a different method, or approach a slightly different issue, or disagree with the work above? (4) What are you presenting in this paper? What are some of the hypotheses you are testing with your data? What are some of the ideas you are exploring? (5) What is the structure of this paper? The last paragraph of the introduction is formulaic: it contains the list of the remaining sections and provides some basic notation definitions (e.g., are you using vacuum or air wavelengths? what cosmology are you using? are you using $R$ for cylindrical radius and $r$ for spherical? are you using the particle physics units with c=1? any other definitions that would be useful to introduce in the beginning?).

Let's say you have five references in each of the paragraphs (1), (2) and (3). This will give you a 15-reference first draft of the introduction, which is a very respectable first draft!

Your file will not write itself. Watch your file grow. Set writing goals and deadlines. Until you have a draft, you cannot edit it, and progress on the paper cannot be made. So if you find writing intimidating, what goals can you set to get moving? Can you write two paragraphs right now, into any of the sections? If you write two paragraphs per day for a week, that's about two pages in the journal format. There is no shame in deciding, each day, "What are the easiest two-three paragraphs I can write today?". (This is also my approach in dealing with comments in a referee report: what are the easiest five comments I can address today?)

But how do I explain that? Sometimes if you have a complicated point to convey, you don't know how to explain it. Imagine that you are speaking about your project to a fellow student. In fact, you can recruit a fellow student and try to explain your sticking point to them. Literally voicing your explanation out loud might be all you need to figure out how to get unstuck.

On copying and rephrasing. Exact copying of material from papers by others is plagiarism and is not allowed in any of your writing, unless it is a short direct quote in quotation marks with a reference to the source. Exact copying of your own material from a previous paper is much less unethical, but is still not considered acceptable. If you are writing many papers on the same subject, an occasional sentence might be accidentally repeated, because there are only so many ways to say some of the basic things. I think it's ok, but repeating large chunks of text is not. Surely you learn something from one paper to the next; so even the introductory material can be rephrased and updated accordingly.

In contrast, I fully embrace exact self-copying between papers and observing and funding proposals, and I am usually happy to give permission to my group members and collaborators to copy-paste from my proposals into their first draft. I think this is ok because proposals are after all drafts and preparation for research, not the actual presentation of the research; because full texts of proposals are not public; and because proposals have not been copyrighted by journals. All the care that went into the writing of the proposal would be a wasted editorial effort if we could not recycle some of that text into the subsequent papers. One must be careful of the potential chain from "published paper 1" into "funding proposal" into "published paper 2" and watch out for accidental repetitions of text in public papers.


The first round of editing

Presenting your first draft. If you are at the stage when you want to show your draft to your advisor, go through your draft one more time and fix as many issues on this checklist as you can.

Consistent active present tense: write the entire paper in the present tense. Search and replace all "will" and "would" verbs. It is ok to say "We obtained the data in 2010", but after you start the analysis, switch from "We reduced the data" to "We reduce the data" and stick to the present tense from there on. Replace passive voice ("It can be seen from the data that") with active voice ("We find that").

Sentences and paragraphs: split long sentences into several sentences. Single-sentence paragraphs are not allowed by the journal. Every sentence must have a subject and a verb. The verb should agree with the subject (plural subjects with plural verbs, "we find"; singular subjects with singular verbs, "the figure shows"). Every paragraph must have a separate point to convey (often most paragraphs will have a mini-intro, mini-narrative and mini-conclusion).

No footnotes: delete them or incorporate them into the text. They are disruptive for the reader. [Somebody recently came in defense of footnotes. This is what I had to say on the matter: "You know what footnotes are like? When you are trying to concentrate on something really difficult and on a deadline, and your three-year-old screams from the bathroom that they need help. It's not relevant, you can't ignore it, and after that you have to spend half an hour figuring out what the heck you were doing before that happened."]

Remove filler words: "We note", "it should be noted", "however", "simply", "clearly", "obviously", "trivially", "also", "it is known that", "as such", "basic", "it's worth noting", "it's worth mentioning", "very", "extremely". There are often too many "therefore"s, "thus"s and "indeed"s. Search for all these words and remove them; usually it can be done without the loss of meaning. Replace qualitative with quantitative evaluations (instead of "a very large fraction" or "much of the data" say "90% of the data").

Remove imperatives and do not assign work to the reader: instead of "compare this with", say "comparison of this result with ours demonstrates that"; instead of "see figure 1 in Smith 1879" say "As demonstrated by Smith 1879, [brief summary of result]".

Parentheses: police the parentheses around references (choose deliberately among citep, citet and citealt). Nested parentheses are not allowed.
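
For illustration, here is a small sketch of how the three citation commands behave. It assumes the natbib package with its default settings (journal classes such as aastex6 provide the same commands but may punctuate differently), and the key smith1879 and its bibliography entry are hypothetical. The hand-written bibliography is there only so the sketch compiles by itself; a real draft should use bibtex and a .bib file.

```latex
\documentclass{article}
\usepackage{natbib}  % provides \citet, \citep and \citealt

\begin{document}

% Textual citation: the authors are the subject of the sentence.
\citet{smith1879} measure the clustering amplitude.  % Smith (1879) measure ...

% Parenthetical citation: the reference supports the statement.
The clustering is strong \citep{smith1879}.          % ... (Smith, 1879).

% Inside existing parentheses, \citealt avoids nesting.
The clustering is strong (in the red sample; see \citealt{smith1879}).

% Optional arguments keep "e.g." inside a single pair of parentheses.
The clustering is strong \citep[e.g.,][]{smith1879}. % ... (e.g., Smith, 1879).

\begin{thebibliography}{1}
\bibitem[Smith(1879)]{smith1879}
Smith, J. 1879, Hypothetical Journal, 1, 1
\end{thebibliography}

\end{document}
```

Compile twice so that the citations resolve.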

Run your paper through a spell checker before showing it to anybody. To keep the space after a LaTeX command, follow the command with a backslash and a space, e.g., "at 5\micron\ we find" will produce correct formatting, whereas "at 5\micron we find" will not. Spell out contractions: "we've", "didn't", etc.
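
The spacing rule can be checked with the short sketch below. The \micron command comes with aastex; the stand-in definition here is an assumption made only so that the example compiles with the generic article class. The empty-braces form on the second line is a common alternative to the backslash-space.

```latex
\documentclass{article}
% Stand-in definition (an assumption); aastex provides its own \micron.
\newcommand{\micron}{\ensuremath{\mu}m}

\begin{document}

at 5\micron\ we find    % backslash-space after the command keeps the space

at 5\micron{} we find   % empty braces also keep the space

at 5\micron we find     % the space is swallowed: the unit runs into "we"

\end{document}
```

LaTeX skips spaces that follow a command name made of letters, which is why the third line loses the space.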

Abbreviations: the fewer, the better. Sentences like "The effects of the IR SED and PAHs on the IRAC PSF of the BL AGN may make SMBH mass measurement difficult" should be avoided. During the final editing, the first instance of every abbreviation must be spelled out.

Finer points, AKA Nadia's pet peeves

Structure: I personally don't like forward-referencing ("We discuss this further in section blah") and backward-referencing ("As we show in section blah") within the paper. Of course, an occasional reference is fine, but if you find yourself constantly needing to do this, it may be a sign that the paper is not well organized. For a well-organized paper, the reader would know intuitively where to look for additional details if they need them. I suggest limiting your internal referencing to the final paragraph of the introduction.

Clauses need to agree with the subject. "Having explored this parameter space, the results are shown in Figure 1" is incorrect: the results didn't explore the parameter space, the authors did. One correct way to rephrase this: "We explore the full parameter space and display the results in Figure 1." Then there is one subject ("we") and two verbs that agree with the subject.

Once you notice how often this rule is violated even in "professional" writing (e.g., newspapers) you cannot un-notice it!

Uniformity in like sentence terms. If you go through a list of things or a pair of parallel clauses, make sure they are grammatically the same. "We find that the galaxies in group A cluster strongly, while the galaxies in group B cluster less strongly" -- is ok. Contrast this with "We find that the galaxies in group A cluster strongly and galaxies in group B clustering less strongly", which is not: the verb forms "cluster" and "clustering" do not match.