Guidelines for paper preparation

Much of our job as scientists consists of writing papers. I am not saying this because a strong publication list is good for our careers (though it definitely is). I am saying this because in my experience 90% of the real work on the project happens while "writing the paper".

It is during paper preparation that technical results are checked and cross-checked and triple-checked, unveiling bugs and misconceptions. It is during paper preparation that all the calculations are laid out in the logical order and missing pieces are identified. It is during paper preparation that students are forced to assign physical significance to their results in that dreaded "Discussion" section. It is during paper preparation that students are forced to go back to the literature and understand the background and motivation for their projects in order to write the even more dreaded "Introduction" section.

Beyond a certain point in the project there is almost no point in mulling over the results without a draft: only by writing and editing the discussion of your results can you gain further understanding of them. Writing gives you clarity of thought. This is why one of my colleagues says to his students: "I speak only pdf".

Many a time, I have heard from students "I have finished the analysis, I just need to write it up". Sometimes they would even say "I just need a couple of weeks (or a couple of months, or insert another unrealistically short timescale here) to write it up". Let me be completely clear: This never works. I have heard this dozens of times. I have personally been an author of a paper where I knew the key result and the writing should have been straightforward. This approach does not work. It just doesn't. If you have a lot, a lot of writing experience, you might be able to "write the paper quickly" after you have the key result. The problem is that often you don't know what the key result is until you have written the paper.

When is the right time during the project to start writing the paper? Obviously, it is not just a matter of taste but a strong function of the project. However, my experience is that students start writing much later than they should. My opinion is that the time to start writing is when you have your first interesting science plot. Let's say you are done with your main measurements and you have plotted Y vs X and found them to be correlated. You don't know what this means, but it looks interesting.

This is the time to start writing. Why do I think that? As I explain again below, a standard paper consists of an Introduction, Data or Model Description, Results, Discussion and Conclusions. So when you are producing your first results, you are already in the middle of that hypothetical paper! It is the right time to start thinking about the interpretation of your results (which goes into the Discussion), because depending on your ideas for interpretation you will generate new hypotheses to test with your data. In other words, there is a strong feedback loop between the Results and the Discussion, and they both inform each other. At this point if you do not yet have a draft it becomes very hard to keep track of the direction of the Results section.


Everybody learns to write papers differently. However, no matter what the learning process is, it all begins with the first draft. In order to constitute meaningful progress, the first draft needs to contain a minimal volume of material. In my experience there is no point in editing the draft until the volume of material is at least 75% of the volume of the final paper. Until the minimal volume is achieved, don't edit sentences -- add sentences.

Don't know what the "final" volume of your paper will be? Check out the best papers that you read recently in your subject. Pick the median paper. Do a page count, a figure count, a word count if that helps. That's your "final paper" size estimate.
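The size estimate and the 75% threshold come down to a little arithmetic. Here is a minimal Python sketch, assuming made-up word counts for five reference papers. The numbers and function names are hypothetical and only illustrate the idea.

```python
from statistics import median

def target_size(reference_word_counts):
    """Return the median word count of the reference papers."""
    return median(reference_word_counts)

def ready_to_edit(draft_words, final_words, threshold=0.75):
    """True once the draft reaches the given fraction of the expected final size."""
    return draft_words >= threshold * final_words

# Hypothetical word counts of five recent papers in your subject.
reference_papers = [6200, 7500, 8100, 9400, 12000]
final_estimate = target_size(reference_papers)

print(final_estimate)
print(ready_to_edit(5000, final_estimate))  # still adding sentences
print(ready_to_edit(6500, final_estimate))  # enough material to start editing
```

The same functions work with page counts or figure counts in place of word counts.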

Setup for the first draft

Create your paper.tex file from a template. Use one of the standard professional formats (e.g., aastex6) from the very beginning. If you don't already have a template, they are available from the aastex6 website, or any source file for a properly formatted pdf from arXiv will do. Compile your file often. It should always compile without errors or warnings. Fix them as they arise. If you still need a template, a really bare-bones one is available here. If you need a template for an Overleaf file, go here, copy all the files, upload them to your newly created Overleaf paper (save anything you have locally first, so that you do not overwrite your files by accident!), make sure it compiles, and hopefully the rest is self-explanatory.

Structure: many of our papers are formulaic, and as there are good reasons for it, there is usually no point in re-inventing the wheel and thinking up something new and clever. Section 1 is "Introduction". Section 2 is either "Observations and data reduction" or "Model setup" (depending on whether your paper is on observations or theory). Section 3 -- "Results of the analysis" or a meaningful title -- will have most of the science plots. Section 4 is "Discussion". Section 5 is "Conclusions". Sometimes for a longer paper you end up having two sections with science plots instead of one, so 6 sections in total. But if you have more than that, you are probably trying to squeeze too much into one paper (or should reorganize your sections).

Place to start: the easiest place to start is usually sections 2 and 3, because that's what you have spent most of your time working on as you are starting to write the paper.

Organization: having created the standard five sections, split them into subsections, give them meaningful titles and throw a few ideas to cover into each. Avoid sub-sub-sections. Introduction should not have subsections.

Figures: a lot of papers are organized around figures. Sometimes it is helpful to make a list of key figures (already available or yet to be made) and arrange them in order of appearance. Figures convey most of the results to most of the readers, so the minimal number to get the point across AND the maximal quality are highly advisable. A figure checklist is given here, although it may be more suitable for one of the later drafts, not the first one.

References and formatting: I find it easiest to start writing papers by assembling and properly formatting the references (bibtex, .bib bibliography file and aastex6 obligatory, adsbibdesk or other library managing software optional). It's boring and tedious, but it keeps me busy and enables me to start thinking about the structure of the paper as I am putting different references I have used in my analysis into different sections that I created previously.

Other necessary and easy to write material includes the description of the tools / data you are using. If you are writing a collaboration paper, write the required 1-2 paragraph description of the survey, put in all the required technical references right away and be done (look at other survey papers; see what information they present; rephrase -- never copy). Put the standard SDSS acknowledgment and be done. If you are writing about simulations and are using somebody else's code, write the required 1-2 paragraph description of the code and the acknowledgments and be done. The descriptions go in section 2, the acknowledgments after the conclusion.

Don't delete any text. I cannot tell you how many times I deleted an unnecessary paragraph only to regret it 5 minutes later and spend half an hour reconstructing it. To avoid this time sink, create another tex file in the same directory (mine are called addons.tex) and move superfluous text into that file. This way, if you need it later, it's there.

Summaries: very few people will read your paper from cover to cover. The abstract is of course known to be critically important as it appears on the arXiv mailings, but the importance of the conclusions is often under-appreciated. Do not short-cut the conclusions: your entire workflow and results should be understandable from reading the conclusions alone. Introductions are also very important: write a clear introduction, and beginning graduate students in your field will gratefully cite you for years.

Even though it's sometimes extremely difficult, create drafts of every narrative section right away, including abstract and conclusions. Students often say: "but I don't yet know what the conclusions are going to be, so how can I write anything for the abstract or the conclusions?". You absolutely can and should: unless there is existing text in place, it cannot be edited and therefore no progress will be made on these sections until the first draft is created. The sooner you create the first draft of these sections, the faster the progress on the paper.

So ask yourself: what is your paper about? What is the big science picture? What are the questions you are asking in your paper? The answers to these questions will constitute at least a third of your abstract and your conclusions. So write that. Then think about your process: what did you do to answer these questions? That's another important part of abstract and conclusions. Write that.

The last parts of abstract and conclusions focus on the main results. What do you think are the main results? Even if you don't know a specific conclusion yet, what were you trying to measure? Put down as many of the goals or conclusions as you can. This will later be revised, but until there is text to be revised, no progress can be made, so putting some text down is extremely valuable.

Introduction. Here is a skeleton of a generic five-paragraph introduction: (1) What is a big unresolved question in your field that your paper is relevant for? (2) What is some of the previous relevant work? What did they find? (3) What is some other previous relevant work that might use a different method, or approach a slightly different issue, or disagree with the work above? (4) What are you presenting in this paper? What are some of the hypotheses you are testing with your data? What are some of the ideas you are exploring?

(5) The last paragraph of the introduction is formulaic. What is the structure of this paper? This paragraph contains the list of the remaining sections and provides some basic notation definitions (e.g., are you using vacuum or air wavelengths? what cosmology are you using? are you using $R$ for cylindrical radius and $r$ for spherical? are you using the particle physics units with c=1? any other definitions that would be useful to introduce in the beginning?).

Let's say you have five references in each of the paragraphs (1), (2) and (3). This will give you a 15-reference first draft of the introduction, which is a very respectable first draft!

Your file will not write itself. Watch your file grow. Set writing goals and deadlines. Until you have a draft, you cannot edit it, and progress on the paper cannot be made. So if you find writing intimidating, what goals can you set to get moving? Can you write two paragraphs right now, into any of the sections? If you write two paragraphs per day for a week, that's about two pages in the journal format. There is no shame in deciding, each day, "What are the easiest two-three paragraphs I can write today?". (This is also my approach in dealing with comments in a referee report: what are the easiest five comments I can address today?)

But how do I explain that? Sometimes if you have a complicated point to convey, you don't know how to explain it. Imagine that you are speaking about your project to a fellow student. In fact, you can recruit a fellow student and try to explain your sticking point to them. Literally voicing your explanation out loud might be all you need to figure out how to get unstuck.

On copying and rephrasing. Exact copying of material from papers by others is plagiarism and is not allowed in any of your writing, unless it is a short direct quote with quotation marks and with a reference to the source. Exact copying of your own material from a previous paper is much less unethical, but still is not considered acceptable. If you are writing many papers on the same subject, an occasional sentence might be accidentally repeated, because there are only so many ways to say some of the basic things. I think it's ok, but repeating large chunks of text is not. Surely you learn something from one paper to the next; so even the introductory material can be rephrased and updated accordingly.

In contrast, I fully embrace exact self-copying between papers and observing and funding proposals, and I am usually happy to give permission to my group members and collaborators to copy-paste from my proposals into their first draft. I think this is ok because proposals are after all drafts and preparation for research, not the actual presentation of the research; because full texts of proposals are not public; and because proposals have not been copyrighted by journals. All the care that went into the writing of the proposal would be a wasted editorial effort if we could not recycle some of that text into the subsequent papers. One must be careful of the potential chain from "published paper 1" into "funding proposal" into "published paper 2" and watch out for accidental repetitions of text in public papers.


What if I am completely stuck?

Frequently asked question. I followed all of these suggestions: I have a structure, and some text in every section, but the paper still looks too short and incomplete and not at all like the first draft. What do I do next?

Answer. Look at every section and every subsection individually. What is missing? What are some of the issues that should be addressed in this subsection that are not currently being addressed? Write them down into this section (I like to emphasize these things with a red color).

Next, look at each issue individually. Why hasn't this issue been answered yet? Sometimes it is a straightforward issue you simply haven't yet had a chance to think about. Well, maybe you can spend a few hours and address it, write it up and be done with it. Example: You have thought for a while that you need to re-compute some results from one set of units to another set of units, but you simply haven't had the time to do it. It's straightforward and maybe tedious, but needs to be done, so now is a good time to do it, including the one paragraph of description of the unit transformation.

But sometimes the issue you marked in red is difficult precisely because you have already thought about it many times and still cannot resolve it. In these cases, write one-two paragraphs about your thought process. Why is this issue important? Why does it need to be resolved? What considerations are impacting the resolution of the issue? Which of these considerations are more important, which are less? Example: your plots need error bars. You have already spent a lot of time thinking about the error bars, and you still cannot reach the final decision on which ones to show. Well, write down everything you learned about the sources of error and conclude with a red-marked "We need a final decision on the errors". That would be a great question to bring to your advisor, but after you have laid out all your thoughts on the matter.

Finally, if your draft is failing to grow but still feels incomplete, it's a sign that you may need to do additional analysis / make additional figures. Make a list of all possible directions for additional analysis / additional figures. This is a great list to discuss with the advisor when you bring them the first draft.

The bottom line: organize your thoughts, do everything you can think of doing, and then discuss the remaining issues with the advisor after you present them your first draft.


The first round of editing

Presenting your first draft. If you are at the stage when you want to or need to show your draft to the advisor, go through your draft one more time and fix as many issues on this checklist as you can.

Consistent active present tense: write the entire paper in the present tense. Search and replace all "will" and "would" verbs. It is ok to say "We obtained the data in 2010", but after you start the analysis, switch from "We reduced the data" to "We reduce the data" and stick to the present tense from there on. Replace passive voice ("It can be seen from the data that") with active voice ("We find that").

Sentences and paragraphs: split long sentences into several sentences. Single-sentence paragraphs are not allowed by the journal. Every sentence must have a subject and a verb. The verb should agree with the subject (plural subjects with plural verbs, "we find"; singular subjects with singular verbs, "the figure shows"). Every paragraph must have a separate point to convey (often most paragraphs will have a mini-intro, mini-narrative and mini-conclusion).
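Single-sentence paragraphs are easy to miss by eye, so a rough automated pass can help. Here is a minimal Python sketch, assuming paragraphs are separated by blank lines and that a sentence ends with ".", "!" or "?" followed by whitespace. The function names are hypothetical, and the heuristic will miscount abbreviations such as "Fig. 1", so treat its output as a hint, not a verdict.

```python
import re

def count_sentences(paragraph):
    """Rough count: sentence-ending punctuation followed by whitespace or end of text."""
    return len(re.findall(r"[.!?](?=\s|$)", paragraph.strip()))

def single_sentence_paragraphs(text):
    """Return the paragraphs (separated by blank lines) that hold at most one sentence."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [p for p in paragraphs if count_sentences(p) <= 1]

draft = """We reduce the data with the standard pipeline. We then fit each spectrum.

The fits converge for most objects.

We find a correlation between X and Y. The scatter is small."""

flagged = single_sentence_paragraphs(draft)
print(flagged)
```

Each flagged paragraph can then be merged into a neighbor or expanded until it has its own point to convey.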

No footnotes: delete them or incorporate them into the text. They are disruptive for the reader. [Somebody recently came in defense of footnotes. This is what I had to say on the matter: "You know what footnotes are like? When you are trying to concentrate on something really difficult and on a deadline, and your three-year-old screams from the bathroom that they need help. It's not relevant, you can't ignore it, and after that you have to spend half an hour figuring out what the heck you were doing before that happened."]

Remove filler words: "We note", "it should be noted", "however", "simply", "clearly", "obviously", "trivially", "also", "it is known that", "as such", "basic", "it's worth noting", "it's worth mentioning", "very", "extremely", "completely". There are often too many "therefore"s, "thus"s and "indeed"s. Search for all these words and remove them; usually it can be done without the loss of meaning. Replace qualitative with quantitative evaluations (instead of "a very large fraction" or "much of the data" say "90% of the data").
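The search for filler words can be scripted. Here is a minimal Python sketch, assuming the draft is available as a plain string (for example, read from paper.tex). The word list is a subset of the one above, and the function name is hypothetical.

```python
import re
from collections import Counter

# A subset of the filler words and phrases listed above.
FILLERS = ["we note", "it should be noted", "however", "simply", "clearly",
           "obviously", "trivially", "also", "it is known that", "as such",
           "very", "extremely", "completely", "therefore", "thus", "indeed"]

def count_fillers(text, fillers=FILLERS):
    """Count whole-word, case-insensitive occurrences of each filler phrase."""
    counts = Counter()
    for phrase in fillers:
        pattern = r"\b" + re.escape(phrase) + r"\b"
        hits = re.findall(pattern, text, flags=re.IGNORECASE)
        if hits:
            counts[phrase] = len(hits)
    return counts

sample = ("We note that the signal is clearly detected. "
          "Thus, a very large fraction of the data is also usable.")
filler_counts = count_fillers(sample)
print(filler_counts)
```

The counts only point at candidates. Whether a given "however" can go without loss of meaning is still a decision made sentence by sentence.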

Remove imperatives and do not assign work to the reader: instead of "compare this with", say "comparison of this result with ours demonstrates that"; instead of "see figure 1 in Smith 1879" say "As demonstrated by Smith 1879, [brief summary of result]".

Parentheses: police the reference parentheses (citep, citet and citealt). Nested parentheses are not allowed.
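A common source of nested parentheses is a \citep or \citet command placed inside a parenthetical remark, because natbib adds its own parentheses there. \citealt does not. Here is a minimal Python sketch of a line-by-line check, assuming natbib-style commands in the LaTeX source. The function names are hypothetical, and parentheses that span several source lines are not handled.

```python
import re

# \citep{...} and \citet{...} (with optional * and [..] arguments) print parentheses.
PAREN_CITE = re.compile(r"\\cite[pt]\*?(?:\[[^\]]*\])*\{[^}]*\}")

def max_paren_depth(line):
    """Deepest nesting of round parentheses, counting \\citep/\\citet as one pair."""
    expanded = PAREN_CITE.sub("(cite)", line)
    depth = deepest = 0
    for ch in expanded:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth = max(depth - 1, 0)  # tolerate a stray closing parenthesis
    return deepest

def lines_with_nested_parens(text):
    """Return 1-based line numbers whose parentheses nest two or more deep."""
    return [n for n, line in enumerate(text.splitlines(), start=1)
            if max_paren_depth(line) >= 2]

draft = "\n".join([
    r"This agrees with earlier work (e.g., \citep{smith1879}).",
    r"This agrees with earlier work (e.g., \citealt{smith1879}).",
    r"This agrees with earlier work \citep{smith1879}.",
])
nested = lines_with_nested_parens(draft)
print(nested)
```

Only the first line is flagged. The second line shows the usual fix of switching to \citealt inside an existing parenthesis.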

Run your paper through a spell checker before showing it to anybody. To add a space after a special LaTeX symbol, put a backslash after it, e.g., "at 5\micron\ we find" will produce correct formatting, whereas "at 5\micron we find" will not. Spell out contractions: "we've", "didn't", etc.

Abbreviations: the fewer, the better. Sentences like "The effects of the IR SED and PAHs on the IRAC PSF of the BL AGN may make SMBH mass measurement difficult" should be avoided. During the final editing, the first instance of every abbreviation must be spelled out.
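Abbreviation-heavy sentences can be found with a simple pattern match. Here is a minimal Python sketch, assuming an abbreviation is a run of two or more capital letters with an optional plural "s". The limit of two abbreviations per sentence is a made-up threshold for illustration, and the function names are hypothetical.

```python
import re

# Two or more capital letters in a row, optionally followed by a plural "s" (e.g. "PAHs").
ABBREVIATION = re.compile(r"\b[A-Z]{2,}s?\b")

def abbreviations(sentence):
    """Return the abbreviations found in the sentence, in order of appearance."""
    return ABBREVIATION.findall(sentence)

def too_dense(sentence, limit=2):
    """True if the sentence holds more abbreviations than the (hypothetical) limit."""
    return len(abbreviations(sentence)) > limit

example = ("The effects of the IR SED and PAHs on the IRAC PSF of the BL AGN "
           "may make SMBH mass measurement difficult")
found = abbreviations(example)
print(found, too_dense(example))
```

The same list of matches can serve as a checklist during the final editing, when the first instance of every abbreviation has to be spelled out.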

Finer points, AKA Nadia's pet peeves

Structure: I personally don't like forward-referencing ("We discuss this further in section blah") and backward-referencing ("As we showed in section blah") within the paper. Of course, an occasional reference is fine, but if you find yourself constantly needing to do this, it may be a sign that the paper is not well organized. For a well-organized paper, the reader would know intuitively where to look for additional details if they need them. I suggest limiting your internal referencing to the final paragraph of the introduction.

Clauses need to agree with the subject. "Having explored this parameter space, the results are shown in Figure 1" is incorrect: the results didn't explore the parameter space, the authors did. One correct way to rephrase this: "We explore the full parameter space and display the results in Figure 1." Then there is one subject ("we") and two verbs that agree with the subject.

Once you notice how often this rule is violated even in "professional" writing (e.g., newspapers) you cannot un-notice it!

Uniformity in like sentence terms. If you go through a comma-separated list of things, make sure they are grammatically the same. "We find that the galaxies in group A cluster strongly, while the galaxies in group B cluster less strongly" -- is ok. Contrast this with "We find that the galaxies in group A cluster strongly and galaxies in group B clustering less strongly."

Hyphenation. Many self-respecting languages have strict rules about this, but in English apparently this is a matter of taste. Here is a relatively simple rule I tend to follow. If a two-word structure (where one of the words is not an adjective) is used as an adjective, I hyphenate, but if the same structure is used as a noun, I don't. Example: "the large-scale structure of the universe" but "the universe is uniform on large scales".


Other expert opinions

A very short writing guide from Australian astrophysicist Michael J.I. Brown:
Impose structure.
Define the purpose of each paragraph (use comments).
State the "why" before the "how."
Use the figure captions to convey key science points.
Quantify.
Clarify.
Proof-read out loud.

"Ten simple rules for structuring papers", Mensh and Kording
PDF version downloaded from here.