
Using Git for writing thesis [closed]

I am planning to use Git for writing my thesis with LaTeX. As Git is specifically designed for software development, would it be feasible for my requirements? If it is a good choice for me, what features of Git are particularly suited to writing a thesis? I also want to know what precautions I should take before getting into the Git workflow. I am a complete beginner with Git, so what should my starting point be?

  • version-control


  • 1 Git is a very good revision control system, it (and stuff like mercurial, svn) isn't strictly for use with software development. Since you're using Latex, which is textual, git will be useful if you want to keep revisions of your thesis, and then compare revisions or get back an old revision. There are a lot of really cool features in Git, though I think a lot of the more advanced functionality will not really apply to you (like git-bisect ), but the version management is up your alley. Here's a tutorial: schacon.github.com/git/gittutorial.html –  wkl Commented Oct 15, 2011 at 5:11
  • 23 The Fork functionality will be helpful if you acquire multiple personality disorder –  Zelda Commented Oct 15, 2011 at 5:13
  • 1 The automatic document version control in Mac OS X Lion might be worth a look if that is an option for you. –  titaniumdecoy Commented Oct 15, 2011 at 5:22
  • Git is great but it will take you some time to learn it, find tools that suit your needs, understand the philosophy, etc. While being an avid git user, I would recommend Mercurial for this task, with the same features and easier to install/learn –  CharlesB Commented Oct 15, 2011 at 10:45

5 Answers

There are both technical considerations and best practices. I will focus on the latter, specifically for writing your thesis and/or papers. For the technical side, you can check any git tutorial.

1. Define the directory structure for your thesis. You can change it later and use git to track the changes. A good structure will make your life easier.

2. Work with multiple files (use include and/or input in LaTeX). You can split them by chapters or sections. This makes it easier to track changes that involve specific parts of your thesis (e.g. git log content/introduction.tex ).

3. Track only the files you edit yourself, not the auto-generated ones. A proper .gitignore file will help you a lot (LaTeX generates plenty of working files).

4. As in programs, do micro-commits, that is: one commit per idea/feature/fix/activity.

5. Every time you commit, write a meaningful (high-level) message that explains what you were trying to achieve with the change. After a week you might not remember what you tried to accomplish.

6. Keeping track of every activity/idea/fix [see (4) and (5)] can be very helpful for knowing how much you have done (using git log ). You can write the progress report for your supervisor(s) based on git log . Even more, you can share the repository with your supervisors (using a web interface), and they can check whatever you have been doing in your thesis. For the next meeting, they will know what to expect (it will depend on how fond your supervisors are of following an RSS feed).

7. Using git will help keep you in a good mood (sometimes you will feel you have not done much, but having a record of every change will help you keep things in perspective).

8. For every progress report you send, create a tag. For the next report, you can check out both versions and apply latexdiff . This is useful for tracking changes between the versions you submit for revision. It will also help you check whether you addressed the feedback you received on the previous report.

9. Last but not least, I recommend you read " A successful Git branching model ". It is a very short article on a git workflow. You can apply the same concepts when you write your thesis. For instance, if you are writing up an experiment, you can create a branch for it and merge it once it is " ready ." If you have to revisit it later, it will be easier to see what changes were involved and why.
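The tagging advice above can be sketched in a scratch repository. This is only an illustration: the tag names report-1 and report-2, the author identity and the file contents are made up, and the last step merely extracts the old version of a chapter so that a tool such as latexdiff could be run on the two files by hand.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch directory standing in for the thesis folder
cd "$repo"
git init -q
git config user.name "Thesis Author"          # placeholder identity
git config user.email "author@example.com"

mkdir content
echo "We study version control for theses." > content/introduction.tex
git add content/introduction.tex
git commit -q -m "Draft introduction"
git tag -a report-1 -m "First progress report"     # hypothetical tag name

echo "We study version control for writing a thesis." > content/introduction.tex
git commit -q -am "Reword introduction after supervisor feedback"
git tag -a report-2 -m "Second progress report"

# History of a single chapter
git log --oneline -- content/introduction.tex

# Files that changed between the two reports
changed="$(git diff --name-only report-1 report-2)"
echo "$changed"

# Extract the chapter as it was in the first report, ready for a manual comparison
git show report-1:content/introduction.tex > introduction-report-1.tex
```

Tags are cheap, so one per submitted report costs nothing and gives a stable name to compare against later.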


When I was writing my PhD thesis,¹ I used git to manage the document and all its figures, and I'm very glad that I did so, not least because it makes it easy to write a script that graphs your progress as you're going along ;) The chief advantages I found were:

  • Since git is a distributed version control system, it's easy to work on multiple machines. If you need the latest version from your laptop on your desktop machine, you can just pull directly from the laptop and work there. When you leave, you go to your laptop and pull from the desktop machine.
  • If you work on multiple machines, you effectively have a recent backup of your work (including its complete history), and if you want to create further backups you can just push to a new bare repository elsewhere (as VonC's answer points out).
  • You can make large changes to your document knowing that the previous version is securely stored, and that if you want to retrieve the old version, that's easy to do.
  • Being able to commit to your repository when you're offline is very useful, particularly since not having internet access makes it much easier to write ;) I also kept PDFs of all the papers I cited in the same repository to make it easier to work offline, although this vastly inflated the repository, so some might advise against that.

The chief advice that I'd give:

  • Commit frequently, and always make sure that you keep the output of git status empty, either by adding files you need, or listing them in .gitignore . You don't want to risk having important files untracked.
  • Never use history-rewriting commands (e.g. git rebase ) and, just to be safe, never use git's dangerous commands like git reset --hard and git checkout -f . No one will ever see your complete repository, so you don't care what the history looks like; it's much more important that you don't do anything that might lose (or make it more difficult to retrieve) your work.
  • When you're looking at differences between your versions, use the --color-words option to git diff . Otherwise, your diffs will be line-based, and if you reformat a paragraph in LaTeX, it'll be hard to see what the real changes are - git diff --color-words ignores the line-breaks, and just shows the old words in red and the new words in green.

¹ ... with LyX rather than directly in LaTeX, but the issues are essentially the same.
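The word-level diff advice above can be tried on a re-wrapped paragraph. This sketch uses git diff --word-diff , which prints plain-text markers and is therefore easier to read in a script; git diff --color-words shows the same word-level comparison with colours in a terminal. The file name, identity and sentence are made up for the illustration.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch repository
cd "$repo"
git init -q
git config user.name "Thesis Author"          # placeholder identity
git config user.email "author@example.com"

printf 'The quick brown fox jumps over the lazy dog.\n' > chapter.tex
git add chapter.tex
git commit -q -m "Add sentence"

# Re-wrap the paragraph onto two lines and change a single word
printf 'The fast brown fox\njumps over the lazy dog.\n' > chapter.tex

# Word-level diff: removed words appear as [-...-], added words as {+...+}
words="$(git diff --word-diff chapter.tex)"
echo "$words"
```

A plain git diff on the same change would report the whole line as removed and two new lines as added, which hides the fact that only one word differs.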


  • The progress graph is pretty awesome, thanks for sharing! I am trying to do something similar but I want to track the size (byte or kb whichever) of the commits instead of the Lines in Document . I know git log --log-size shows the size of the commit message, but not sure how to see the size of the whole commit (including files). Any idea how to do that? thanks again! –  Aziz Alto Commented Apr 22, 2015 at 19:57
  • Ok, it seems git doesn't do that! the closest way I found is this: stackoverflow.com/questions/10845051/… –  Aziz Alto Commented Apr 22, 2015 at 20:10
  • Could you please share how you got that plot for the number of lines? Is it an easy script to write? What do I need to know to be able to write one myself? –  Fazzolini Commented Jan 8, 2016 at 16:18

This is mainly just meant as a comment, but it turned out a bit too long, so I am posting it as an answer.

I used darcs for my Master's thesis, and have been using RCS, CVS, SVN, and Git for lots of documentation / writing projects in the past. All of these tools provide the basic feature I want -- ability to review my changes, go back in history, check in "undo points" when I start writing something new.

There are old and tried recommendations for writing documentation with version control. Using a text-only source format is important for getting sane diffs. In addition, a useful tip I picked up (IIRC from Kernighan, writing about keeping Troff source in version control) is to make sure all lines are reasonably short. I tend to whack enter every few lines, with an eye towards keeping one particular clause or idiom on one line, so that the diff will be minimal if I decide to revise that particular detail later.


Git will work. LaTeX is effectively source code, so it should be perfectly fine.

That said, Git, while awesome, has a slightly steep learning curve because it supports a lot of things for collaborating with multiple people, handling diverging histories, etc. Its really big advantage is in merging conflicts (what happens if I change a file and someone else changes the same file and we both try to upload/commit it to some server?).

If you just want to version your thesis, you are unlikely to even hit the conflicting merge case (since you are the only one editing it), let alone the multiple histories case. I'd use something simpler like SVN, which, while worse for doing the two things I described, fits your needs and is easier to learn.

Also, git stores everything in a .git directory inside the folder you are working in. If you delete that directory, the stored history is gone.


  • 5 SVN requires a server, git can be handled entirely on the user's machine. If he is not operating with multiple users, he will never run into the elements which make git difficult in your eyes. From a complete novice's perspective, I think git would actually be quite a bit easier. –  Zack Bloom Commented Oct 15, 2011 at 5:58
  • file:// protocol can be handled without "server". svn:// protocol is easy implementable thing with only subversion distro. Git never can be considered as "bit easy". You made mistakes in ALL statements –  Lazy Badger Commented Oct 15, 2011 at 8:17
  • 3 I think this answer misses a number of key advantages with using a DVCS for writing documents, in particular having complete history of your repository locally, easily pushing and pulling between machines you're working on, disconnected operation, etc. etc. Recommending using SVN for this kind of task in a world where git and Mercurial exist is a bad idea, I think. –  Mark Longair Commented Oct 15, 2011 at 9:48

In a DVCS , a " workflow " means:

  • merge workflow (which you shouldn't need that much in your case)
  • publication workflow (push to a remote repo)

With your local .git repo, you will be able to compare with previous versions (which can come in handy). But the benefit of a DVCS shows when:

  • you save your work through a push to a remote repo (or, for backup purposes, a bundle )
  • you synchronize your work between two different PCs (like in " How to push a local git repository to another computer? " or in " git server between laptop and PC (MS Windows 7) "). Then, once the sync is done (through a git push ), you can take your second environment completely off-line, and still benefit from the full history of your repo. That is where a DVCS matters in your case.
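The two backup routes mentioned above (a push to a remote repo, or a bundle) can be sketched locally. Everything here is an assumption made for the illustration: the bare repository lives in a temporary directory instead of on a second machine or server, and the remote name backup, the paths and the identity are made up.

```shell
set -eu
base="$(mktemp -d)"
work="$base/thesis"                 # working repository (hypothetical path)
backup="$base/thesis-backup.git"    # bare repository standing in for another machine

git init -q "$work"
cd "$work"
git config user.name "Thesis Author"          # placeholder identity
git config user.email "author@example.com"
printf '%s\n' '\documentclass{book}' > main.tex
git add main.tex
git commit -q -m "Start thesis"

# Route 1: push the current branch to a bare repository
git init -q --bare "$backup"
git remote add backup "$backup"
git push -q backup HEAD

# Route 2: pack the whole history into a single file
git bundle create "$base/thesis.bundle" --all
git bundle verify "$base/thesis.bundle"

# Restoring from the bare repository is an ordinary clone
git clone -q "$backup" "$base/thesis-restored"
```

On a real setup the bare repository path would be replaced by an SSH or HTTPS URL; the commands stay the same.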


How to Git your PhD thesis on GitHub

20 Jun 2020 - fubar - Sreekar Guddeti


Git is a version control system usually used for software. However, it can also be used for versioning any set of documents. We will see how git can be used to version a PhD thesis. Versioning a PhD thesis is useful not only as a backup but also because it gives an overview of how the thesis takes shape over time. Also, since thesis writing is a highly nonlinear process, git provides tools to track that nonlinear development. We will work with GitHub’s repository hosting service to have a remote repository for our PhD thesis. It is a good idea to keep your thesis in a private repository . A GitHub private repository provides unlimited storage as long as individual files do not exceed the 100MB size limit .

The following are the steps to set up a Git repository for the PhD thesis:

  • Set up Git on Linux
  • Set up a private repository on GitHub
  • Clone the remote repository
  • Basic snapshotting: quick walkthrough
  • Basic snapshotting: detailed description

General references on Git , The Git Parable , Add an existing repository to GitHub ,  LaTeX template

Add a .gitignore file that tells git to exclude a set of files from version control. Set the .gitignore file type to TeX . Create a private repository.


Now that a remote repository is created on GitHub, we need to clone it on a local computer to work with it. Get the URL of the remote repository and use git clone <url to remote repo>

Basic Snapshotting Quick walkthrough

Now that we have a local copy of the remote repository myTestThesis to work with, we will describe a typical workflow for basic snapshotting involving

  1. Creating/modifying files of the repository - nano newfile.tex
  2. Staging files to the git tracking system - git add *
  3. Committing files to the commit history - git commit -m "<my message>"
  4. Pushing the commits to the remote repo - git push origin master
  5. Go to step 1
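One pass through the cycle above can be rehearsed without touching GitHub. In this sketch a local bare repository stands in for the remote, the identity and file contents are made up, and the branch name is read from git rather than hard-coded, since it may be master or main depending on configuration.

```shell
set -eu
base="$(mktemp -d)"
git init -q --bare "$base/remote.git"                 # local stand-in for the GitHub repository
git clone -q "$base/remote.git" "$base/myTestThesis"
cd "$base/myTestThesis"
git config user.name "Thesis Author"                  # placeholder identity
git config user.email "author@example.com"

# 1. create or modify a file
printf '%s\n' '\chapter{Introduction}' > newfile.tex
# 2. stage it
git add newfile.tex
# 3. commit it
git commit -q -m "Add introduction chapter"
# 4. push the current branch and remember the remote branch as its upstream
branch="$(git rev-parse --abbrev-ref HEAD)"
git push -q -u origin "$branch"

# The working tree is clean and in sync with the remote
git status --short --branch
```

Naming the file in step 2 instead of using * keeps the commit limited to what was intended.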

Basic Snapshotting Detailed description

Creating/modifying files of the repository.

The local copy of the repository is created in myTestThesis . Now change into the directory and list the files, including the hidden ones (for example with cd myTestThesis followed by ls -a ).

The hidden folder .git holds all the information about the commit history of the repository. The hidden file .gitignore contains instructions to exclude files from getting versioned. A preliminary scan of the contents of .gitignore shows what kind of files are omitted, usually the .aux (auxiliary), .log , etc files generated during a typical LaTeX compilation. We do not want to version them as they are not part of the source code, they are output of compilation.

Copy the contents of the LaTeX template to the local repository myTestThesis .

As new files are added to the local repository myTestThesis , they hold the status of untracked files . The status of every file is given by the git status command.

The pdf files generated by LaTeX compilation also need to be excluded from tracking, as pdf files are binary files. Binary files are not well suited to git, and including generated pdf files can bloat the repo.

To avoid this, modify the .gitignore file so that it excludes specifically the generated pdf file, by replacing the relevant line with one that names that file.

After this modification, if you check the status of the repository again using git status , you will observe that the main pdf file is ignored from tracking, however the .gitignore file shows up as modified . This is because the .gitignore file was already being tracked by git as we cloned it from the remote repository. Now that we modified it, the status of the .gitignore file has changed to modified.

Note that git is aware of other pdf files like IISc_Logo.pdf . This is intended, as some of our figures are .pdf files and they are not likely to change. If you want to ignore all pdf files completely, add a *.pdf line to the .gitignore file.
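The effect of ignoring only the generated pdf can be checked with git check-ignore . The file names below ( main.pdf for the compiled thesis, figures/logo.pdf for a figure) are hypothetical and the three ignore patterns are a minimal stand-in for the much longer TeX template.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch repository
cd "$repo"
git init -q

mkdir figures
touch main.tex main.pdf main.aux figures/logo.pdf    # hypothetical file names

# Ignore LaTeX working files and only the compiled thesis, not every pdf
cat > .gitignore <<'EOF'
*.aux
*.log
/main.pdf
EOF

# Untracked files that git still sees: .gitignore, figures/ and main.tex
git status --short

# Show which pattern causes each file to be ignored
git check-ignore -v main.pdf main.aux
```

The leading slash in /main.pdf anchors the pattern to the top of the repository, so a figure that happened to be called main.pdf in a subdirectory would still be tracked.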

Staging the files to git tracking system

The untracked and modified files need to be staged. To stage the files use git add <filename> for each file or git add * for staging all the files at one go.

Check the status of the repository and see if the files are staged

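You will see that .gitignore is not staged for commit. This is because the shell expands * only to names that do not start with a dot, so git add * never receives the hidden file. You need to add the file manually, or use git add . which also picks up hidden files.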
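A small experiment makes the difference between git add * and git add . visible. The repository and file names are made up; the point is only that the shell, not git, decides what * expands to.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch repository
cd "$repo"
git init -q

printf '*.aux\n' > .gitignore
printf '%s\n' '\chapter{Methods}' > methods.tex

# The shell expands * to methods.tex only; names starting with a dot are skipped
git add *
after_star="$(git status --porcelain)"
echo "$after_star"

# With "." git walks the directory itself and includes hidden files
git add .
after_dot="$(git status --porcelain)"
echo "$after_dot"
```

After the first command .gitignore is still listed as untracked ( ?? ); after the second it is listed as added ( A ).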

Committing the files to commit history

Now that all the files are staged , they can be committed to the commit history using git commit -m "<A message>" . A message is needed to summarize what the commit is all about

Check the log of the commit history

git log gives a summary of the commit history, which lets us check that the commit has been recorded. With the --oneline option, each entry of the log is a hexadecimal SHA string, typically 7 characters long (the full hash is 40 characters, but --oneline prints an abbreviated version), followed by the commit message.
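The relation between the abbreviated and the full hash can be inspected directly. This sketch assumes a repository created with git's default SHA-1 object format; the identity and file are placeholders.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch repository
cd "$repo"
git init -q
git config user.name "Thesis Author"          # placeholder identity
git config user.email "author@example.com"

echo "first draft" > notes.tex
git add notes.tex
git commit -q -m "Add notes"

# Abbreviated hash followed by the commit message
git log --oneline

full="$(git rev-parse HEAD)"              # full object name of the latest commit
short="$(git rev-parse --short HEAD)"     # abbreviated form, as printed by --oneline
echo "$full"
echo "$short"
```

The abbreviated form is a prefix of the full hash, and git accepts either wherever a commit has to be named.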

Push the commit to the remote repository

The above commit has updated the commit history of the local repository. However, we need to update the commit history of the remote repository as well. This is called pushing the commit to the remote repository. The remote repository is called origin and the branch on which the commits accumulate is the master branch. The local branch is pushed to the branch of the same name on the remote. To push the commit, we use the git push command.

You can check  if the commit has been pushed to the remote repository either by running git status at the local repository or checking at GitHub if the files have been committed

Iterate over the workflow

Repeat the workflow when you want to add content to the remote repository

In summary, we have used several git commands in the workflow for writing the PhD thesis: git clone , git status , git add , git commit , git log and git push .

In addition to the above mentioned git commands, some other commands are also useful like git diff , git rm . More on these at Git’s basic snapshotting commands

In the current article, we assumed .tex source files for our thesis. However, there are other workflows, like using Scientific Markdown and Git to write the PhD thesis: Git + Scientific Markdown workflow for PhD thesis


A Quick Introduction to Version Control with Git and GitHub

John D. Blischak

1 Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America

Emily R. Davenport

2 Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America

Greg Wilson

3 Software Carpentry Foundation, Toronto, Ontario, Canada


“This is part of the PLOS Computational Biology Education collection.”

Introduction to Version Control

Many scientists write code as part of their research. Just as experiments are logged in laboratory notebooks, it is important to document the code you use for analysis. However, a few key problems can arise when iteratively developing code that make it difficult to document and track which code version was used to create each result. First, you often need to experiment with new ideas, such as adding new features to a script or increasing the speed of a slow step, but you do not want to risk breaking the currently working code. One often-utilized solution is to make a copy of the script before making new edits. However, this can quickly become a problem because it clutters your file system with uninformative filenames, e.g., analysis.sh, analysis_02.sh, analysis_03.sh, etc. It is difficult to remember the differences between the versions of the files and, more importantly, which version you used to produce specific results, especially if you return to the code months later. Second, you will likely share your code with multiple lab mates or collaborators, and they may have suggestions on how to improve it. If you email the code to multiple people, you will have to manually incorporate all the changes each of them sends.

Fortunately, software engineers have already developed software to manage these issues: version control. A version control system (VCS) allows you to track the iterative changes you make to your code. Thus, you can experiment with new ideas but always have the option to revert to a specific past version of the code you used to generate particular results. Furthermore, you can record messages as you save each successive version so that you (or anyone else) reviewing the development history of the code is able to understand the rationale for the given edits. It also facilitates collaboration. Using a VCS, your collaborators can make and save changes to the code, and you can automatically incorporate these changes to the main code base. The collaborative aspect is enhanced with the emergence of websites that host version-controlled code.

In this quick guide, we introduce you to one VCS, Git ( https://git-scm.com ), and one online hosting site, GitHub ( https://github.com ), both of which are currently popular among scientists and programmers in general. More importantly, we hope to convince you that although mastering a given VCS takes time, you can already achieve great benefits by getting started using a few simple commands. Furthermore, not only does using a VCS solve many common problems when writing code, it can also improve the scientific process. By tracking your code development with a VCS and hosting it online, you are performing science that is more transparent, reproducible, and open to collaboration [ 1 , 2 ]. There is no reason this framework needs to be limited only to code; a VCS is well-suited for tracking any plain-text files: manuscripts, electronic lab notebooks, protocols, etc.

Version Your Code

The first step is to learn how to version your own code. In this tutorial, we will run Git from the command line of the Unix shell. Thus, we expect readers are already comfortable with navigating a filesystem and running basic commands in such an environment. You can find directions for installing Git for the operating system running on your computer by following one of the links provided in Table 1 . There are many graphical user interfaces (GUIs) available for running Git ( Table 1 ), which we encourage you to explore, but learning to use Git on the command line is necessary for performing more advanced operations and using Git on a remote machine.

Table 1

Resource | Options
Distributed VCS | Git, Mercurial, Bazaar
Online hosting site | GitHub, Bitbucket, GitLab, Source Forge
Git installation |
Git tutorials | Software Carpentry, Pro Git, A Visual Git Reference, tryGit
Graphical User Interface for Git |

To follow along, first create a folder in your home directory named thesis . Next, download the three files provided in Supporting Information and place them in the thesis directory. Imagine that, as part of your thesis, you are studying the transcription factor CTCF, and you want to identify high-confidence binding sites in kidney epithelial cells. To do this, you will utilize publicly available ChIP-seq data produced by the ENCODE consortium [ 3 ]. ChIP-seq is a method for finding the sites in the genome where a transcription factor is bound, and these sites are referred to as peaks [ 4 ]. process.sh downloads the ENCODE CTCF ChIP-seq data from multiple types of kidney samples and calls peaks ( S1 Data ); clean.py filters peaks with a fold change cutoff and merges peaks from the different kidney samples ( S2 Data ); and analyze.R creates diagnostic plots on the length of the peaks and their distribution across the genome ( S3 Data ).

If you have just installed Git, the first thing you need to do is provide some information about yourself, since it records who makes each change to the file(s). Set your name and email by running the following lines, but replacing “First Last” and “user@domain” with your full name and email address, respectively.

$ git config --global user.name "First Last"

$ git config --global user.email "user@domain"

To start versioning your code with Git, navigate to your newly created directory, ~/thesis . Run the command git init to initialize the current folder as a Git repository (Figs 1 and 2A). A repository (or repo, for short) refers to the current version of the tracked files as well as all the previously saved versions ( Box 1 ). Only files that are located within this directory (and any subdirectories) have the potential to be version controlled, i.e., Git ignores all files outside of the initialized directory. For this reason, projects under version control tend to be stored within a single directory to correspond with a single Git repository. For strategies on how to best organize your own projects, see Noble, 2009 [ 5 ].

Fig 1. To store a snapshot of changes in your repository, first git add any files to the staging area you wish to commit (for example, you’ve updated the process.sh file). Second, type git commit with a message. Only files added to the staging area will be committed. All past commits are located in the hidden .git directory in your repository.

Fig 2. (A) To designate a directory on your computer as a Git repo, type the command git init . This initializes the repository and will allow you to track the files located within that directory. (B) Once you have added a file, follow the git add/commit cycle to place the new file first into the staging area by typing git add to designate it to be committed, and then git commit to take the snapshot of that file. The commit is assigned a commit identifier (d75es) that can be used in the future to pull up this version or to compare different committed versions of this file. (C) As you continue to add and change files, you should regularly add and commit those changes. Here, an additional commit was done, and the commit log now shows two commit identifiers: d75es (from step B) and f658t (the new commit). Each commit will generate a unique identifier, which can be examined in reverse chronological order using git log .

Box 1. Definitions

  • Version Control System (VCS) : (noun) a program that tracks changes to specified files over time and maintains a library of all past versions of those files
  • Git : (noun) a version control system
  • repository (repo) : (noun) folder containing all tracked files as well as the version control history
  • commit : (noun) a snapshot of changes made to the staged file(s); (verb) to save a snapshot of changes made to the staged file(s)
  • stage : (noun) the staging area holds the files to be included in the next commit; (verb) to mark a file to be included in the next commit
  • track : (noun) a tracked file is one that is recognized by the Git repository

Box 7. Branching

Do you ever make changes to your code, but are not sure you will want to keep those changes for your final analysis? Or do you need to implement new features while still providing a stable version of the code for others to use? Using Git, you can maintain parallel versions of your code that you can easily bounce between while you are working on your changes. You can think of it like making a copy of the folder you keep your scripts in, so that you have your original scripts intact but also have the new folder where you make changes. Using Git, this is called branching, and it is better than separate folders because (1) it uses a fraction of the space on your computer, (2) it keeps a record of when you made the parallel copy (branch) and what you have done on the branch, and (3) there is a way to incorporate those changes back into your main code if you decide to keep your changes (and a way to deal with conflicts). By default, your repository will start with one branch, usually called “master.” To create a new branch in your repository, type git branch new_branch_name . You can see what branches a current repository has by typing git branch , with the branch you are currently in being marked by a star. To move between branches, type git checkout branch_to_move_to . You can edit files and commit them on each branch separately. If you want to combine the changes in your new branch with the master branch, you can merge the branches by typing git merge new_branch_name while in the master branch.
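The branch, checkout and merge commands from this box can be walked through in a scratch repository. The branch name experiment , the file names and the identity are made up, and the name of the starting branch is read from git because it may be master or main depending on configuration.

```shell
set -eu
repo="$(mktemp -d)"                 # scratch repository
cd "$repo"
git init -q
git config user.name "Thesis Author"          # placeholder identity
git config user.email "author@example.com"

echo "baseline analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add baseline analysis"
main_branch="$(git rev-parse --abbrev-ref HEAD)"

# Create a parallel line of work and switch to it
git branch experiment                         # hypothetical branch name
git checkout -q experiment
echo "experimental filter" > filter.txt
git add filter.txt
git commit -q -m "Try experimental filter"

# Back on the original branch the new file is absent until the merge
git checkout -q "$main_branch"
git branch                                    # the current branch is marked with a star
git merge -q experiment
```

Because the original branch had no new commits of its own, this merge simply moves it forward to the tip of experiment ; conflicts only arise when both branches changed the same part of a file.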

  • local : (noun) the version of your repository that is stored on your personal computer
  • remote : (noun) the version of your repository that is stored on a remote server; for instance, on GitHub
  • clone : (verb) to create a local copy of a remote repository on your personal computer
  • fork : (noun) a copy of another user’s repository on GitHub; (verb) to copy a repository; for instance, from one user’s GitHub account to your own
  • merge : (verb) to update files by incorporating the changes introduced in new commits
  • pull : (verb) to retrieve commits from a remote repository and merge them into a local repository
  • push : (verb) to send commits from a local repository to a remote repository
  • pull request : (noun) a message sent by one GitHub user to merge the commits in their remote repository into another user’s remote repository

$ cd ~/thesis

$ ls

analyze.R clean.py process.sh

$ git init

Initialized empty Git repository in ~/thesis/.git/

Now you are ready to start versioning your code ( Fig 1 ). Conceptually, Git saves snapshots of the changes you make to your files whenever you instruct it to. For instance, after you edit a script in your text editor, you save the updated script to your thesis folder. If you tell Git to save a snapshot of the updated document, then you will have a permanent record of the file in that exact version even if you make subsequent edits to the file. In the Git framework, any changes you have made to a script but have not yet recorded as a snapshot with Git reside in the working directory only ( Fig 1 ). To follow what Git is doing as you record the initial version of your files, use the informative command git status .

$ git status

On branch master

Initial commit

Untracked files:

    (use "git add <file>…" to include in what will be committed)

        analyze.R

        clean.py

        process.sh

nothing added to commit but untracked files present (use "git add" to track)

There are a few key things to notice from this output. First, the three scripts are recognized as untracked files because you have not told Git to start tracking anything yet. Second, the word “commit” is Git terminology for a snapshot. As a noun, it means “a version of the code,” e.g., “the figure was generated using the commit from yesterday” ( Box 1 ). This word can also be used as a verb, meaning “to save,” e.g., “to commit a change.” Lastly, the output explains how you can track your files using git add . Start tracking the file process.sh.

$ git add process.sh

And check its new status.

$ git status

Changes to be committed:

    (use "git rm --cached <file>…" to unstage)

        new file: process.sh

Since this is the first time that you have told Git about the file process.sh , two key things have happened. First, this file is now being tracked, which means Git recognizes it as a file you wish to be version controlled ( Box 1 ). Second, the changes made to the file (in this case the entire file, because it is the first commit) have been added to the staging area ( Fig 1 ). Adding a file to the staging area will result in the changes to that file being included in the next commit, or snapshot, of the code ( Box 1 ). As an analogy, adding files to the staging area is like putting things in a box to mail off, and committing is like putting the box in the mail.

Since this will be the first commit, or first version, of the code, use git add to begin tracking the other two files and add their changes to the staging area as well. Then create the first commit using the command git commit .

$ git add clean.py analyze.R

$ git commit -m "Add initial version of thesis code."

[master (root-commit) 660213b] Add initial version of thesis code.

3 files changed, 154 insertions(+)

create mode 100644 analyze.R

create mode 100644 clean.py

create mode 100644 process.sh

Notice the flag -m was used to pass a message for the commit. This message describes the changes that have been made to the code and is required. If you do not pass a message at the command line, the default text editor for your system will open so you can enter the message. You have just performed the typical development cycle with Git: make some changes, add updated files to the staging area, and commit the changes as a snapshot once you are satisfied with them ( Fig 2 ).

Since Git records all of the commits, you can always look through the complete history of a project. To view the record of your commits, use the command git log . For each commit, it lists the unique identifier for that revision, author, date, and commit message.

$ git log

commit 660213b91af167d992885e45ab19f585f02d4661

Author: First Last <user@domain>

Date: Fri Aug 21 14:52:05 2015 -0500

    Add initial version of thesis code.

The commit identifier can be used to compare two different versions of a file, restore a file to a previous version from a past commit, and even retrieve tracked files if you accidentally delete them.
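The three uses of a commit identifier listed above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' code: it runs in a temporary repository with a hypothetical one-line clean.py, and the variable first simply holds the short identifier of the first commit.

```shell
# Throwaway repository with two versions of a file (hypothetical content).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "First Last"     # placeholder identity for the demo commits
git config user.email "user@domain"

echo "fc_cutoff = 10" > clean.py
git add clean.py
git commit -q -m "Add initial version of clean.py."
first=$(git rev-parse --short HEAD)   # short identifier of the first commit

echo "fc_cutoff = 20" > clean.py
git commit -q -a -m "Raise the fold-change cutoff."

# 1. Compare two versions of a file.
git --no-pager diff "$first" HEAD -- clean.py

# 2. Restore the file as it was in a past commit.
git checkout "$first" -- clean.py
cat clean.py

# 3. Retrieve a tracked file after deleting it by accident.
rm clean.py
git checkout HEAD -- clean.py         # brings back the version from the last commit
cat clean.py
```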

Now you are free to make changes to the files knowing that you can always revert them to the state of this commit by referencing its identifier. As an example, edit clean.py so that the fold change cutoff for filtering peaks is more stringent. Here is the current bottom of the file.

$ tail clean.py

# Filter based on fold-change over control sample

fc_cutoff = 10

epithelial = epithelial.filter(filter_fold_change, fc = fc_cutoff).saveas()

proximal_tube = proximal_tube.filter(filter_fold_change, fc = fc_cutoff).saveas()

kidney = kidney.filter(filter_fold_change, fc = fc_cutoff).saveas()

# Identify only those sites that are peaks in all three tissue types

combined = pybedtools.BedTool().multi_intersect(

    i = [epithelial.fn, proximal_tube.fn, kidney.fn])

union = combined.filter(lambda x: int(x[3]) == 3).saveas()

union.cut(range(3)).saveas(data + "/sites-union.bed")

Using a text editor, increase the fold change cutoff from 10 to 20.

fc_cutoff = 20

Because Git is tracking clean.py , it recognizes that the file has been changed since the last commit.

$ git status

# On branch master

# Changes not staged for commit:

#    (use "git add <file>…" to update what will be committed)

#    (use "git checkout -- <file>…" to discard changes in working directory)

#    modified: clean.py

no changes added to commit (use "git add" and/or "git commit -a")

The report from git status indicates that the changes to clean.py are not staged, i.e., they are in the working directory ( Fig 1 ). To view the unstaged changes, run the command git diff .

$ git diff

diff --git a/clean.py b/clean.py

index 7b8c058..76d84ce 100644

--- a/clean.py

+++ b/clean.py

@@ -28,7 +28,7 @@ def filter_fold_change(feature, fc = 1):

    return False

-fc_cutoff = 10

+fc_cutoff = 20

Any lines of text that have been added to the script are indicated with a +, and any lines that have been removed with a -. Here, we altered the line of code that sets the value of fc_cutoff. git diff displays this change as the previous line being removed and a new line being added with our update incorporated. You can ignore the first five lines of output, because they are directions for other software programs that can merge changes to files. If you wanted to keep this edit, you could add clean.py to the staging area using git add and then commit the change using git commit , as you did above. Instead, this time undo the edit by following the directions from the output of git status to “discard changes in the working directory” using the command git checkout .

$ git checkout -- clean.py

Now git diff returns no output, because git checkout undid the unstaged edit you had made to clean.py . This ability to undo past edits to a file is not limited to unstaged changes in the working directory. If you had committed multiple changes to the file clean.py and then decided you wanted the original version from the initial commit, you could pass the commit identifier of the first commit you made above in place of the -- (your commit identifier will be different; use git log to find it). The -- used above only marks where the file names begin: when no commit is given, git checkout restores the most recent version of the file from the staging area (if you haven’t staged any changes to this file, as is the case here, the version of the file in the staging area is identical to the version in the last commit). You do not need to type the entire commit identifier; by convention, the first seven characters are used, since this is usually long enough to be unique.

$ git checkout 660213b clean.py

At this point, you have learned the commands needed to version your code with Git. Thus, you already have the benefits of being able to make edits to files without copying them first, to create a record of your changes with accompanying messages, and to revert to previous versions of the files if needed. Now you will always be able to recreate past results that were generated with previous versions of the code (see the command git tag for a method to facilitate finding specific past versions) and see the exact changes you have made over the course of a project.
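As a brief illustration of the git tag command mentioned above, the sketch below labels one commit and later uses the label to recover the file from that version. The repository, the file content, and the tag name submitted-v1 are hypothetical.

```shell
# Throwaway repository (hypothetical demo content).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "First Last"     # placeholder identity for the demo commits
git config user.email "user@domain"

echo "fc_cutoff = 10" > clean.py
git add clean.py
git commit -q -m "Add initial version of clean.py."

# Attach a memorable, annotated label to the current commit.
git tag -a submitted-v1 -m "Code used for the submitted figures."

echo "fc_cutoff = 20" > clean.py
git commit -q -a -m "Raise the fold-change cutoff."

git tag                                        # list the tags in this repository
git --no-pager log -1 --oneline submitted-v1   # show the commit the tag points to
git checkout -q submitted-v1 -- clean.py       # restore the file from the tagged version
cat clean.py
```

A tag can be used anywhere a commit identifier is accepted, so you do not have to search git log for the right seven characters.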

Share Your Code

Once you have your files saved in a Git repository, you can share it with your collaborators and the wider scientific community by putting your code online ( Fig 3 ). This also has the added benefit of creating a backup of your scripts and provides a mechanism for transferring your files across multiple computers. Sharing a repository is made easier if you use one of the many online services that host Git repositories ( Table 1 ), e.g., GitHub. Note, however, that any files that have not been tracked with at least one commit are not included in the Git repository, even if they are located within the same directory on your local computer (see Box 2 for advice on the types of files that should not be versioned with Git and Box 3 for advice on managing large files).

Fig 3. (A) On your computer, you commit to a Git repository (commit d75es). (B) On GitHub, you create a new repository called thesis. This repository is currently empty and not linked to the repo on your local machine. (C) The command git remote add connects your local repository to your remote repository. The remote repository is still empty, however, because you have not pushed any content to it. (D) You send all the local commits to the remote repository using the command git push . Only files that have been committed will appear in the remote repository. (E) You repeat several more rounds of updating scripts and committing on your local computer (commit f658t and then commit xv871). You have not yet pushed these commits to the remote repository, so only the previously pushed commit is in the remote repo (commit d75es). (F) To bring the remote repository up to date with your local repository, you git push the two new commits to the remote repository. The local and remote repositories now contain the same files and commit histories.

Box 2. What Not to Version Control

You can version control any file that you put in a Git repository, whether it is text-based, an image, or a giant data file. However, just because you can version control something, does not mean you should . Git works best for plain, text-based documents such as your scripts or your manuscript if written in LaTeX or Markdown. This is because for text files, Git saves the entire file only the first time you commit it and then saves just your changes with each commit. This takes up very little space, and Git has the capability to compare between versions (using git diff ). You can commit a non-text file, but a full copy of the file will be saved in each commit that modifies it. Over time, you may find the size of your repository growing very quickly. A good rule of thumb is to version control anything text-based: your scripts or manuscripts if they are written in plain text. Things not to version control are large data files that never change, binary files (including Word and Excel documents), and the output of your code.

In addition to the type of file, you need to consider the content of the file. If you plan on sharing your commits publicly using GitHub, ensure you are not committing any files that contain sensitive information, such as human subject data or passwords.

To prevent accidentally committing files you do not wish to track, and to remove them from the output of git status , you can create a file called .gitignore . In this file, you can list subdirectories and/or file patterns that Git should ignore. For example, if your code produced log files with the file extension .log , you could instruct Git to ignore these files by adding *.log to .gitignore . In order for these settings to be applied to all instances of the repository, e.g., if you clone it onto another computer, you need to add and commit this file.
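The .gitignore steps described above can be sketched as follows, using a temporary repository and hypothetical file names (run1.log, run2.log) to stand in for the log files your code might produce.

```shell
# Throwaway repository with one script and two log files (hypothetical names).
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "First Last"     # placeholder identity for the demo commit
git config user.email "user@domain"
touch analyze.R run1.log run2.log

printf '*.log\n' > .gitignore         # ignore every file ending in .log

git status --short                    # the .log files are no longer listed
git check-ignore -v run1.log          # reports which rule causes a file to be ignored

# Commit .gitignore so the rule travels with the repository.
git add .gitignore
git commit -q -m "Ignore log files."
git status --short
```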

Box 3. Managing Large Files

Many biological applications require handling large data files. Git is best suited for collaboratively writing small text files; nonetheless, collaborative projects in the biological sciences also need a way to manage these data.

The example analysis pipeline in this tutorial starts by downloading data files in BAM format that contain the alignments of short reads from a ChIP-seq experiment to the human genome. Since these large, binary files are not going to change, there is no reason to version them with Git. Thus, hosting them on a remote HTTP (as ENCODE has done in this case) or FTP site allows each collaborator to download them to their own machine as needed, e.g., using wget , curl , or rsync . If the data files for your project are smaller, you could also share them via services like Dropbox ( www.dropbox.com ) or Google Drive ( https://www.google.com/drive/ ).

However, some intermediate data files may change over time, and the practical necessity to ensure all collaborators are using the same data set may override the advice to not put code output under version control, as described in Box 2 . Again, returning to the ChIP-seq example, the first step calling the peaks is the most difficult computationally because it requires access to a Unix-like environment and sufficient computational resources. Thus, for collaborators that want to experiment with clean.py and analyze.R without having to run process.sh , you could version the data files containing the ChIP-seq peaks (which are in BED format). But since these files are larger than those typically used with Git, you can instead use one of the solutions for versioning large files within a Git repository without actually saving the file with Git, e.g., git-annex ( https://git-annex.branchable.com/ ) or git-fat ( https://github.com/jedbrown/git-fat/ ). Recently, GitHub has created their own solution for managing large files called Git Large File Storage (LFS) ( https://git-lfs.github.com/ ). Instead of committing the entire large file to Git, which quickly becomes unmanageable, it commits a text pointer. This text pointer refers to a specific file saved on a remote GitHub server. Thus, when you clone a repository, it only downloads the latest version of the large file. If you check out an older version of the repository, it automatically downloads the old version of the large file from the remote server. After installing Git LFS, you can manage all the BED files with one command: git lfs track "*.bed" . Then you can commit the BED files just like your scripts, and they will automatically be handled with Git LFS. Now, if you were to change the parameters of the peak calling algorithm and re-run process.sh , you could commit the updated BED files, and your collaborators could pull the new versions of the files directly to their local Git repositories.
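To make the tracking step less opaque: git lfs track records its pattern as a rule in a file called .gitattributes, which you then commit like .gitignore. The sketch below writes the equivalent rule by hand so that it runs even without Git LFS installed, and uses git check-attr to show which files the rule would cover. The file names are hypothetical, and writing the rule manually is an illustration only; in practice you would let git lfs track create it.

```shell
# Throwaway repository (hypothetical file names).
demo=$(mktemp -d)
cd "$demo"
git init -q

# Assumption: this is the rule that `git lfs track "*.bed"` records in .gitattributes.
printf '*.bed filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

touch peaks.bed clean.py

# Ask Git which filter applies to each file.
git check-attr filter -- peaks.bed clean.py
```

Files matching the rule are routed through the lfs filter, while scripts such as clean.py are still stored by Git in the usual way.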

Below, we focus on the technical aspects of sharing your code. However, there are also other issues to consider when deciding if and how you are going to make your code available to others. For quick advice on these subjects, see Box 4 on how to license your code, Box 5 on concerns about being scooped, and Box 6 on the increasing trend of journals to institute sharing policies that require authors to deposit code in a public archive upon publication.

Box 4. Choosing a License

Putting software and other material in a public place is not the same as making it publicly usable. In order to do that, the authors must also add a license, since copyright laws in some jurisdictions require people to treat anything that isn’t explicitly open as being proprietary.

While dozens of open licenses have been created, the two most widely used are the GNU General Public License (GPL) and the MIT/BSD family of licenses. Of these, the MIT/BSD-style licenses put the fewest requirements on re-use, and thereby make it easier for people to integrate your software into their projects.

For an excellent short discussion of these issues, and links to more information, see Jake Vanderplas’s blog post from March 2014 at http://www.astrobetter.com/blog/2014/03/10/the-whys-and-hows-of-licensing-scientific-code/ . For a more in-depth discussion of the legal implications of different licenses, see Morin et al., 2012 [ 6 ].

Box 5. Being Scooped

One concern scientists frequently have about putting work in progress online is that they will be scooped, e.g., that someone will analyze their data and publish a result that they themselves would have, but hadn’t yet. In practice, though, this happens rarely, if at all: in fact, the authors are not aware of a single case in which this has actually happened, and would welcome pointers to specific instances. If anything, it seems more likely that making work public early in something like a version control repository, which automatically adds timestamps to content, will help researchers establish their priority.

Box 6. Journal Policies

Sharing data, code, and other materials is quickly moving from “desired” to “required.” For example, PLOS’s sharing policy ( http://journals.plos.org/plosone/s/materials-and-software-sharing ) already says, “We expect that all researchers submitting to PLOS will make all relevant materials that may be reasonably requested by others available without restrictions upon publication of the work.” Its policy on software is more specific:

We expect that all researchers submitting to PLOS submissions in which software is the central part of the manuscript will make all relevant software available without restrictions upon publication of the work. Authors must ensure that software remains usable over time regardless of versions or upgrades…

It then goes on to specify that software must be based on open source standards, and that it must be put in an archive which is large or long-lived. Granting agencies, philanthropic foundations, and other major sponsors of scientific research are all moving in the same direction, and, to our knowledge, none has relaxed or reduced sharing requirements in the last decade.

To begin using GitHub, you will first need to sign up for an account. For the code examples in this tutorial, you will need to replace username with the username of your account. Next, choose the option to “Create a new repository” ( Fig 3B , see https://help.github.com/articles/create-a-repo/ ). Call it “thesis,” because that is the directory name containing the files on your computer, but note that you can give it a different name on GitHub if you wish. Also, now that the code will exist in multiple places, you need to learn some more terminology ( Box 1 ). A local repository refers to code that is stored on the machine you are using, e.g., your laptop; whereas a remote repository refers to the code that is hosted online. Thus, you have just created a remote repository.

Now you need to send the code on your computer to GitHub. The key to this is the URL that GitHub assigns your newly created remote repository. It will have the form https://github.com/username/thesis.git (see https://help.github.com/articles/cloning-a-repository/ ). Notice that this URL is using the HTTPS protocol, which is the quickest to begin using. However, it requires you to enter your credentials when communicating with GitHub (your username plus a password or, since GitHub stopped accepting account passwords for Git operations in 2021, a personal access token), so you’ll want to consider switching to the SSH protocol once you are regularly using Git and GitHub (see https://help.github.com/articles/generating-ssh-keys/ for directions). In order to link the local thesis repository on your computer to the remote repository you just created, in your local repository, you need to tell Git the URL of the remote repository using the command git remote add ( Fig 3C ).

$ git remote add origin https://github.com/username/thesis.git

The name “origin” is a bookmark for the remote repository so that you do not have to type out the full URL every time you transfer your changes (this is the default name for a remote repository, but you could use another name if you like).
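The bookmark can be inspected, and later pointed at a different URL, without touching your commits. The sketch below shows one way this might look when moving from HTTPS to SSH; it uses a temporary repository, keeps the placeholder username from the text, and only edits local configuration, so nothing is sent to GitHub.

```shell
# Throwaway repository; "username" is the same placeholder used in the text.
demo=$(mktemp -d)
cd "$demo"
git init -q

git remote add origin https://github.com/username/thesis.git
git remote -v     # list each bookmark with the URL it stands for

# Later, once SSH keys are set up, point the same bookmark at the SSH address.
git remote set-url origin git@github.com:username/thesis.git
git remote -v
```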

Send your code to GitHub using the command git push ( Fig 3D ).

$ git push origin master

You first specify the remote repository, “origin.” Second, you tell Git to push the “master” branch of the repository; we will not go into other options in this tutorial, but Box 7 discusses them briefly.

Pushing to GitHub also has the added benefit of backing up your code in case anything were to happen to your computer. Also, it can be used to manually transfer your code across multiple machines, similar to a service like Dropbox ( www.dropbox.com ) but with the added capabilities and control of Git. For example, what if you wanted to work on your code on your computer at home? You can download the Git repository using the command git clone .

$ git clone https://github.com/username/thesis.git

By default, this will download the Git repository into a local directory named “thesis.” Furthermore, the remote “origin” will automatically be added so that you can easily push your changes back to GitHub. You now have copies of your repository on your work computer, your GitHub account online, and your home computer. You can make changes, commit them on your home computer, and send those commits to the remote repository with git push , just as you did on your work computer.

Then the next day back at your work computer, you could update the code with the changes you made the previous evening using the command git pull .

$ git pull origin master

This pulls in all the commits that you had previously pushed to the GitHub remote repository from your home computer. In this workflow, you are essentially collaborating with yourself as you work from multiple computers. If you are working on a project with just one or two other collaborators, you could extend this workflow so that they could edit the code in the same way. You can do this by adding them as Collaborators on your repository (Settings -> Collaborators -> Add collaborator; see https://help.github.com/articles/adding-collaborators-to-a-personal-repository/ ). However, for projects with many contributors, GitHub provides a workflow for finer-grained control of the code development.
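The whole round trip between work computer, remote, and home computer can be rehearsed offline. In the sketch below, which is an illustration rather than part of the original tutorial, a bare repository on the local disk stands in for GitHub and two directories stand in for the two computers; all paths and file contents are hypothetical.

```shell
base=$(mktemp -d)

# A bare repository plays the role of the remote on GitHub.
git init -q --bare "$base/remote.git"
git --git-dir="$base/remote.git" symbolic-ref HEAD refs/heads/master

# "Work" computer: create the repository, link it to the remote, and push.
mkdir "$base/work"
cd "$base/work"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master" on any Git version
git config user.name "First Last"         # placeholder identity for the demo commits
git config user.email "user@domain"
echo "fc_cutoff = 10" > clean.py
git add clean.py
git commit -q -m "Add initial version of thesis code."
git remote add origin "$base/remote.git"
git push -q origin master

# "Home" computer: clone, edit, commit, and push back.
git clone -q "$base/remote.git" "$base/home"
cd "$base/home"
git config user.name "First Last"
git config user.email "user@domain"
echo "fc_cutoff = 20" > clean.py
git commit -q -a -m "Raise the fold-change cutoff."
git push -q origin master

# Back at work the next day: pull in last night's commit.
cd "$base/work"
git pull -q origin master
cat clean.py
```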

With the addition of a GitHub account and a few commands for sending and receiving code, you can now share your code with others, transfer your code across multiple machines, and set up simple collaborative workflows.

Contribute to Other Projects

Lots of scientific software is hosted online in Git repositories. Now that you know the basics of Git, you can directly contribute to developing the scientific software you use for your research ( Fig 4 ). From a small contribution like fixing a typo in the documentation to a larger change such as fixing a bug, it is empowering to be able to improve the software used by yourself and many other scientists.

Fig 4. We would like you to add an empty file that is named after your GitHub username to the repo used to write this manuscript. (A) Using your internet browser, navigate to https://github.com/jdblischak/git-for-science . (B) Click on the “Fork” button to create a copy of this repo on GitHub under your username. (C) On your computer, type git clone https://github.com/username/git-for-science.git , which will create a copy of git-for-science on your local machine. (D) Navigate to the readers directory by typing cd git-for-science/readers/ . Create an empty file that is titled with your GitHub username by typing touch username.txt . Commit that new file by adding it to the staging area ( git add username.txt ) and committing with a message ( git commit -m "Add username to directory of readers." ). Note that your commit identifier will be different than what is shown here. (E) You have committed your new file locally, and the next step is to push that new commit up to the git-for-science repo under your username on GitHub. To do so, type git push origin master . (F) To request to add your commits to the original git-for-science repo, issue a pull request from the git-for-science repo under your username on GitHub. Once your Pull Request is reviewed and accepted, you will be able to see the file you committed with your username in the original git-for-science repository.

When contributing to a larger project with many contributors, you will not be able to push your changes with git push directly to the project’s remote repository. Instead, you will first need to create your own remote copy of the repository, which on GitHub is called a fork ( Box 1 ). You can fork any repository on GitHub by clicking the button “Fork” on the top right of the page (see https://help.github.com/articles/fork-a-repo/ ).

Once you have a fork of a project’s repository, you can clone it to your computer and make changes just like a repository you created yourself. As an exercise, you will add a file to the repository that we used to write this paper. First, go to https://github.com/jdblischak/git-for-science and choose the “Fork” option to create a git-for-science repository under your GitHub account ( Fig 4B ). In order to make changes, download it to your computer with the command git clone from the directory you wish the repo to appear in ( Fig 4C ).

$ git clone https://github.com/username/git-for-science.git

Now that you have a local version, navigate to the subdirectory readers and create a text file named as your GitHub username ( Fig 4D ).

$ cd git-for-science/readers

$ touch username.txt

Add and commit this new file ( Fig 4D ), and then push the changes back to your remote repository on GitHub ( Fig 4E ).

$ git add username.txt

$ git commit -m "Add username to directory of readers."

$ git push origin master

Currently, the new file you created, readers/username.txt , only exists in your fork of git-for-science. To merge this file into the main repository, send a pull request using the GitHub interface (Pull request -> New pull request -> Create pull request; Fig 4F ; see https://help.github.com/articles/using-pull-requests/ ). After the pull request is created, we can review your change and then merge it into the main repository. Although this process of forking a project’s repository and issuing a pull request seems like a lot of work to contribute changes, this workflow gives the owner of a project control over what changes get incorporated into the code. You can have others contribute to your projects using the same workflow.

The ability to use Git to contribute changes is very powerful because it allows you to improve the software that is used by many other scientists and also potentially shape the future direction of its development.

Git, albeit complicated at first, is a powerful tool that can improve code development and documentation. Ultimately, the complexity of a VCS not only gives users a well-documented “undo” button for their analyses, but it also allows for collaboration and sharing of code on a massive scale. Furthermore, it does not need to be learned in its entirety to be useful. Instead, you can derive tangible benefits from adopting version control in stages. With a few commands ( git init , git add , git commit ), you can start tracking your code development and avoid a file system full of copied files ( Fig 2 ). Adding a few additional commands ( git push , git clone , git pull ) and a GitHub account, you can share your code online, transfer your changes across machines, and collaborate in small groups ( Fig 3 ). Lastly, by forking public repositories and sending pull requests, you can directly improve scientific software ( Fig 4 ).

We collaboratively wrote the article in LaTeX ( http://www.latex-project.org/ ) using the online authoring platform Authorea ( https://www.authorea.com ). Furthermore, we tracked the development of the document using Git and GitHub. The Git repo is available at https://github.com/jdblischak/git-for-science , and the rendered LaTeX article is available at https://www.authorea.com/users/5990/articles/17489 .

Supporting Information

This Bash script downloads the ENCODE CTCF ChIP-seq data from multiple types of kidney samples and calls peaks. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

This Python script filters peaks with a fold change cutoff and merges peaks from the different kidney samples. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

This R script creates diagnostic plots on the length of the peaks and their distribution across the genome. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

Funding Statement

JDB is supported by National Institutes of Health grant AI087658 awarded to Yoav Gilad. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

How I wrote my Thesis in Markdown using Ulysses, Pandoc, LaTeX, Zotero, and GitHub

September 27, 2019 | 7 min read

I'm a big fan of Ulysses for writing. Using the simple Markdown syntax, it is easy to concentrate on writing content without any unnecessary distractions. Because of that, I also wanted to write my bachelor's thesis in Markdown with Ulysses.

Did I build this workflow to procrastinate from actually writing my thesis? Maybe 😅

But in the end, I found it to be a smooth experience that helped me finish the thesis more quickly.

However, Markdown lacks some features out of the box that are necessary for writing a thesis: generating a table of contents, reference management and citations, cross-references, and equations. So, I looked for ways to fill in those gaps.

This article requires a basic understanding of using the command line and is mainly focused on macOS users. For installing required software, Homebrew is recommended.

Project Structure and Version Management with Git

As a developer, I'm used to using Git in my projects, committing changes, and being able to see a log of everything I did. As I wanted to open-source my thesis anyway, I decided to use Git with GitHub for it as well.

My project structure is fairly simple. All content and content-related files are located in a src folder. This is what it looks like in my project folder:

If you want to explore the project structure, visit my thesis repository .

Ulysses to Markdown

Ulysses uses a syntax that extends Markdown's usual spec. They call it "Markdown XL". To process your text, you need to get it out of Ulysses. You could just add an external folder to Ulysses and work with that, but that disables a lot of features, like easily inserting images, iCloud sync, the ability to split your content into multiple sections, etc., that make Ulysses great.

An easy workaround is exporting your content as a TextBundle . A TextBundle is essentially a folder that contains your content as a Markdown file and all assets like images that your text contains. I exported my thesis into the Git repository every time I finished a section.

Reference Management with Zotero

A good thesis needs good sources and managing them all by hand is tedious and error-prone. This is where Zotero comes in. Zotero is a free and open reference management software that supports a variety of export formats. Using its Chrome extension, it's easy to add new sources.

To make Zotero compatible with our workflow, you need to install the Better BibTeX extension. This extension enables Zotero to automatically create and update a bibliography file that we can later integrate into the other tools.

To create the bibliography file, click Export , select Better BibLaTeX and check Keep updated . You can now save this file in your project folder next to your exported TextBundle.

Zotero to Ulysses

To make it easier to insert references from Zotero into Ulysses, I created an Alfred workflow that provides a simple shortcut to do that. In Ulysses, just press ⌘ + ^ and Zotero will show you a search window where you can search for the reference you want to insert. When you've got your references, press enter and Zotero will insert the reference into your document.

You can download this workflow here . Of course, you can change the shortcut or the way it behaves in Alfred.

Inserted references look like [@knapp2019] , where knapp2019 is the reference key from Zotero. If you want to provide a page, you can just add it to the reference: [@knapp2019, pp. 20-24] . This is Pandoc syntax, but more on that in the next chapter.

Typesetting with Pandoc and LaTeX

Pandoc is a useful utility for converting between markup formats. In our case, we need to convert Markdown to LaTeX. In addition to converting markup formats, Pandoc supports processing citations and cross-references using plugins.

If you just want to compile your document to a PDF with a template, you can skip to the next section.

You can install Pandoc and required extensions using Homebrew:

To convert your text to LaTeX, use the following command inside the TextBundle:

This will generate a standalone (-s) TeX file containing the converted text and all packages that are required to compile it. To work with LaTeX files, you need the MacTeX package. To install it, run:

This will install everything you need to work with LaTeX. Please note that this package is quite large, at about 4 GB.

To compile the generated LaTeX file to a PDF, run the following command:

This will take a few seconds, depending on the size of your document, and create a PDF called text.pdf. The created PDF already contains a generated bibliography and looks nice. However, you might want to customize it so it fits your own or your university's style.

Templating in LaTeX

Creating a template in LaTeX is easy and chances are that there's already a good template for your university or other styles that you can use.

In my case, I used a template by Manfred Enning for the style of my university, FH Aachen.

Create a new index.tex file in the template directory and insert the template there. There are only three things you need to take care of.

First, the template must include the text.tex file generated by Pandoc. To include it, insert a line such as \input{text.tex} where the content should be placed, adjusting the path to wherever Pandoc wrote the file.

Second, LaTeX needs to know where referenced images are stored. For that, set the image search path before the beginning of the document, for example with the graphicx package's \graphicspath command.

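Third, if you use references, you need to tell LaTeX to import them and where to place them. To do that, load a bibliography package at the start of your template and point it at the exported file; with biblatex, that means \usepackage{biblatex} followed by \addbibresource with the file name.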

Then, add the bibliography command (\printbibliography when using biblatex) where you want to insert the bibliography.

If you want to add a table of contents, a list of tables, or a list of figures, you can add the \tableofcontents, \listoftables, and \listoffigures commands where they should appear.

For an example of what this template file could look like, take a look at my template on GitHub.

Now, to compile your document to a PDF using the template, you need to compile your Markdown file to LaTeX again, but this time without the standalone (-s) flag. If you've skipped the typesetting step, you can now skip to the next section. Otherwise, run this command:
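As a rough sketch of the two Pandoc invocations discussed in this post (standalone and fragment), the Python helper below only builds the argument list so you can inspect it. It is not the author's exact command: the pandoc-crossref filter, the --biblatex flag, and the references.bib file name are assumptions made for illustration, while text.md and text.tex follow the file names used above.

```python
def pandoc_command(source="text.md", output="text.tex", standalone=False,
                   bibliography="references.bib"):
    """Build a Pandoc argument list for converting Markdown to LaTeX."""
    args = [
        "pandoc", source,
        "-o", output,
        "--filter", "pandoc-crossref",   # assumed cross-reference filter
        "--biblatex",                    # emit biblatex citation commands
        "--bibliography", bibliography,  # hypothetical file name
    ]
    if standalone:
        # -s produces a complete document with its own preamble;
        # leave it off when the output is pulled into a template.
        args.append("-s")
    return args


print(" ".join(pandoc_command(standalone=True)))   # self-contained file
print(" ".join(pandoc_command(standalone=False)))  # fragment for index.tex
```

The returned list can be handed to subprocess.run once Pandoc and the filter are installed.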

Then, switch to the template directory and run the following command:

This will create an index.pdf file containing your styled document.

Streamlining the Workflow with Docker

The last steps all seem like a lot of work that you don't necessarily want to do every time you want to compile your document to PDF. To make it all easier, I've created a Docker image that does all the work for you.

If you don't already use Docker, you can install it by running:

To compile your document using Docker, just run:

This image will then convert your Markdown to LaTeX, make sure all images are in a supported format, compile everything to a PDF, copy it to the output directory and clean up any files that have been created during the compilation – LaTeX creates a lot of files.

Automating the Build Process with GitHub Actions

If you want to automatically let GitHub build the PDF on every release you create, add this file to your repository. Now, every time you create a release on GitHub, your document will automatically be compiled to a PDF and the result will be attached to the release like this.

This process takes a few minutes but since it runs automatically in the background, it doesn't really matter.

Tips for Using LaTeX Syntax in Markdown XL

Using Pandoc with the plugins described above allows you to use special syntax for specific LaTeX commands.

Named References

Named references can be added to figures, tables, equations, and sections. For figures, place {#fig:referenceName} directly after an image in Ulysses (no whitespace allowed). You can now reference the figure with @fig:referenceName.

The concept works the same for the other categories. Named sections, for example, work like this:

You can now reference that section using @sec:referenceName.

Writing equations in Markdown XL is as easy as creating a new Raw Source Block (~~...~~) and adding a math block with the equation inside of it. For example:

Labelled Tables

To create a table with a label, you can place the label below the table after a colon like this:

Wrapping Up

The presented workflow might look a bit complicated, but it works well once it's all set up. I didn't need to worry about layout issues or how my text would be formatted later and could just write in my favorite editor, an environment that I'm already used to.

Since the content is written in Markdown, it can easily be published to other mediums, like blogs, ebooks, and more. This makes it super flexible.

Feel free to contact me @leolabs_org if you have any questions or suggestions.


Using git to write a thesis - how to proceed when I want to rewrite a chapter from scratch.

I have implemented a git repo for the thesis I'm currently writing and I use it mainly as a backup tool and documentation device (through commit messages).

The repo consists of several .tex files and I would like to scrap one of those and rewrite it from scratch. What is the best way to proceed here?

I could simply do a new .tex file "chapter2_v2.tex" and write there, but that seems... non-git in a way. I could just delete everything, put something like "start revision 2 of chapter 2" in the commit message. That however would make it complicated in case I want to copy some small part of text or formulas from the original version. In theory, the rewrite of that chapter is supposed to eventually supersede whatever I have written right now. I just don't want to make it too hard to merge small bits and pieces from the current version.

How would you proceed in this case?


How writers can get work done better with Git


Git is one of those rare applications that has managed to encapsulate so much of modern computing into one program that it ends up serving as the computational engine for many other applications. While it's best-known for tracking source code changes in software development, it has many other uses that can make your life easier and more organized. In this series leading up to Git's 14th anniversary on April 7, we'll share seven little-known ways to use Git. Today, we'll look at ways writers can use Git to get work done.

Git for writers

Some people write fiction; others write academic papers, poetry, screenplays, technical manuals, or articles about open source. Many do a little of each. The common thread is that if you're a writer, you could probably benefit from using Git. While Git is famously a highly technical tool used by computer programmers, it's ideal for the modern author, and this article will demonstrate how it can change the way you write—and why you'd want it to.

Before talking about Git, though, it's important to talk about what copy (or content, for the digital age) really is, and why it's different from your delivery medium. It's the 21st century, and the tool of choice for most writers is a computer. While computers are deceptively good at combining processes like copy editing and layout, writers are (re)discovering that separating content from style is a good idea, after all. That means you should be writing on a computer like it's a typewriter, not a word processor. In computer lingo, that means writing in plaintext.

Writing in plaintext

It used to be a safe assumption that you knew what market you were writing for. You wrote content for a book, or a website, or a software manual. These days, though, the market's flattened: you might decide to use content you write for a website in a printed book project, and the printed book might release an EPUB version later. And in the case of digital editions of your content, the person reading your content is in ultimate control: they may read your words on the website where you published them, or they might click on Firefox's excellent Reader View, or they might print to physical paper, or they could dump the web page to a text file with Lynx, or they may not see your content at all because they use a screen reader.

It makes sense to write your words as words, leaving the delivery to the publishers. Even if you are also your own publisher, treating your words as a kind of source code for your writing is a smarter and more efficient way to work, because when it comes time to publish, you can use the same source (your plaintext) to generate output appropriate to your target (PDF for print, EPUB for e-books, HTML for websites, and so on).

Writing in plaintext not only means you don't have to worry about layout or how your text is styled, but you also no longer require specialized tools. Anything that can produce text becomes a valid "word processor" for you, whether it's a basic notepad app on your mobile or tablet, the text editor that came bundled with your computer, or a free editor you download from the internet. You can write on practically any device, no matter where you are or what you're doing, and the text you produce integrates perfectly with your project, no modification required.

And, conveniently, Git specializes in managing plaintext.

The Atom editor

When you write in plaintext, a word processor is overkill. Using a text editor is easier because text editors don't try to "helpfully" restructure your input. They let you type the words in your head onto the screen, no interference. Better still, text editors are often designed around a plugin architecture, such that the application itself is woefully basic (it edits text), but you can build an environment around it to meet your every need.

A great example of this design philosophy is the Atom editor. It's a cross-platform text editor with built-in Git integration. If you're new to working in plaintext and new to Git, Atom is the easiest way to get started.

Install Git and Atom

First, make sure you have Git installed on your system. If you run Linux or BSD, Git is available in your software repository or ports tree. The command you use will vary depending on your distribution; on Fedora, for instance, you can run sudo dnf install git.

You can also download and install Git for Mac and Windows.

You won't need to use Git directly, because Atom serves as your Git interface. Installing Atom is the next step.

If you're on Linux, install Atom from your software repository through your software installer or the appropriate command, such as:

Atom does not currently build on BSD. However, there are very good alternatives available, such as GNU Emacs. For Mac and Windows users, you can find installers on the Atom website.

Once your installs are done, launch the Atom editor.

A quick tour

If you're going to live in plaintext and Git, you need to get comfortable with your editor. Atom's user interface may be more dynamic than what you are used to. You can think of it more like Firefox or Chrome than as a word processor, in fact, because it has tabs and panels that can be opened and closed as they are needed, and it even has add-ons that you can install and configure. It's not practical to try to cover all of Atom's many features, but you can at least get familiar with what's possible.

When Atom opens, it displays a welcome screen. If nothing else, this screen is a good introduction to Atom's tabbed interface. You can close the welcome screens by clicking the "close" icons on the tabs at the top of the Atom window and create a new file using File > New File.

Working in plaintext is a little different than working in a word processor, so here are some tips for writing content in a way that a human can connect with and that Git and computers can parse, track, and convert.

Write in Markdown

These days, when people talk about plaintext, mostly they mean Markdown. Markdown is more of a style than a format, meaning that it intends to provide a predictable structure to your text so computers can detect natural patterns and convert the text intelligently. Markdown has many definitions, but the best technical definition and cheatsheet is on CommonMark's website.

As you can tell from the example, Markdown isn't meant to read or feel like code, but it can be treated as code. If you follow the expectations of Markdown defined by CommonMark, then you can reliably convert, with just one click of a button, your writing from Markdown to .docx, .epub, .html, MediaWiki, .odt, .pdf, .rtf, and a dozen other formats without loss of formatting.

You can think of Markdown a little like a word processor's styles. If you've ever written for a publisher with a set of styles that govern what chapter titles and section headings look like, this is basically the same thing, except that instead of selecting a style from a drop-down menu, you're adding little notations to your text. These notations look natural to any modern reader who's used to "txt speak," but are swapped out with fancy text stylings when the text is rendered. It is, in fact, what word processors secretly do behind the scenes. The word processor shows bold text, but if you could see the code generated to make your text bold, it would be a lot like Markdown (actually it's the far more complex XML). With Markdown, that barrier is removed, which looks scarier on the one hand, but on the other hand, you can write Markdown on literally anything that generates text without losing any formatting information.

The popular file extension for Markdown files is .md. If you're on a platform that doesn't know what a .md file is, you can associate the extension to Atom manually or else just use the universal .txt extension. The file extension doesn't change the nature of the file; it just changes how your computer decides what to do with it. Atom and some platforms are smart enough to know that a file is plaintext no matter what extension you give it.

Live preview

Atom features the Markdown Preview plugin, which shows you both the plain Markdown you're writing and the way it will (commonly) render.

Atom's preview screen

To activate this preview pane, select Packages > Markdown Preview > Toggle Preview or press Ctrl+Shift+M.

This view provides you with the best of both worlds. You get to write without the burden of styling your text, but you also get to see a common example of what your text will look like, at least in a typical digital format. Of course, the point is that you can't control how your text is ultimately rendered, so don't be tempted to adjust your Markdown to force your render preview to look a certain way.

One sentence per line

Your high school writing teacher doesn't ever have to see your Markdown.

It won't come naturally at first, but maintaining one sentence per line makes more sense in the digital world. Markdown ignores single line breaks (when you've pressed the Return or Enter key) and only creates a new paragraph after a single blank line.
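The line-break rule can be sketched in a few lines of Python. This is a simplified model of how plain prose is grouped, not a Markdown parser: headings, lists, and other block syntax are ignored, and the function name is made up for this illustration.

```python
def paragraphs(markdown_text):
    """Group lines into paragraphs the way Markdown treats plain prose:
    single line breaks are joined, blank lines separate paragraphs.
    """
    result, current = [], []
    for line in markdown_text.splitlines():
        if line.strip():
            current.append(line.strip())
        elif current:
            # A blank line closes the paragraph collected so far.
            result.append(" ".join(current))
            current = []
    if current:
        result.append(" ".join(current))
    return result


source = (
    "Git tracks plain text.\n"
    "Each sentence gets its own line.\n"
    "\n"
    "A blank line starts a new paragraph.\n"
)
print(paragraphs(source))
```

The first two source lines end up in one paragraph, so writing one sentence per line costs nothing in the rendered result.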

Writing in Atom

The advantage of writing one sentence per line is that your work is easier to track. That is, if you've changed one word at the start of a paragraph, then it's easy for Atom, Git, or any application to highlight that change in a meaningful way if the change is limited to one line rather than one word in a long paragraph. In other words, a change to one sentence should only affect that sentence, not the whole paragraph.
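To see the effect, here is a small Python sketch using the standard difflib module, which compares text line by line much as Git does. The sample sentences are invented for the demonstration.

```python
import difflib

before = [
    "Git tracks changes line by line.",
    "Each sentence sits on its own line.",
    "Paragraphs are separated by a blank line.",
]
after = list(before)
after[1] = "Every sentence sits on its own line."  # a one-word edit


def changed_lines(old, new):
    """Return only the removed ('- ') and added ('+ ') lines of a diff."""
    return [line for line in difflib.ndiff(old, new)
            if line.startswith(("- ", "+ "))]


# One sentence per line: only the edited sentence is reported.
per_sentence = changed_lines(before, after)
print(per_sentence)

# The same text as one long line: the whole paragraph is reported.
per_paragraph = changed_lines([" ".join(before)], [" ".join(after)])
print(per_paragraph)
```

In the second case the reader has to hunt for the changed word inside the full paragraph, which is exactly the problem that one sentence per line avoids.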

You might be thinking, "many word processors track changes, too, and they can highlight a single word that's changed." But those revision trackers are bound to the interface of that word processor, which means you can't look through revisions without being in front of that word processor. In a plaintext workflow, you can review revisions in plaintext, which means you can make or approve edits no matter what you have on hand, as long as that device can deal with plaintext (and most of them can).

Writers admittedly don't usually think in terms of line numbers, but it's a useful tool for computers, and ultimately a great reference point in general. Atom numbers the lines of your text document by default. A line is only a line once you have pressed the Enter or Return key.


If a line has a dot instead of a number, that means it's part of the previous line wrapped for you because it couldn't fit on your screen.

If you're a visual person, you might be very particular about the way your writing environment looks. Even if you are writing in plain Markdown, it doesn't mean you have to write in a programmer's font or in a dark window that makes you look like a coder. The simplest way to modify what Atom looks like is to use theme packages. It's conventional for theme designers to differentiate dark themes from light themes, so you can search with the keyword Dark or Light, depending on what you want.

To install a theme, select Edit > Preferences. This opens a new tab in the Atom interface. Yes, tabs are used for your working documents and for configuration and control panels. In the Settings tab, click on the Install category.

In the Install panel, search for the name of the theme you want to install. Click the Themes button on the right of the search field to search only for themes. Once you've found your theme, click its Install button.

Atom's themes

To use a theme you've installed or to customize a theme to your preference, navigate to the Themes category in your Settings tab. Pick the theme you want to use from the drop-down menu. The changes take place immediately, so you can see exactly how the theme affects your environment.

You can also change your working font in the Editor category of the Settings tab. Atom defaults to monospace fonts, which are generally preferred by programmers. But you can use any font on your system, whether it's serif or sans or gothic or cursive. Whatever you want to spend your day staring at, it's entirely up to you.

On a related note, by default Atom draws a vertical marker down its screen as a guide for people writing code. Programmers often don't want to write long lines of code, so this vertical line is a reminder to them to simplify things. The vertical line is meaningless to writers, though, and you can remove it by disabling the wrap-guide package.

To disable the wrap-guide package, select the Packages category in the Settings tab and search for wrap-guide. When you've found the package, click its Disable button.

Dynamic structure

When creating a long document, I find that writing one chapter per file makes more sense than writing an entire book in a single file. Furthermore, I don't name my chapters in the obvious syntax chapter-1.md or 1.example.md, but by chapter titles or keywords, such as example.md. To provide myself guidance in the future about how the book is meant to be assembled, I maintain a file called toc.md (for "Table of Contents") where I list the (current) order of my chapters.

I do this because, no matter how convinced I am that chapter 6 just couldn't possibly happen before chapter 1, there's rarely a time that I don't swap the order of one or two chapters or sections before I'm finished with a book. I find that keeping it dynamic from the start helps me avoid renaming confusion, and it also helps me treat the material less rigidly.
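A toc.md file can also drive assembly of the book. The Python sketch below assumes a very simple, hypothetical layout for toc.md (one chapter file name per line, with blank lines and lines starting with # skipped); adapt it to however you actually keep your list.

```python
from pathlib import Path
import tempfile


def assemble(project_dir, toc_name="toc.md"):
    """Concatenate chapter files in the order listed in the TOC file."""
    project = Path(project_dir)
    chapters = []
    for line in (project / toc_name).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name or name.startswith("#"):
            continue  # skip blank lines and the TOC's own heading
        chapters.append((project / name).read_text(encoding="utf-8").strip())
    # A blank line between chapters keeps Markdown blocks separate.
    return "\n\n".join(chapters) + "\n"


# Demonstration with throwaway files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "example.md").write_text("# Example\n\nFirst chapter.\n", encoding="utf-8")
    (root / "intro.md").write_text("# Intro\n\nOpening words.\n", encoding="utf-8")
    (root / "toc.md").write_text("# Table of Contents\nintro.md\nexample.md\n", encoding="utf-8")
    book = assemble(root)
    print(book)
```

Swapping two chapters then means swapping two lines in toc.md; no file has to be renamed.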

Git in Atom

Two things all writers have in common are that they're writing for keeps and that their writing is a journey. You don't sit down to write and finish with a final draft; by definition, you have a first draft. And that draft goes through revisions, each of which you carefully save in duplicate and triplicate just in case one of your files turns up corrupted. Eventually, you get to what you call a final draft, but more than likely you'll be going back to it one day, either to resurrect the good parts or to fix the bad.

The most exciting feature in Atom is its strong Git integration. Without ever leaving Atom, you can interact with all of the major features of Git, tracking and updating your project, rolling back changes you don't like, integrating changes from a collaborator, and more. The best way to learn it is to step through it, so here's how to use Git within the Atom interface from the beginning to the end of a writing project.

First things first: reveal the Git panel by selecting View > Toggle Git Tab. This causes a new tab to open on the right side of Atom's interface. There's not much to see yet, so just keep it open for now.

Starting a Git project

You can think of Git as being bound to a folder. Any folder outside a Git directory doesn't know about Git, and Git doesn't know about it. Folders and files within a Git directory are ignored until you grant Git permission to keep track of them.

You can create a Git project by creating a new project folder in Atom. Select File > Add Project Folder and create a new folder on your system. The folder you create appears in the left Project Panel of your Atom window.

Right-click on your new project folder and select New File to create a new file in your project folder. If you have files you want to import into your new project, right-click on the folder and select Show in File Manager to open the folder in your system's file viewer (Dolphin or Nautilus on Linux, Finder on Mac, Explorer on Windows), and then drag-and-drop your files.

With a project file (either the empty one you created or one you've imported) open in Atom, click the Create Repository button in the Git tab. In the pop-up dialog box, click Init to initialize your project directory as a local Git repository. Git adds a .git directory (invisible in your system's file manager, but visible to you in Atom) to your project folder. Don't be fooled by this: The .git directory is for Git to manage, not you, so you'll generally stay out of it. But seeing it in Atom is a good reminder that you're working in a project actively managed by Git; in other words, revision history is available when you see a .git directory.

In your empty file, write some stuff. You're a writer, so type some words. It can be any set of words you please, but remember the writing tips above.

Press Ctrl+S to save your file and it will appear in the Unstaged Changes section of the Git tab. That means the file exists in your project folder but has not yet been committed over to Git's purview. Allow Git to keep track of your file by clicking on the Stage All button in the top-right of the Git tab. If you've used a word processor with revision history, you can think of this step as permitting Git to record changes.

Your file is now staged. All that means is Git is aware that the file exists and is aware that it has been changed since the last time Git was made aware of it.

A Git commit sends your file into Git's internal and eternal archives. If you're used to word processors, this is similar to naming a revision. To create a commit, enter some descriptive text in the Commit message box at the bottom of the Git tab. You can be vague or cheeky, but it's more useful if you enter useful information for your future self so that you know why the revision was made.

The first time you make a commit, you must create a branch. Git branches are a little like alternate realities, allowing you to switch from one timeline to another to make changes that you may or may not want to keep forever. If you end up liking the changes, you can merge one experimental branch into another, thereby unifying different versions of your project. It's an advanced process that's not worth learning upfront, but you still need an active branch, so you have to create one for your first commit.

Click on the Branch icon at the very bottom of the Git tab to create a new branch.

Creating a branch

It's customary to name your first branch master. You don't have to; you can name it firstdraft or whatever you like, but adhering to the local customs can sometimes make talking about Git (and looking up answers to questions) a little easier because you'll know that when someone mentions master, they really mean master and not firstdraft or whatever you called your branch.

On some versions of Atom, the UI may not update to reflect that you've created a new branch. Don't worry; the branch will be created (and the UI updated) once you make your commit. Press the Commit button, whether it reads Create detached commit or Commit to master.

Once you've made a commit, the state of your file is preserved forever in Git's memory.
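If you're curious what these buttons correspond to outside of Atom, the Python sketch below drives the standard git commands (init, add, commit, log) in a throwaway folder. It illustrates the equivalent command-line steps and is not a description of Atom's internals; the file name, commit message, and author identity are placeholders, and the branch is left at whatever name your Git version uses by default.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run a git command inside repo and return its standard output."""
    result = subprocess.run(["git", *args], cwd=repo, check=True,
                            capture_output=True, text=True)
    return result.stdout


def first_commit(repo, filename, text, message):
    """Initialise a repository, stage one file, and commit it."""
    git(repo, "init")                                  # Create Repository
    Path(repo, filename).write_text(text, encoding="utf-8")
    git(repo, "add", filename)                         # Stage All
    # Placeholder identity so the commit works on an unconfigured machine.
    git(repo, "-c", "user.name=Example Writer",
        "-c", "user.email=writer@example.com",
        "commit", "-m", message)                       # Commit
    return git(repo, "log", "--oneline")               # the history so far


if shutil.which("git"):  # skip quietly when Git is not installed
    with tempfile.TemporaryDirectory() as tmp:
        history = first_commit(tmp, "example.md", "Some words.\n",
                               "Start first draft")
        print(history)
```

Each later commit adds one more line to that history, which is the list Atom shows at the bottom of the Git tab.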

History and Git diff

A natural question is how often you should make a commit. There's no one right answer to that. Saving a file with Ctrl+S and committing to Git are two separate processes, so you will continue to do both. You'll probably want to make commits whenever you feel like you've done something significant or are about to try out a crazy new idea that you may want to back out of.

To get a feel for what impact a commit has on your workflow, remove some text from your test document and add some text to the top and bottom. Make another commit. Do this a few times until you have a small history at the bottom of your Git tab, then click on a commit to view it in Atom.

Viewing differences

When viewing a past commit, you see three elements:

  • Text in green was added to a document when the commit was made.
  • Text in red was removed from the document when the commit was made.
  • All other text was untouched.

Remote backup

One of the advantages of using Git is that, by design, it is distributed, meaning you can commit your work to your local repository and push your changes out to any number of servers for backup. You can also pull changes in from those servers so that whatever device you happen to be working on always has the latest changes.

For this to work, you must have an account on a Git server. There are several free hosting services out there, including GitHub (the company that produces Atom, although GitHub itself is, oddly, not open source) and GitLab, which is open source. Preferring open source to proprietary, I'll use GitLab in this example.

If you don't already have a GitLab account, sign up for one and start a new project. The project name doesn't have to match your project folder in Atom, but it probably makes sense if it does. You can leave your project private, in which case only you and anyone you give explicit permissions to can access it, or you can make it public if you want it to be available to anyone on the internet who stumbles upon it.

Do not add a README to the project.

Once the project is created, it provides you with instructions on how to set up the repository. This is great information if you decide to use Git in a terminal or with a separate GUI, but Atom's workflow is different.

Click the Clone button in the top-right of the GitLab interface. This reveals the address you must use to access the Git repository. Copy the SSH address (not the https address).

In Atom, click on your project's .git directory and open the config file. Add these configuration lines to the file, adjusting the seth/example.git part of the url value to match your unique address.
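As an alternative to editing the file by hand, the standard git remote add command writes a remote section into .git/config for you. The Python sketch below runs it in a throwaway repository and prints the resulting file so you can see the url and fetch lines it produces. The address is an assumption built from the seth/example.git placeholder with gitlab.com as the host; use the SSH address you copied instead.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def add_remote(repo, url, name="origin"):
    """Register a remote and return the resulting .git/config text."""
    subprocess.run(["git", "remote", "add", name, url], cwd=repo, check=True)
    return Path(repo, ".git", "config").read_text(encoding="utf-8")


if shutil.which("git"):  # skip quietly when Git is not installed
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(["git", "init", "--quiet", tmp], check=True)
        # Hypothetical address; replace with your own SSH address.
        config_text = add_remote(tmp, "git@gitlab.com:seth/example.git")
        print(config_text)
```

The printed file contains a [remote "origin"] section with your url value, which is the part of the configuration that tells Git where to push.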

At the bottom of the Git tab, a new button has appeared, labeled Fetch. Since your server is brand new and therefore has no data for you to fetch, right-click on the button and select Push. This pushes your changes to your GitLab account, and now your project is backed up on a Git server.

Pushing changes to a server is something you can do after each commit. It provides immediate offsite backup and, since the amount of data is usually minimal, it's practically as fast as a local save.

Writing and Git

Git is a complex system, useful for more than just revision tracking and backups. It enables asynchronous collaboration and encourages experimentation. This article has covered the basics, but there are many more articles—and entire books—on Git and how to use it to make your work more efficient, more resilient, and more dynamic. It all starts with using Git for small tasks. The more you use it, the more questions you'll find yourself asking, and eventually the more tricks you'll learn.

Seth Kenlon


Quickstart for writing on GitHub

Learn advanced formatting features by creating a README for your GitHub profile.


Introduction

Markdown is an easy-to-read, easy-to-write language for formatting plain text. You can use Markdown syntax, along with some additional HTML tags, to format your writing on GitHub, in places like repository READMEs and comments on pull requests and issues. In this guide, you'll learn some advanced formatting features by creating or editing a README for your GitHub profile.

If you're new to Markdown, you might want to start with "Basic writing and formatting syntax" or the Communicate using Markdown GitHub Skills course.

If you already have a profile README, you can follow this guide by adding some features to your existing README, or by creating a gist with a Markdown file called something like about-me.md. For more information, see "Creating gists."

Creating or editing your profile README

Your profile README lets you share information about yourself with the community on GitHub.com. The README is displayed at the top of your profile page.

If you don't already have a profile README, you can add one.

  • Create a repository with the same name as your GitHub username, initializing the repository with a README.md file. For more information, see "Managing your profile README."
  • Edit the README.md file and delete the template text (beginning ### Hi there) that is automatically added when you create the file.

If you already have a profile README, you can edit it from your profile page.

In the upper-right corner of any GitHub page, click your profile photo, then click Your profile.

Click the pencil icon next to your profile README.

Screenshot of @octocat's profile README. A pencil icon is outlined in dark orange.

Adding an image to suit your visitors

You can include images in your communication on GitHub. Here, you'll add a responsive image, such as a banner, to the top of your profile README.

By using the HTML <picture> element with the prefers-color-scheme media feature, you can add an image that changes depending on whether a visitor is using light or dark mode. For more information, see "Managing your theme settings."

Copy and paste the following markup into your README.md file.
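As an illustration of the markup this step refers to, the Python sketch below fills a <picture> template that uses the prefers-color-scheme media feature and prints the result. The template is a minimal reconstruction based on the placeholder names used in this guide, and the helper function is hypothetical.

```python
import html

PICTURE_TEMPLATE = """<picture>
  <source media="(prefers-color-scheme: dark)" srcset="{dark}">
  <source media="(prefers-color-scheme: light)" srcset="{light}">
  <img alt="{alt}" src="{default}">
</picture>"""


def responsive_image(dark, light, default, alt):
    """Fill the placeholders of a light/dark-aware <picture> element."""
    return PICTURE_TEMPLATE.format(
        dark=dark, light=light, default=default,
        alt=html.escape(alt, quote=True),  # keep quotes in alt text safe
    )


markup = responsive_image("YOUR-DARKMODE-IMAGE", "YOUR-LIGHTMODE-IMAGE",
                          "YOUR-DEFAULT-IMAGE", "YOUR-ALT-TEXT")
print(markup)
```

The printed block can be pasted into README.md as is; the browser picks the first source whose media query matches and falls back to the img element otherwise.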

Replace the placeholders in the markup with the URLs of your chosen images. Alternatively, to try the feature first, you can copy the URLs from our example below.

  • Replace YOUR-DARKMODE-IMAGE with the URL of an image to display for visitors using dark mode.
  • Replace YOUR-LIGHTMODE-IMAGE with the URL of an image to display for visitors using light mode.
  • Replace YOUR-DEFAULT-IMAGE with the URL of an image to display in case neither of the other images can be matched, for example if the visitor is using a browser that does not support the prefers-color-scheme feature.

To make the image accessible for visitors who are using a screen reader, replace YOUR-ALT-TEXT with a description of the image.

To check the image has rendered correctly, click the Preview tab.

For more information on using images in Markdown, see "Basic writing and formatting syntax."

Example of a responsive image

How the image looks.

Screenshot of the "Preview" tab of a GitHub comment, in light mode. An image of a smiling sun fills the box.

Adding a table

You can use Markdown tables to organize information. Here, you'll use a table to introduce yourself by ranking something, such as your most-used programming languages or frameworks, the things you're spending your time learning, or your favorite hobbies. When a table column contains numbers, it's useful to right-align the column by using the syntax --: below the header row.
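To make the alignment syntax concrete, here is a minimal Python sketch that renders a ranked list as a Markdown table and right-aligns the numeric column with --:. The function name markdown_table and the sample languages are assumptions chosen for illustration.

```python
def markdown_table(headers, rows, right_align=()):
    """Render rows as a Markdown table; columns named in right_align get '--:'.

    Hypothetical helper for illustration only.
    """
    def line(cells):
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    # The delimiter row sits directly below the header row.
    delimiter = ["--:" if header in right_align else "---" for header in headers]
    return "\n".join([line(headers), line(delimiter)] + [line(row) for row in rows])


if __name__ == "__main__":
    # Sample data only; rank whatever you like.
    print(markdown_table(
        ["Rank", "Languages"],
        [(1, "Python"), (2, "Go"), (3, "Rust")],
        right_align={"Rank"},
    ))
```

Only the delimiter row controls alignment, so the data rows need no padding to render correctly.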

Return to the Edit file tab.

To introduce yourself, two lines below the </picture> tag, add an ## About me header and a short paragraph about yourself, like the following.

Two lines below this paragraph, insert a table by copying and pasting the following markup.

In the column on the right, replace THING-TO-RANK with "Languages," "Hobbies," or anything else, and fill in the column with your list of things.

To check the table has rendered correctly, click the Preview tab.

For more information, see "Organizing information with tables."

Example of a table

How the table looks.

Screenshot of the "Preview" tab of a GitHub comment. Under the "About me" heading is a rendered table with a ranked list of languages.

Adding a collapsed section

To keep your content tidy, you can use the <details> tag to create an expandable collapsed section.

To create a collapsed section for the table you created, wrap your table in <details> tags as in the following example.

Between the <summary> tags, replace THINGS-TO-RANK with whatever you ranked in your table.

Optionally, to make the section display as open by default, add the open attribute to the <details> tag.
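As a rough illustration of the wrapping step, the Python sketch below builds a collapsed section around any Markdown body and optionally adds the open attribute. The function name collapsed_section is hypothetical. The blank lines around the body are an assumption based on how Markdown inside HTML blocks is usually parsed.

```python
def collapsed_section(summary: str, body: str, open_by_default: bool = False) -> str:
    """Wrap Markdown in <details>/<summary> tags (illustrative helper)."""
    opening = "<details open>" if open_by_default else "<details>"
    # Blank lines around the body so the Markdown inside is still rendered.
    return "\n".join([
        opening,
        f"<summary>{summary}</summary>",
        "",
        body,
        "",
        "</details>",
    ])


if __name__ == "__main__":
    table = "| Rank | Languages |\n| --: | --- |\n| 1 | Python |"
    print(collapsed_section("My top languages", table))
```

Passing open_by_default=True produces <details open>, which matches the optional step above.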

To check the collapsed section has rendered correctly, click the Preview tab.

Example of a collapsed section

How the collapsed section looks.

Screenshot of the "Preview" tab of a comment. To the left of the words "Top languages" is an arrow indicating that the section can be expanded.

Adding a quote

Markdown has many other options for formatting your content. Here, you'll add a horizontal rule to divide your page and a blockquote to format your favorite quote.

At the bottom of your file, two lines below the </details> tag, add a horizontal rule by typing three or more dashes.

Below the --- line, add a quote by typing markup like the following.

Replace QUOTE with a quote of your choice. Alternatively, copy the quote from our example below.

To check everything has rendered correctly, click the Preview tab.

Example of a quote

How the quote looks.

Screenshot of the "Preview" tab of a GitHub comment. A quote is indented below a thick horizontal line.

Adding a comment

You can use HTML comment syntax to add a comment that will be hidden in the output. Here, you'll add a comment to remind yourself to update your README later.

Two lines below the ## About me header, insert a comment by using the following markup.

Replace COMMENT with a "to-do" item to remind yourself to do something later (for example, to add more items to the table).

To check your comment is hidden in the output, click the Preview tab.

Example of a comment

Saving your work

When you're happy with your changes, save your profile README by clicking Commit changes.

Committing directly to the main branch will make your changes visible to any visitor on your profile. If you want to save your work but aren't ready to make it visible on your profile, you can select Create a new branch for this commit and start a pull request.

  • Continue to learn about advanced formatting features. For example, see "Creating diagrams" and "Creating and highlighting code blocks."
  • Use your new skills as you communicate across GitHub, in issues, pull requests, and discussions. For more information, see "Communicating on GitHub."

A Quick Introduction to Version Control with Git and GitHub

PLOS Computational Biology, 12(1):e1004668

Emily R Davenport, Pennsylvania State University

The git add/commit process. To store a snapshot of changes in your repository, first git add any files to the staging area you wish to commit (for example, you've updated the process.sh file). Second, type git commit with a message. Only files added to the staging area will be committed. All past commits are located in the hidden .git directory in your repository.
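The two-step process in the caption can be sketched as follows. This is a minimal Python wrapper around the git command line, assuming git is installed and the working directory is already a repository. The function names snapshot_commands and snapshot are hypothetical; only git add and git commit -m come from Git itself.

```python
import subprocess
from pathlib import Path


def snapshot_commands(paths, message):
    """Return the two git invocations: stage the named files, then commit them."""
    return [
        # Step 1: copy the current state of these files into the staging area.
        ["git", "add", "--", *map(str, paths)],
        # Step 2: record everything that is staged as a new commit.
        ["git", "commit", "-m", message],
    ]


def snapshot(repo: Path, paths, message: str) -> None:
    """Run the add/commit pair inside repo; raises CalledProcessError on failure."""
    for command in snapshot_commands(paths, message):
        subprocess.run(command, cwd=repo, check=True)


if __name__ == "__main__":
    # Shows the commands without touching any repository.
    for command in snapshot_commands(["process.sh"], "Update process.sh"):
        print(" ".join(command))
```

Calling snapshot(Path("."), ["process.sh"], "Update process.sh") would commit only process.sh. Files that were edited but not passed to git add stay out of the commit.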


Git for Writers

How can Git help keep you organized on a project, like writing a book? 

In a talk for the 2021 GitKon Git conference, Jessi Shakarian explains: “Writing in Git was not something I anticipated when I set out to write a book. In my past life, I was a fiction editor and worked at a small publishing company, so I’m used to working in Microsoft Word or Google Docs.”

Jessi found a lot of benefits to using Git for writers, and shared her process and recommended tools for writing a book using Git. 

Why choose Git for writing?

“When it came time to start writing my book,” Jessi says, “Google Docs actually felt like more of a hindrance. The blank page really seemed to get to me. I needed something low pressure where I could shut off my brain and just write. It could be garbage, but I just had to get the ball rolling, which was difficult.”

When you’re getting started writing a book, hopefully you have some type of foundation to work off of, like a collection of blog articles, that can help you outline potential chapters. 

If you’re just getting started using Git, it can be a little scary, even if you’ve had prior development experience. Maybe you only feel confident committing something to your main branch, for example, but don’t know how to create a Git branch or understand what Git pull requests are.

If you’re new to Git, it can be intimidating, and you might feel scared that you’re going to do something wrong or break something. Don’t fear! Git can be quite flexible and intuitive, especially when you’re using the GitKraken Git client to visualize your code in a way that makes sense to both technical and non-technical people. 

GitKraken’s easy-to-read commit graph will help you visualize branch structure and file history and see who made what changes when, so you can easily understand the information in front of you, even if you’re brand new to Git.

Writing Words with Git

If you’re using Git to write a book, files don’t have to be filled with code; they can just be filled with words. So, not only can you use Git to write, you can also use it to outline your book. 

Not ready to share it with the world? No problem. Create a private Git repository where you can dump every thought you have about your book. It doesn’t have to be organized or pretty yet, but should give you a great jumping off point. 

Your Git repository can be a place to let stuff sit and mull where you can write as much as you want without the hindering mental model of a word processor. 

Writing a book is a unique experience for each author, and each writing project has its own constraints and limitations that make it different from a previous writing project. Using Git for writers can be a great way to approach a new project from a different perspective.

GitKraken branches used for a writing project

Here, each Git branch represents a component of a book: the introduction, a sample chapter, and the main chapter at the top. You could also organize this into part 1, part 2, and part 3; beginning, middle, and end; chapter 1, chapter 2, chapter 3; and so on. Organize and outline your book however you want.

GitKraken Kanban Boards for Organizing Thoughts

In addition to needing a repository to organize your writing, you may also want a project management tool that helps you organize your thoughts and tasks related to the book. Brainstorming and organizing is an important step in the beginning phases of writing your book. 

A Kanban board is a method for organizing a project into columns, such as To Do, In Process, and Done. Each column contains individual cards associated with project tasks. Kanban boards can be organized in a wide variety of ways and used to manage personal and work projects. Cards can be moved from column to column as the status of project tasks changes.

In this example using GitKraken Boards, you can see the columns are outlined as thought categories, or themes of a book: Chess Foundations, Strategy, Thematic Thoughts, and Misc Thoughts to be Organized.


During the brainstorming, you shouldn’t be afraid to add all your thoughts; good, bad, or dumb, it doesn’t matter. You just want to get it all down in one place. You might end up spending weeks or months organizing, rearranging, and sifting through your board until you reach a “final product.” You might even want to turn your cards into physical flashcards and organize a life-size Kanban board, whatever works for you!

Flashcards used for planning on a writing project

Your brainstorming process will evolve as the thoughts about your book become more tangible and thought-out. You may even want to create an entirely new board with columns representing final chapters or collective themes about your book.  

GitKraken Boards used for outlining chapters for a writing project

Basically, you want something a little more readable than your initial idea dump. This will help you understand the core content of your book and how things will come together before getting started with the actual writing.

Things don’t have to be perfect at this stage; you will still have the ability to move things around as needed. It’s nice to have a structure that is flexible and doesn’t feel permanent at this early stage in the book writing process.

In this example, we’re going to use the first column with “Vision Memory Calculation” as our first, sample chapter, so let’s start there and get writing!


Getting Started with Git for Writers

Now, it’s one thing to plan everything out and talk about your book with other people, but it’s another thing entirely to actually start writing. This step is intimidating, and while Git may not take all of the fear away from the process, it can help minimize the anxiety. 

One of the reasons Git can help jumpstart the writing process is by breaking that mental model of writing in a word processor.

Drafting Your Book with Git

Typically, writers are used to starting with an indent, writing until the paragraph is done, and then moving on to the next paragraph. That eventually leads to something resembling a book.

By working in Git, it’s just words on a page. The below example shows a book draft open in GitKraken, and you can see that the line just keeps going; you can write until you reach the end of a thought, and when you want a new paragraph, go down to the next line. 

Writing in the GitKraken built in editor for a writing project using Git

There isn’t spell check in Git so there’s no squiggly line staring at you telling you that you’ve done something wrong, which can be a big relief when you’re just getting started on the draft of your book. You can feel unencumbered by the mechanisms of writing. 

At this point in the process, it’s not about how well the sentences flow together. You can just write it all out and clean it up later. 


Editing Your Book with Git Pull Requests

After you’ve dumped all of your thoughts into Git and written everything down, the next phase of the process is editing. 

If possible, you should obtain the help of a human editor. This could be a professional book editor, or if that’s not available to you, it could be a co-worker working on the same project as you, or even just a friend. The most important thing is that you choose someone who believes in you and your work, and someone technically savvy enough to learn how to use GitHub. 

One method you can use to edit your book with Git involves GitHub pull requests. You and your editor can create a pull request for needed edits. Any change becomes a pull request; it doesn’t matter how many you have.

A pull request, or PR, is a proposal to merge a set of changes from one branch into another. When your editor has changes, you will get a notification and can review the changes before you “merge” them, creating a new version. In Git, the merge action combines changes from one branch into another.

In the below pull request example, you can see that the requested change is to “Add headings” to the sample-chapter branch, that the pull request is Verified , and the associated commit ID.

A Pull Request on GitHub for a commit in a writing project using Git

Your next step will be to merge the pull requests, which just means that you’re updating the new version with whatever changes exist in the PR. 

Now, after you merge a pull request using the GitKraken Git client, you will be able to view your changes in a variety of ways. Below, you can see file changes displayed in the Diff View, where the red indicates your original draft, green indicates the edited version, and blue indicates notes.

A Diff view in GitKraken of a file used in a writing project leveraging Git

You will keep repeating the drafting and editing steps until you’re complete, or your deadline arrives, whichever comes first. 😅

Benefits of Git for Writers

There are many benefits associated with using Git to organize data of any kind, including words for a book. Here are some of the most significant perks of Git for writers when compared to using a traditional word processor: 

  • Git offers a more detailed version history 
  • Complete transparency is available in Git, and comments don’t vanish like they do in Google Docs or Word
  • Collaboration is better. Git is designed to be used by multiple people in different stages of the process. One person can be editing while another is drafting, etc. 

And if you’re using GitKraken, you can see your notes clearly listed inside of your repository in every commit you’ve made. 

At the end of the day, Git is built around version history, a feature most major word processors also offer in some form. If you make changes and then decide that you don’t like them, you can easily go back to a previous version and continue to play around with which version works best for you.

In Git, every commit is a distinct snapshot, so each new change you add will always be easily accessible at the top of your commit graph in GitKraken.

A commit message example for a writing project using Git

In the above example, you can see a commit in GitKraken which shows the topic of the commit is “Added headings” and the change made was “Added h1, h2 headings to better structure for reading points.” Your comments will never vanish; they will always be associated with unique commit ID numbers, in this example 95a6ea, so you and your editor can reference commit IDs to easily search for and find changes when discussion is needed.

And don’t forget that Git was designed to be used by multiple people in different stages of the creation process—one person on editing, one person on drafting, etc. In GitKraken, each commit in your graph shows you who made the change with their gravatar. 

If you’re just getting started with Git, GitKraken can help shed light on the complexities of Git so you can understand your project history and have control over your workflow, even if you have limited technical experience.

Ready to Use Git for Writers?

At the end of the day, writing doesn’t happen in a vacuum. Sure, the act of physically writing something is very solitary, but the outlining, brainstorming, and editing processes require collaboration among multiple people.

If you’re ready to get started with Git for writers, consider the following steps: 

  • Brainstorm and organize your thoughts with GitKraken Boards.
  • Outline your chapters as Git branches with the GitKraken Git client.
  • Edit using Git pull requests with GitHub and GitKraken.
  • Apply the edits to your draft by merging pull requests.


Jonathan Bennett's Blog: Emacs, Python, and Education

Writing academic papers with org-mode

Configs at the end

If you want to copy this workflow exactly, I will have all of the relevant configuration at the end of this document in a single source block. Just change the paths appropriately and you're good to go.

The Reason I'm Here

Rewind 1 year. I've got an i3 window with Vim on the left and a shell on the right up on my screen, with evince open on desktop 2 so I can see the output. I'm up to my ears in LaTeX, and things aren't making nearly as much sense as I think they should be. I'm designing the final exam which my students will sit in 1 week's time, and it's just not as easy as I'd like it to be.

Actually, though, the problem isn't LaTeX. Well, that's one of the problems, but it isn't the main one. The main problem I'm having is context switches.

Vim does not work smoothly with external processes. Now, you can (with work) make Vim work with external processes, but it isn't a simple task. If I were writing Vim's report card, it would include a sentence like, "Vim needs to practice playing cooperatively with others."

So when I knew that I would be embarking on a Masters, priority number 1 was finding a text-based flow that would let me write my thesis in plain text and export it for my professor. Enter Emacs.

Why Plaintext?

For the sake of those who stumbled onto this blog post by typing "How to write an academic paper" into your search engine of choice, let me explain briefly why I didn't want to write my paper in Google Docs or Microsoft Word.

Google Docs is a great Word Processor. We have used it at every school I have worked at in the last 5 years to great effect. It is simple, convenient, and reasonably quick.

It also excels at seamless collaboration. While a group of technically minded adults would have no problem collaborating using git, that isn't seamless and it isn't something you can get every single 6th grader in your school to do correctly every time.

But it falls down for long, complex documents which undergo multiple iterations. I took my Masters from a list of possible ideas to a complete paper in a single document (plus a few supporting documents for doing statistical analysis). Not only that, I can easily go back to any previous iteration of that process using git. Finally, I can be certain that the charts and tables in my paper are correct and current every time because they are generated fresh from the actual data every time I export my paper.

Why org-mode?

Now all of this is also possible in LaTeX. In fact, LaTeX is the intermediary step that my paper goes through on the way to the finished product. So why not just write it in LaTeX?

Well, there are a few reasons why I went with org-mode over LaTeX.

Familiarity

I use org-mode every day. My schedule is in org-mode (exported to .ics so that my Google calendar is up to date and visible to my colleagues). My plans, projects, and TODO lists are in org-mode. I take notes during class in org-mode. I write this blog in org-mode.

LaTeX, on the other hand, I have very little experience with. I have used it a little bit for building tests and things, but not enough to be fluent in it. Since I would also be learning a new writing environment setup, I decided to reduce the number of new things I needed to learn.

Integrates with my TODO list

Because my paper is an org-mode document, I can simply put TODO at the beginning of a section header and that section shows up in my org-agenda task list. This allowed me to outline my document, schedule when I planned to research, write, and proofread each section.

Export to multiple formats

LaTeX is usually used to export to PDF. I believe there are ways to export LaTeX to HTML or other formats, but I haven't ever used them. Org-mode exports to almost everything.

The deciding factor for me was the package org-ref. I'll talk more about it below, but being able to use the Helm incremental search to filter my library for the exact source I wanted to cite, then insert that citation in the right spot and add it to the bibliography automatically, was brilliant.

The Toolchain

Zotero for library management.

While there are definitely Emacs tools for library management, Zotero excels at this particular task. With Zotero, I can take a PDF sourced from ResearchGate or elsewhere, drag and drop it onto the Zotero window, and it will autopopulate the bibliographic information. Additionally, it can generate citations for books from just the ISBN, websites from just the web address, and lots of other sources. Finally, you can install the Zotero plugin for Firefox or Chrome and get citations into Zotero with the click of a button.

Installation

Download Zotero from their website. While you are there, go ahead and sign up for a free account. That way, you can easily transfer your library from computer to computer should you need to. You will also need to download the Zotero Better Bibtex Plugin. You may also want to grab the appropriate plugin for your browser of choice and the Zutilo plugin, but these two tools are optional.

Configuration

Now, you need to set up Zotero so that it creates the .bib file you plan to use for your paper. I have two bibliography files on my computer. A master file located in my home directory and a project-specific file located in my project's folder. The reason for this is two-fold.

  • I want each project to have its own .bib file so that if someone downloads the project from the internet, they have the resources to build the PDF from the github repo.
  • I want a fallback in case a specific project doesn't yet have a .bib file associated with it.

Whether you choose to have a master .bib file for all your projects or individual .bib files for each project, it is important that your .bib files stay in sync with your Zotero library. That's one of the main reasons for downloading the Better Bibtex plugin. One of the features of Better Bibtex is the ability to keep an exported .bib file up to date.

Here's what you need to do in order to get a .bib file in your project directory that stays up to date.

  • In Zotero, click File -> Export Library
  • For the format, be sure to select "Better Bibtex"
  • Make sure you check the box "Keep Updated".
  • For the save file dialog, put your file in your project directory with a reasonable name. I usually use library.bib.

Using Zotero

To get a citation into Zotero, the easiest way is to drag and drop the PDF of the paper or article onto the Zotero window. Zotero will then detect as much of the bibliographic data as possible (for older PDFs without OCR, this may be incomplete) and create a new entry. It will also copy the PDF into a folder in its own directory, so you can safely delete the PDF which you downloaded. Finally, if you have completed the configuration above, it will automatically export that library item into your library.bib file, making it available for searching and citing in Emacs.

Limitations of Zotero

Zotero is excellent for library management. But their notes interface leaves much to be desired for someone who is used to working with the Emacs/org-mode workflow. I would not recommend keeping any notes in Zotero. The whole goal of this toolchain is to use the best tool for each of the jobs. Zotero is the best tool for library management, but it is not the best tool for taking notes about the papers and books in your library.

PDF-Tools for Reading your Papers

Now that you've found some sources for your paper, you need to read them. Not only should you read them, you also need to keep notes on them to simplify writing your paper. For this, pdf-tools and helm-bibtex are excellent resources.

Installation (MacOS)

Installing PDF-tools on a Mac is, sadly, not as straight-forward as it should be. The instructions for doing so are found here .

The part that is missing (or at least potentially unclear) is where you should define the PKG_CONFIG_PATH environment variable. This can be defined in your shell rc file (.bash_profile or .zshenv), but if you do that you will need to use exec-path-from-shell to bring it into Emacs. Alternatively, this can be defined inside Emacs, but then it would not be available outside of Emacs. I elected to define it in my .zshenv file, in case I end up needing it elsewhere. In that case, you need exec-path-from-shell configured in your init.el.

This is not included in the big init.el dump at the end because there's another way to get the variable into Emacs: simply evaluate (setenv "PKG_CONFIG_PATH" "/usr/local/Cellar/zlib/1.2.8/lib/pkgconfig:/usr/local/lib/pkgconfig:/opt/X11/lib/pkgconfig").

When in an org document (any document will do, but typically you would do this in your paper), pressing C-c ] will open the helm-bibtex menu. From here, you'll be presented with a list of all of the items in your library. Use the helm incremental search to find the item you're looking for.

This view is the center of your citation/annotation workflow. From here, you can choose a library item to insert as a citation. You can open it in your PDF viewer (if you're using pdf-tools as I recommend, that will be Emacs). You can also open an associated notes file, which would open an org file. I originally used the notes-file workflow because I could not get pdf-tools working correctly on my Mac. But making highlights and annotations directly in the PDF has the advantage of being transferable to collaborators and other computers which may not have Emacs set up on them. So my workflow right now does not use the notes file.

That said, I did find it useful to be able to write the lit review for each paper directly into an org file and then use M-x org-copy-subtree to put it directly into my paper at the appropriate spot. For now, though, collaborative concerns outweigh that convenience.

Since we are taking notes right now, we want to open the PDF. So we search for the PDF, press <Tab>, and then <F2>. Assuming you have PDF-tools set up on your computer, you should now have the PDF in Emacs.

From here, you can read the document and make annotations directly in the PDF. This is the only part of my workflow which requires me to take my hands off the keyboard as pdf-tools interacts with the specific parts of the PDF via mouse events.

But in short, you can highlight a relevant passage and press C-c C-a h to add a highlight. This pops up a mini-buffer where you can add your notes regarding the highlighted section. Alternatively, you can press C-c C-a t to add a text annotation which appears as a small sticky note on the screen. I found those useful for annotating charts and tables.

Org-mode and org-ref for Writing Your Paper

You've found your sources, you've annotated them, and now it's time to write your paper. For this, org-mode is magnificent, especially when coupled with org-ref and helm-bibtex. I suspect the same would be true of ivy's bibtex plugin, but I like helm.

Org-mode rocks, pure and simple. When writing a paper, you use org headings to represent the sections and subsections of your paper. There are some modifications needed in order to export your work, especially if you're working in the humanities and need to publish in APA6 format. The modifications needed in your init.el are listed below in the code snippet, but you'll need a specific header for your document as well.

This initializes a number of LaTeX options and headers. Let's take them one by one.

  • BIBLIOGRAPHY: This should be the path to the file that Zotero is exporting to. I always point this to the one inside the project directory rather than the master document saved in my home directory.
  • LaTeX_CLASS and LaTeX_CLASS_OPTIONS: Together these define the LaTeX class of the document. They are used as follows in the command: \documentclass$LaTeX_CLASS_OPTIONS{$LaTeX_CLASS}. Note that the value of LaTeX_CLASS_OPTIONS must be written inside square brackets, since it is passed through as-is.
  • LaTeX_HEADER: These make macro calls or set variables which should be done in the header of the LaTeX document (In other words, before the content of the document begins). There are a number of these here. The variables are self-descriptive, but I will describe the packages below.
  • \usepackage{breakcites} : This allows citations to word wrap. It may not be strictly necessary, but I thought it made the paper look nicer.
  • \usepackage{apacite} : This is necessary for apa6 compliant citations.
  • \usepackage{paralist} : Default LaTeX lists take up far too much space. This package reduces that.
  • \let\itemize\compactitem : This replaces the default itemize environment with compactitem from paralist
  • \let\description\compactdesc and \let\enumerate\compactenum : Same as above

Next, you write your abstract. Wrap it in #+begin_abstract and #+end_abstract so that the apa6 class can find it.

Finally, add #+LaTeX: \tableofcontents to place your table of contents. Note that this is #+LaTeX , not #+LaTeX_HEADER .

The last thing you should add is at the very end of your paper. You should add the following two lines so that org-ref can build your bibliography.

If your paper should use a different citation style, you should import different packages at the top and use a different bibliographystyle at the end. If you are using the APA6 class, do not put the bibliography in its own header. If you do, your PDF will have two headers for your bibliography. It is annoying that your bibliography goes inside the final header of your Conclusions section, but it is necessary. That may not be the case for other document classes.

Org-ref allows you to manage citations in org-mode. Getting started in org-ref is like getting started in helm or magit. It's intimidating at the beginning, but you don't need to understand all of it in order to handle writing your paper. In fact, much of the setup needed has already been described above (with the exception of init.el requirements).

To use org-ref, press C-c ] as you did when preparing to annotate. Search by author name, article name, publication date... basically anything in the .bib entry for the article. Should you need to select more than one article to cite, C-<space> marks an article for citation. Once you've selected the article(s) you want to cite, you can press <Enter> to insert the default citation (which is typically what you want). If you need an alternative citation format (perhaps one without parentheses), pressing <C-u Enter> will get you the list of all possible citation formats. There are lots; I didn't try them all.

Git to Track the Changes to Your Paper

Git is a distributed version control program. It allows you to track different versions of the files in a directory. When writing a paper, it allows you to go back to that version of the lit review that the professor liked, but keep all the work you've done on the methodology. It also lets you back up your paper easily (and for free) to GitHub, Bitbucket, GitLab, or other remote git forges. I had to reformat my Mac halfway through my paper. I was able to work on it using a school desktop while waiting to get the Mac back, and sync all the changes back to Bitbucket and the Mac easily when IT was finished.

I used Magit for all my git committing, pushing, etc when writing this paper. But that is beyond the scope of this post. At a later date, I will explain how I use Magit for work.

Exporting with LaTeX

Assuming you have followed the instructions to this point, when you are ready to export a version of your paper, you simply press C-c C-e l o and your new version will open. Currently, mine opens in Apple's PDF viewer. I believe that is because the LaTeX command calls an outside process, so it uses the system default PDF reader. In any case, I don't typically annotate my own papers, so that's not a serious issue for me.

Setting it up

Below are all of the relevant parts of my init.el. These differ slightly from the one posted earlier this week because they have been updated and tweaked to reflect changes I made as I reflected on how this process worked for me. For packages which are mandatory for this to work but which don't have any configuration specific to this task, I have simply included the shortest possible configuration for them. If you want to know more about how I use those packages, take a look at the blog entries specific to those packages. This assumes that you use use-package; if you don't, you'll need to heavily adapt what you see here.

Initial Setup

package and use-package setup.

Exec-path-from-shell

As part of their ongoing war against developers, the Captains of industry in Cupertino have designed Macs so that GUI Emacs only ever reads environment variables from the default shell. This is obviously user hostile behavior since their system shell (bash) is from 2007, so nobody should use that shell. This package is designed to work around this.

Org-mode is the thing that brought me to Emacs. There are a lot of customizations here.

This handles the Quality of life part of Org-mode. First, org-bullets beautifies the leading asterisks. Then, we hide the extra asterisks. Finally, we set the global shortcuts for org-store-link, org-agenda, and org-capture.

Academic Paper Writing

Here are the settings I have used for writing academic papers. I write my papers in orgmode, then export them to PDF via LaTeX. This is one of the most fleshed out areas of my dotfiles, in large part because 90% of my Emacs time for the last 6 months has been related to my Master's Thesis in some way.

This sets the default bibtex file. I rarely use this in real projects. Most projects set their own bibliography.

I use org-ref to manage my citations in my papers. This is the section for the support and configuration for org-ref.

This block corrects the way that the TOC is displayed. It is SUPER important for the apa6 class that follows. APA6 has very strong opinions about how the TOC should be displayed, opinions that conflict directly with the default settings for exporting from orgmode.

Add apa6 to the org-latex-classes export for writing academic papers in APA6 format.

This describes the export process from orgmode to LaTeX to PDF.

Linux Install

The shell block below installs the libraries needed to run this on Linux. There is another workflow you have to use on MacOS, but I have not gotten it to work yet.

MacOS Install

This assumes you have homebrew installed.

PDF-tools config

PDF-tools allows me to annotate and view pdfs INSIDE emacs. This ties in with helm-bibtex for lit reviews. It's super awesome when it works, but thanks to Apple....


Open Access

A Quick Introduction to Version Control with Git and GitHub


Affiliation Committee on Genetics, Genomics, and Systems Biology, University of Chicago, Chicago, Illinois, United States of America

Affiliation Department of Molecular Biology and Genetics, Cornell University, Ithaca, New York, United States of America

Affiliation Software Carpentry Foundation, Toronto, Ontario, Canada

  • John D. Blischak, 
  • Emily R. Davenport, 
  • Greg Wilson

PLOS

Published: January 19, 2016

  • https://doi.org/10.1371/journal.pcbi.1004668

Table 1

Citation: Blischak JD, Davenport ER, Wilson G (2016) A Quick Introduction to Version Control with Git and GitHub. PLoS Comput Biol 12(1): e1004668. https://doi.org/10.1371/journal.pcbi.1004668

Editor: Francis Ouellette, Ontario Institute for Cancer Research, CANADA

Copyright: © 2016 Blischak et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: JDB is supported by National Institutes of Health grant AI087658 awarded to Yoav Gilad. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

“This is part of the PLOS Computational Biology Education collection.”

Introduction to Version Control

Many scientists write code as part of their research. Just as experiments are logged in laboratory notebooks, it is important to document the code you use for analysis. However, a few key problems can arise when iteratively developing code that make it difficult to document and track which code version was used to create each result. First, you often need to experiment with new ideas, such as adding new features to a script or increasing the speed of a slow step, but you do not want to risk breaking the currently working code. One often-utilized solution is to make a copy of the script before making new edits. However, this can quickly become a problem because it clutters your file system with uninformative filenames, e.g., analysis.sh, analysis_02.sh, analysis_03.sh, etc. It is difficult to remember the differences between the versions of the files and, more importantly, which version you used to produce specific results, especially if you return to the code months later. Second, you will likely share your code with multiple lab mates or collaborators, and they may have suggestions on how to improve it. If you email the code to multiple people, you will have to manually incorporate all the changes each of them sends.

Fortunately, software engineers have already developed software to manage these issues: version control. A version control system (VCS) allows you to track the iterative changes you make to your code. Thus, you can experiment with new ideas but always have the option to revert to a specific past version of the code you used to generate particular results. Furthermore, you can record messages as you save each successive version so that you (or anyone else) reviewing the development history of the code is able to understand the rationale for the given edits. It also facilitates collaboration. Using a VCS, your collaborators can make and save changes to the code, and you can automatically incorporate these changes to the main code base. The collaborative aspect is enhanced with the emergence of websites that host version-controlled code.

In this quick guide, we introduce you to one VCS, Git ( https://git-scm.com ), and one online hosting site, GitHub ( https://github.com ), both of which are currently popular among scientists and programmers in general. More importantly, we hope to convince you that although mastering a given VCS takes time, you can already achieve great benefits by getting started using a few simple commands. Furthermore, not only does using a VCS solve many common problems when writing code, it can also improve the scientific process. By tracking your code development with a VCS and hosting it online, you are performing science that is more transparent, reproducible, and open to collaboration [ 1 , 2 ]. There is no reason this framework needs to be limited only to code; a VCS is well-suited for tracking any plain-text files: manuscripts, electronic lab notebooks, protocols, etc.

Version Your Code

The first step is to learn how to version your own code. In this tutorial, we will run Git from the command line of the Unix shell. Thus, we expect readers are already comfortable with navigating a filesystem and running basic commands in such an environment. You can find directions for installing Git for the operating system running on your computer by following one of the links provided in Table 1 . There are many graphical user interfaces (GUIs) available for running Git ( Table 1 ), which we encourage you to explore, but learning to use Git on the command line is necessary for performing more advanced operations and using Git on a remote machine.

Table 1. https://doi.org/10.1371/journal.pcbi.1004668.t001

To follow along, first create a folder in your home directory named thesis . Next, download the three files provided in Supporting Information and place them in the thesis directory. Imagine that, as part of your thesis, you are studying the transcription factor CTCF, and you want to identify high-confidence binding sites in kidney epithelial cells. To do this, you will utilize publicly available ChIP-seq data produced by the ENCODE consortium [ 3 ]. ChIP-seq is a method for finding the sites in the genome where a transcription factor is bound, and these sites are referred to as peaks [ 4 ]. process.sh downloads the ENCODE CTCF ChIP-seq data from multiple types of kidney samples and calls peaks ( S1 Data ); clean.py filters peaks with a fold change cutoff and merges peaks from the different kidney samples ( S2 Data ); and analyze.R creates diagnostic plots on the length of the peaks and their distribution across the genome ( S3 Data ).

If you have just installed Git, the first thing you need to do is provide some information about yourself, since it records who makes each change to the file(s). Set your name and email by running the following lines, but replacing “First Last” and “user@domain” with your full name and email address, respectively.

$ git config --global user.name "First Last"

$ git config --global user.email "user@domain"

To start versioning your code with Git, navigate to your newly created directory, ~/thesis . Run the command git init to initialize the current folder as a Git repository (Figs 1 and 2A ). A repository (or repo, for short) refers to the current version of the tracked files as well as all the previously saved versions ( Box 1 ). Only files that are located within this directory (and any subdirectories) have the potential to be version controlled, i.e., Git ignores all files outside of the initialized directory. For this reason, projects under version control tend to be stored within a single directory to correspond with a single Git repository. For strategies on how to best organize your own projects, see Noble, 2009 [ 5 ].

$ cd ~/thesis

$ ls

analyze.R clean.py process.sh

$ git init

Initialized empty Git repository in ~/thesis/.git/

Fig 1. To store a snapshot of changes in your repository, first git add any files to the staging area you wish to commit (for example, you’ve updated the process.sh file). Second, type git commit with a message. Only files added to the staging area will be committed. All past commits are located in the hidden .git directory in your repository.

https://doi.org/10.1371/journal.pcbi.1004668.g001

Fig 2. (A) To designate a directory on your computer as a Git repo, type the command git init . This initializes the repository and will allow you to track the files located within that directory. (B) Once you have added a file, follow the git add/commit cycle to place the new file first into the staging area by typing git add to designate it to be committed, and then git commit to take the snapshot of that file. The commit is assigned a commit identifier (d75es) that can be used in the future to pull up this version or to compare different committed versions of this file. (C) As you continue to add and change files, you should regularly add and commit those changes. Here, an additional commit was done, and the commit log now shows two commit identifiers: d75es (from step B) and f658t (the new commit). Each commit will generate a unique identifier, which can be examined in reverse chronological order using git log .

https://doi.org/10.1371/journal.pcbi.1004668.g002

Box 1. Definitions

  • Version Control System (VCS) : (noun) a program that tracks changes to specified files over time and maintains a library of all past versions of those files
  • Git : (noun) a version control system
  • repository (repo) : (noun) folder containing all tracked files as well as the version control history
  • commit : (noun) a snapshot of changes made to the staged file(s); (verb) to save a snapshot of changes made to the staged file(s)
  • stage : (noun) the staging area holds the files to be included in the next commit; (verb) to mark a file to be included in the next commit
  • track : (noun) a tracked file is one that is recognized by the Git repository
  • branch : (noun) a parallel version of the files in a repository ( Box 7 )
  • local : (noun) the version of your repository that is stored on your personal computer
  • remote : (noun) the version of your repository that is stored on a remote server; for instance, on GitHub
  • clone : (verb) to create a local copy of a remote repository on your personal computer
  • fork : (noun) a copy of another user’s repository on GitHub; (verb) to copy a repository; for instance, from one user’s GitHub account to your own
  • merge : (verb) to update files by incorporating the changes introduced in new commits
  • pull : (verb) to retrieve commits from a remote repository and merge them into a local repository
  • push : (verb) to send commits from a local repository to a remote repository
  • pull request : (noun) a message sent by one GitHub user to merge the commits in their remote repository into another user’s remote repository

Now you are ready to start versioning your code ( Fig 1 ). Conceptually, Git saves snapshots of the changes you make to your files whenever you instruct it to. For instance, after you edit a script in your text editor, you save the updated script to your thesis folder. If you tell Git to save a snapshot of the updated document, then you will have a permanent record of the file in that exact version even if you make subsequent edits to the file. In the Git framework, any changes you have made to a script but have not yet recorded as a snapshot with Git reside in the working directory only ( Fig 1 ). To follow what Git is doing as you record the initial version of your files, use the informative command git status.

$ git status

On branch master

Initial commit

Untracked files:

    (use "git add <file>..." to include in what will be committed)

        analyze.R

        clean.py

        process.sh

nothing added to commit but untracked files present (use "git add" to track)

There are a few key things to notice from this output. First, the three scripts are recognized as untracked files because you have not told Git to start tracking anything yet. Second, the word “commit” is Git terminology for a snapshot. As a noun, it means “a version of the code,” e.g., “the figure was generated using the commit from yesterday” ( Box 1 ). This word can also be used as a verb, meaning “to save,” e.g., “to commit a change.” Lastly, the output explains how you can track your files using git add . Start tracking the file process.sh.

$ git add process.sh

And check its new status.

$ git status

Changes to be committed:

    (use "git rm --cached <file>..." to unstage)

        new file: process.sh

Since this is the first time that you have told Git about the file process.sh , two key things have happened. First, this file is now being tracked, which means Git recognizes it as a file you wish to be version controlled ( Box 1 ). Second, the changes made to the file (in this case the entire file, because it is the first commit) have been added to the staging area ( Fig 1 ). Adding a file to the staging area will result in the changes to that file being included in the next commit, or snapshot, of the code ( Box 1 ). As an analogy, adding files to the staging area is like putting things in a box to mail off, and committing is like putting the box in the mail.

Since this will be the first commit, or first version, of the code, use git add to begin tracking the other two files and add their changes to the staging area as well. Then create the first commit using the command git commit .

$ git add clean.py analyze.R

$ git commit -m "Add initial version of thesis code."

[master (root-commit) 660213b] Add initial version of thesis code.

3 files changed, 154 insertions(+)

create mode 100644 analyze.R

create mode 100644 clean.py

create mode 100644 process.sh

Notice the flag -m was used to pass a message for the commit. This message describes the changes that have been made to the code and is required. If you do not pass a message at the command line, the default text editor for your system will open so you can enter the message. You have just performed the typical development cycle with Git: make some changes, add updated files to the staging area, and commit the changes as a snapshot once you are satisfied with them ( Fig 2 ).
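The cycle described above can be sketched as a short script. This is only an illustration under stated assumptions: it works in a throwaway directory created with mktemp rather than in ~/thesis, the one-line contents written to process.sh are hypothetical stand-ins for the real script, and the name and email are set only for the scratch repository.

```shell
#!/bin/sh
set -eu

# Work in a throwaway repository so nothing in ~/thesis is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"     # local to this scratch repo only
git config user.email "user@domain"

# 1. Make a change (here: create a hypothetical stand-in script).
echo 'echo "call peaks"' > process.sh

# 2. Add the updated file to the staging area.
git add process.sh

# 3. Commit the staged change as a snapshot, with a message.
git commit -q -m "Add stand-in process.sh"

# Repeat the same three steps for the next change.
echo 'echo "filter peaks"' >> process.sh
git add process.sh
git commit -q -m "Add filtering step"

# One line per commit, newest first.
git log --oneline
```

Each pass through the edit, git add, git commit steps adds one entry to the history, so the final git log --oneline lists two commits.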

Since Git records all of the commits, you can always look through the complete history of a project. To view the record of your commits, use the command git log . For each commit, it lists the unique identifier for that revision, author, date, and commit message.

$ git log

commit 660213b91af167d992885e45ab19f585f02d4661

Author: First Last <user@domain>

Date: Fri Aug 21 14:52:05 2015 -0500

    Add initial version of thesis code.

The commit identifier can be used to compare two different versions of a file, restore a file to a previous version from a past commit, and even retrieve tracked files if you accidentally delete them.

Now you are free to make changes to the files knowing that you can always revert them to the state of this commit by referencing its identifier. As an example, edit clean.py so that the fold change cutoff for filtering peaks is more stringent. Here is the current bottom of the file.

$ tail clean.py

# Filter based on fold-change over control sample

fc_cutoff = 10

epithelial = epithelial.filter(filter_fold_change, fc = fc_cutoff).saveas()

proximal_tube = proximal_tube.filter(filter_fold_change, fc = fc_cutoff).saveas()

kidney = kidney.filter(filter_fold_change, fc = fc_cutoff).saveas()

# Identify only those sites that are peaks in all three tissue types

combined = pybedtools.BedTool().multi_intersect(

    i = [epithelial.fn, proximal_tube.fn, kidney.fn])

union = combined.filter(lambda x: int(x[3]) == 3).saveas()

union.cut(range(3)).saveas(data + "/sites-union.bed")

Using a text editor, increase the fold change cutoff from 10 to 20.

fc_cutoff = 20

Because Git is tracking clean.py , it recognizes that the file has been changed since the last commit.

$ git status

# On branch master

# Changes not staged for commit:

#    (use "git add <file>..." to update what will be committed)

#    (use "git checkout -- <file>..." to discard changes in working directory)

#    modified: clean.py

no changes added to commit (use "git add" and/or "git commit -a")

The report from git status indicates that the changes to clean.py are not staged, i.e., they are in the working directory ( Fig 1 ). To view the unstaged changes, run the command git diff .

$ git diff

diff --git a/clean.py b/clean.py

index 7b8c058..76d84ce 100644

--- a/clean.py

+++ b/clean.py

@@ -28,7 +28,7 @@ def filter_fold_change(feature, fc = 1):

    return False

-fc_cutoff = 10

+fc_cutoff = 20

Any lines of text that have been added to the script are indicated with a +, and any lines that have been removed with a -. Here, we altered the line of code that sets the value of fc_cutoff. git diff displays this change as the previous line being removed and a new line being added with our update incorporated. You can ignore the first five lines of output, because they are directions for other software programs that can merge changes to files. If you wanted to keep this edit, you could add clean.py to the staging area using git add and then commit the change using git commit , as you did above. Instead, this time undo the edit by following the directions from the output of git status to “discard changes in the working directory” using the command git checkout .

$ git checkout -- clean.py

Now git diff returns no output, because git checkout undid the unstaged edit you had made to clean.py . This ability to undo past edits to a file is not limited to unstaged changes in the working directory. If you had committed multiple changes to the file clean.py and then decided you wanted the original version from the initial commit, you could pass the commit identifier of the first commit you made above in front of the file name (your commit identifier will be different; use git log to find it). The -- used above only separates options from file names; because no commit identifier was given, git checkout restored the version of the file in the staging area (if you haven’t staged any changes to this file, as is the case here, the version of the file in the staging area is identical to the version in the last commit). Instead of typing the entire commit identifier, you can use only its first seven characters; this is simply a convention, since a prefix of that length is usually long enough to be unique.

$ git checkout 660213b clean.py
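As a minimal sketch of the uses of a commit identifier described above (comparing versions, retrieving a deleted file, and restoring an old version), the following script builds a scratch repository with two commits. The directory, the one-line clean.py contents, and the variable names first and second are assumptions made for illustration; they are not the tutorial's files.

```shell
#!/bin/sh
set -eu

# Scratch repository with a hypothetical one-line clean.py.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "user@domain"

echo "fc_cutoff = 10" > clean.py
git add clean.py
git commit -q -m "Add clean.py"
first=$(git rev-parse --short HEAD)     # abbreviated identifier of commit 1

echo "fc_cutoff = 20" > clean.py
git add clean.py
git commit -q -m "Raise fold change cutoff"
second=$(git rev-parse --short HEAD)    # abbreviated identifier of commit 2

# Compare the file between the two committed versions.
git diff "$first" "$second" -- clean.py

# Accidentally delete the file, then retrieve it from the latest commit.
rm clean.py
git checkout "$second" -- clean.py

# Restore the file as it was in the first commit.
# (The -- is optional here; it only marks where file names begin.)
git checkout "$first" -- clean.py
cat clean.py
```

Note that checking out a file from an older commit also stages that older version, so a following git status reports clean.py as modified relative to the latest commit.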

At this point, you have learned the commands needed to version your code with Git. Thus, you already have the benefits of being able to make edits to files without copying them first, to create a record of your changes with accompanying messages, and to revert to previous versions of the files if needed. Now you will always be able to recreate past results that were generated with previous versions of the code (see the command git tag for a method to facilitate finding specific past versions) and see the exact changes you have made over the course of a project.
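The git tag command mentioned above can be sketched as follows. This is an illustration only: the scratch directory, the stand-in clean.py contents, and the tag name figure-1 are all hypothetical choices, not names from the tutorial.

```shell
#!/bin/sh
set -eu

# Scratch repository with a hypothetical one-line clean.py.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "user@domain"

echo "fc_cutoff = 10" > clean.py
git add clean.py
git commit -q -m "Add clean.py"

# Give the commit that produced a result a memorable name.
git tag -a figure-1 -m "Version used to generate Figure 1"

# Keep working; the tag stays attached to the earlier commit.
echo "fc_cutoff = 20" > clean.py
git add clean.py
git commit -q -m "Raise fold change cutoff"

# List the tags, then print clean.py exactly as it was at the tag.
git tag
git show figure-1:clean.py
```

A tag name can be used anywhere a commit identifier is accepted, so you do not need to look up the identifier with git log when returning to the version behind a particular result.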

Share Your Code

Once you have your files saved in a Git repository, you can share it with your collaborators and the wider scientific community by putting your code online ( Fig 3 ). This also has the added benefit of creating a backup of your scripts and provides a mechanism for transferring your files across multiple computers. Sharing a repository is made easier if you use one of the many online services that host Git repositories ( Table 1 ), e.g., GitHub. Note, however, that any files that have not been tracked with at least one commit are not included in the Git repository, even if they are located within the same directory on your local computer (see Box 2 for advice on the types of files that should not be versioned with Git and Box 3 for advice on managing large files).

Fig 3. (A) On your computer, you commit to a Git repository (commit d75es). (B) On GitHub, you create a new repository called thesis. This repository is currently empty and not linked to the repo on your local machine. (C) The command git remote add connects your local repository to your remote repository. The remote repository is still empty, however, because you have not pushed any content to it. (D) You send all the local commits to the remote repository using the command git push . Only files that have been committed will appear in the remote repository. (E) You repeat several more rounds of updating scripts and committing on your local computer (commit f658t and then commit xv871). You have not yet pushed these commits to the remote repository, so only the previously pushed commit is in the remote repo (commit d75es). (F) To bring the remote repository up to date with your local repository, you git push the two new commits to the remote repository. The local and remote repositories now contain the same files and commit histories.

https://doi.org/10.1371/journal.pcbi.1004668.g003
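The sequence in Fig 3 can be tried without a GitHub account. In the sketch below, a bare repository on the local disk stands in for the empty GitHub repository; that substitution, the directory names, and the stand-in file contents are assumptions for illustration. With a real GitHub repository, the argument to git remote add would be the repository's URL instead of a local path.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# (B) An empty "remote": a bare repository standing in for GitHub.
git init -q --bare "$work/remote.git"

# (A) A local repository with one commit.
git init -q "$work/thesis"
cd "$work/thesis"
git config user.name "First Last"
git config user.email "user@domain"
echo 'echo "call peaks"' > process.sh
git add process.sh
git commit -q -m "Add stand-in process.sh"

# (C) Connect the local repository to the remote under the name origin.
git remote add origin "$work/remote.git"

# (D) Send the local commits. The branch name is looked up because it
# may be master or main depending on the Git configuration.
branch=$(git symbolic-ref --short HEAD)
git push -q origin "$branch"

# (E) Commit more work locally; the remote does not see it yet.
echo 'echo "filter peaks"' >> process.sh
git add process.sh
git commit -q -m "Add filtering step"

# (F) Push again to bring the remote up to date.
git push -q origin "$branch"

# Show what the remote now holds.
git ls-remote origin
```

After the final push, the commit identifier reported by git ls-remote matches the most recent identifier in the local git log.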

Box 2. What Not to Version Control

You can version control any file that you put in a Git repository, whether it is text-based, an image, or a giant data file. However, just because you can version control something, does not mean you should . Git works best for plain, text-based documents such as your scripts or your manuscript if written in LaTeX or Markdown. This is because for text files, Git saves the entire file only the first time you commit it and then saves just your changes with each commit. This takes up very little space, and Git has the capability to compare between versions (using git diff ). You can commit a non-text file, but a full copy of the file will be saved in each commit that modifies it. Over time, you may find the size of your repository growing very quickly. A good rule of thumb is to version control anything text-based: your scripts or manuscripts if they are written in plain text. Things not to version control are large data files that never change, binary files (including Word and Excel documents), and the output of your code.

In addition to the type of file, you need to consider the content of the file. If you plan on sharing your commits publicly using GitHub, ensure you are not committing any files that contain sensitive information, such as human subject data or passwords.

To prevent accidentally committing files you do not wish to track, and to remove them from the output of git status , you can create a file called .gitignore . In this file, you can list subdirectories and/or file patterns that Git should ignore. For example, if your code produced log files with the file extension .log , you could instruct Git to ignore these files by adding *.log to .gitignore . In order for these settings to be applied to all instances of the repository, e.g., if you clone it onto another computer, you need to add and commit this file.
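A minimal sketch of the .gitignore behavior described above, run in a scratch directory. The file names run.log and process.sh are hypothetical examples; only the *.log pattern comes from the text.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "First Last"
git config user.email "user@domain"

# A script worth tracking and a log file that is not.
echo 'echo "call peaks"' > process.sh
echo "started run" > run.log

# Tell Git to ignore every file ending in .log.
echo '*.log' > .gitignore

# run.log no longer appears among the untracked files.
git status --short

# Ask Git which rule causes run.log to be ignored.
git check-ignore -v run.log

# Commit .gitignore so the rule travels with every clone.
git add .gitignore process.sh
git commit -q -m "Add process.sh and ignore log files"
```

Because .gitignore is itself committed, a collaborator who clones the repository gets the same ignore rules without any extra setup.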

Box 3. Managing Large Files

Many biological applications require handling large data files. While Git is best suited for collaboratively writing small text files, nonetheless, collaboratively working on projects in the biological sciences necessitates managing this data.

The example analysis pipeline in this tutorial starts by downloading data files in BAM format that contain the alignments of short reads from a ChIP-seq experiment to the human genome. Since these large, binary files are not going to change, there is no reason to version them with Git. Thus, hosting them on a remote http (as ENCODE has done in this case) or ftp site allows each collaborator to download it to her machine as needed, e.g., using wget , curl , or rsync . If the data files for your project are smaller, you could also share them via services like Dropbox ( www.dropbox.com ) or Google Drive ( https://www.google.com/drive/ ).

However, some intermediate data files may change over time, and the practical necessity to ensure all collaborators are using the same data set may override the advice to not put code output under version control, as described in Box 2 . Again, returning to the ChIP-seq example, the first step calling the peaks is the most difficult computationally because it requires access to a Unix-like environment and sufficient computational resources. Thus, for collaborators that want to experiment with clean.py and analyze.R without having to run process.sh , you could version the data files containing the ChIP-seq peaks (which are in BED format). But since these files are larger than those typically used with Git, you can instead use one of the solutions for versioning large files within a Git repository without actually saving the file with Git, e.g., git-annex ( https://git-annex.branchable.com/ ) or git-fat ( https://github.com/jedbrown/git-fat/ ). Recently, GitHub has created their own solution for managing large files called Git Large File Storage (LFS) ( https://git-lfs.github.com/ ). Instead of committing the entire large file to Git, which quickly becomes unmanageable, it commits a text pointer. This text pointer refers to a specific file saved on a remote GitHub server. Thus, when you clone a repository, it only downloads the latest version of the large file. If you check out an older version of the repository, it automatically downloads the old version of the large file from the remote server. After installing Git LFS, you can manage all the BED files with one command: git lfs track "*.bed" . Then you can commit the BED files just like your scripts, and they will automatically be handled with Git LFS. Now, if you were to change the parameters of the peak calling algorithm and re-run process.sh , you could commit the updated BED files, and your collaborators could pull the new versions of the files directly to their local Git repositories.

Below, we focus on the technical aspects of sharing your code. However, there are also other issues to consider when deciding if and how you are going to make your code available to others. For quick advice on these subjects, see Box 4 on how to license your code, Box 5 on concerns about being scooped, and Box 6 on the increasing trend of journals to institute sharing policies that require authors to deposit code in a public archive upon publication.

Box 4. Choosing a License

Putting software and other material in a public place is not the same as making it publicly usable. In order to do that, the authors must also add a license, since copyright laws in some jurisdictions require people to treat anything that isn’t explicitly open as being proprietary.

While dozens of open licenses have been created, the two most widely used are the GNU General Public License (GPL) and the MIT/BSD family of licenses. Of these, the MIT/BSD-style licenses put the fewest requirements on re-use, and thereby make it easier for people to integrate your software into their projects.

For an excellent short discussion of these issues, and links to more information, see Jake Vanderplas’s blog post from March 2014 at http://www.astrobetter.com/blog/2014/03/10/the-whys-and-hows-of-licensing-scientific-code/ . For a more in-depth discussion of the legal implications of different licenses, see Morin et al., 2012 [ 6 ].

Box 5. Being Scooped

One concern scientists frequently have about putting work in progress online is that they will be scooped, i.e., that someone will analyze their data and publish a result that they themselves would have published, but hadn’t yet. In practice, though, this happens rarely, if at all: the authors are not aware of a single case in which this has actually happened, and would welcome pointers to specific instances. It seems more likely that making work public early in something like a version control repository, which automatically adds timestamps to content, will help researchers establish their priority.

Box 6. Journal Policies

Sharing data, code, and other materials is quickly moving from “desired” to “required.” For example, PLOS’s sharing policy ( http://journals.plos.org/plosone/s/materials-and-software-sharing ) already says, “We expect that all researchers submitting to PLOS will make all relevant materials that may be reasonably requested by others available without restrictions upon publication of the work.” Its policy on software is more specific:

We expect that all researchers submitting to PLOS submissions in which software is the central part of the manuscript will make all relevant software available without restrictions upon publication of the work. Authors must ensure that software remains usable over time regardless of versions or upgrades…

It then goes on to specify that software must be based on open source standards, and that it must be put in an archive which is large or long-lived. Granting agencies, philanthropic foundations, and other major sponsors of scientific research are all moving in the same direction, and, to our knowledge, none has relaxed or reduced sharing requirements in the last decade.

To begin using GitHub, you will first need to sign up for an account. For the code examples in this tutorial, you will need to replace username with the username of your account. Next, choose the option to “Create a new repository” ( Fig 3B , see https://help.github.com/articles/create-a-repo/ ). Call it “thesis,” because that is the directory name containing the files on your computer, but note that you can give it a different name on GitHub if you wish. Also, now that the code will exist in multiple places, you need to learn some more terminology ( Box 1 ). A local repository refers to code that is stored on the machine you are using, e.g., your laptop; whereas a remote repository refers to the code that is hosted online. Thus, you have just created a remote repository.

Now you need to send the code on your computer to GitHub. The key to this is the URL that GitHub assigns your newly created remote repository. It will have the form https://github.com/username/thesis.git (see https://help.github.com/articles/cloning-a-repository/ ). Notice that this URL uses the HTTPS protocol, which is the quickest to begin using. However, it requires you to enter your username and password when communicating with GitHub (GitHub has since replaced account passwords with personal access tokens for Git operations over HTTPS), so you’ll want to consider switching to the SSH protocol once you are regularly using Git and GitHub (see https://help.github.com/articles/generating-ssh-keys/ for directions). To link the local thesis repository on your computer to the remote repository you just created, tell Git the URL of the remote repository by running the command git remote add from inside your local repository ( Fig 3C ).

$ git remote add origin https://github.com/username/thesis.git

The name “origin” is a bookmark for the remote repository so that you do not have to type out the full URL every time you transfer your changes (this is the default name for a remote repository, but you could use another name if you like).

Send your code to GitHub using the command git push ( Fig 3D ).

$ git push origin master

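You first specify the remote repository, “origin.” Second, you tell Git to push to the “master” branch of the repository (newer Git and GitHub setups often name the default branch “main” instead, in which case substitute that name). We will not go into other options in this tutorial, but Box 7 discusses them briefly.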

Box 7. Branching

Do you ever make changes to your code, but are not sure you will want to keep those changes for your final analysis? Or do you need to implement new features while still providing a stable version of the code for others to use? Using Git, you can maintain parallel versions of your code that you can easily bounce between while you are working on your changes. You can think of it like making a copy of the folder you keep your scripts in, so that you have your original scripts intact but also have the new folder where you make changes. Using Git, this is called branching, and it is better than separate folders because (1) it uses a fraction of the space on your computer, (2) it keeps a record of when you made the parallel copy (branch) and what you have done on the branch, and (3) there is a way to incorporate those changes back into your main code if you decide to keep your changes (and a way to deal with conflicts). By default, your repository will start with one branch, usually called “master.” To create a new branch in your repository, type git branch new_branch_name . You can see what branches a current repository has by typing git branch , with the branch you are currently in being marked by a star. To move between branches, type git checkout branch_to_move_to . You can edit files and commit them on each branch separately. If you want to combine the changes in your new branch with the master branch, you can merge the branches by typing git merge new_branch_name while in the master branch.
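To make the branching commands concrete, here is a minimal sketch that runs them in a throwaway repository. The file name analysis.txt, its contents, and the commit identity are made up for the demo; the branch name is pinned to “master” so the commands match the text regardless of your Git defaults.

```shell
set -eu

# Throwaway identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # name the first branch "master"

# First commit on master.
echo "stable analysis" > analysis.txt
git add analysis.txt
git commit -q -m "Add analysis notes"

# Create a branch, switch to it, and commit an experimental change there.
git branch new_branch_name
git checkout -q new_branch_name
echo "experimental idea" >> analysis.txt
git commit -q -am "Try an experimental idea"

# Back on master the file is still in its original state.
git checkout -q master
git branch                      # the star marks the branch you are on

# Keep the experiment: merge the branch into master.
git merge -q new_branch_name
cat analysis.txt
```

If you decide against the experiment instead, you can simply stay on master and never merge; the branch keeps its own record of what you tried.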

Pushing to GitHub also has the added benefit of backing up your code in case anything were to happen to your computer. Also, it can be used to manually transfer your code across multiple machines, similar to a service like Dropbox ( www.dropbox.com ) but with the added capabilities and control of Git. For example, what if you wanted to work on your code on your computer at home? You can download the Git repository using the command git clone .

$ git clone https://github.com/username/thesis.git

By default, this will download the Git repository into a local directory named “thesis.” Furthermore, the remote “origin” will automatically be added so that you can easily push your changes back to GitHub. You now have copies of your repository on your work computer, your GitHub account online, and your home computer. You can make changes, commit them on your home computer, and send those commits to the remote repository with git push , just as you did on your work computer.

Then the next day back at your work computer, you could update the code with the changes you made the previous evening using the command git pull .

$ git pull origin master

This pulls in all the commits that you had previously pushed to the GitHub remote repository from your home computer. In this workflow, you are essentially collaborating with yourself as you work from multiple computers. If you are working on a project with just one or two other collaborators, you could extend this workflow so that they could edit the code in the same way. You can do this by adding them as Collaborators on your repository (Settings -> Collaborators -> Add collaborator; see https://help.github.com/articles/adding-collaborators-to-a-personal-repository/ ). However, with projects with lots of contributors, GitHub provides a workflow for finer-grained control of the code development.
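The push, clone, and pull round trip can be rehearsed without a GitHub account. The sketch below is an approximation under stated assumptions: a local bare repository (remote.git) stands in for GitHub, the directories work and home stand in for your two computers, and the file contents and commit identity are invented for the demo.

```shell
set -eu

# Throwaway identity so the demo commits work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

demo=$(mktemp -d)
cd "$demo"

# A bare repository plays the role of the remote repository on GitHub.
git init -q --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/master

# "Work computer": create the thesis repository and push it to the remote.
git init -q work
cd work
git symbolic-ref HEAD refs/heads/master
echo "Chapter 1" > thesis.tex
git add thesis.tex
git commit -q -m "Start thesis"
git remote add origin "$demo/remote.git"
git push -q origin master

# "Home computer": clone the remote, make a change, and push it back.
cd "$demo"
git clone -q "$demo/remote.git" home
cd home
echo "Chapter 2" >> thesis.tex
git commit -q -am "Draft chapter 2"
git push -q origin master

# Back at work the next day: pull the commit made at home.
cd "$demo/work"
git pull -q origin master
git log --oneline
```

With a real GitHub repository, the only difference is the URL given to git remote add and git clone.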

With the addition of a GitHub account and a few commands for sending and receiving code, you can now share your code with others, transfer your code across multiple machines, and set up simple collaborative workflows.

Contribute to Other Projects

Lots of scientific software is hosted online in Git repositories. Now that you know the basics of Git, you can directly contribute to developing the scientific software you use for your research ( Fig 4 ). From a small contribution like fixing a typo in the documentation to a larger change such as fixing a bug, it is empowering to be able to improve the software used by yourself and many other scientists.

Fig 4.

We would like you to add an empty file that is named after your GitHub username to the repo used to write this manuscript. (A) Using your internet browser, navigate to https://github.com/jdblischak/git-for-science . (B) Click on the “Fork” button to create a copy of this repo on GitHub under your username. (C) On your computer, type git clone https://github.com/username/git-for-science.git , which will create a copy of git-for-science on your local machine. (D) Navigate to the readers directory by typing cd git-for-science/readers/ . Create an empty file that is titled with your GitHub username by typing touch username.txt . Commit that new file by adding it to the staging area ( git add username.txt ) and committing with a message ( git commit -m "Add username to directory of readers." ). Note that your commit identifier will be different than what is shown here. (E) You have committed your new file locally, and the next step is to push that new commit up to the git-for-science repo under your username on GitHub. To do so, type git push origin master . (F) To request to add your commits to the original git-for-science repo, issue a pull request from the git-for-science repo under your username on GitHub. Once your Pull Request is reviewed and accepted, you will be able to see the file you committed with your username in the original git-for-science repository.

https://doi.org/10.1371/journal.pcbi.1004668.g004

When contributing to a larger project with many contributors, you will not be able to push your changes with git push directly to the project’s remote repository. Instead, you will first need to create your own remote copy of the repository, which on GitHub is called a fork ( Box 1 ). You can fork any repository on GitHub by clicking the button “Fork” on the top right of the page (see https://help.github.com/articles/fork-a-repo/ ).

Once you have a fork of a project’s repository, you can clone it to your computer and make changes just like a repository you created yourself. As an exercise, you will add a file to the repository that we used to write this paper. First, go to https://github.com/jdblischak/git-for-science and choose the “Fork” option to create a git-for-science repository under your GitHub account ( Fig 4B ). In order to make changes, download it to your computer with the command git clone from the directory you wish the repo to appear in ( Fig 4C ).

$ git clone https://github.com/username/git-for-science.git

Now that you have a local version, navigate to the subdirectory readers and create a text file named as your GitHub username ( Fig 4D ).

$ cd git-for-science/readers

$ touch username.txt

Add and commit this new file ( Fig 4D ), and then push the changes back to your remote repository on GitHub ( Fig 4E ).

$ git add username.txt

$ git commit -m "Add username to directory of readers."

$ git push origin master

Currently, the new file you created, readers/username.txt , only exists in your fork of git-for-science. To merge this file into the main repository, send a pull request using the GitHub interface (Pull request -> New pull request -> Create pull request; Fig 4F ; see https://help.github.com/articles/using-pull-requests/ ). After the pull request is created, we can review your change and then merge it into the main repository. Although this process of forking a project’s repository and issuing a pull request seems like a lot of work to contribute changes, this workflow gives the owner of a project control over what changes get incorporated into the code. You can have others contribute to your projects using the same workflow.

The ability to use Git to contribute changes is very powerful because it allows you to improve the software that is used by many other scientists and also potentially shape the future direction of its development.

Git, albeit complicated at first, is a powerful tool that can improve code development and documentation. Ultimately, the complexity of a VCS not only gives users a well-documented “undo” button for their analyses, but it also allows for collaboration and sharing of code on a massive scale. Furthermore, it does not need to be learned in its entirety to be useful. Instead, you can derive tangible benefits from adopting version control in stages. With a few commands ( git init , git add , git commit ), you can start tracking your code development and avoid a file system full of copied files ( Fig 2 ). Adding a few additional commands ( git push , git clone , git pull ) and a GitHub account, you can share your code online, transfer your changes across machines, and collaborate in small groups ( Fig 3 ). Lastly, by forking public repositories and sending pull requests, you can directly improve scientific software ( Fig 4 ).

We collaboratively wrote the article in LaTeX ( http://www.latex-project.org/ ) using the online authoring platform Authorea ( https://www.authorea.com ). Furthermore, we tracked the development of the document using Git and GitHub. The Git repo is available at https://github.com/jdblischak/git-for-science , and the rendered LaTeX article is available at https://www.authorea.com/users/5990/articles/17489 .

Supporting Information

S1 Data. process.sh.

This Bash script downloads the ENCODE CTCF ChIP-seq data from multiple types of kidney samples and calls peaks. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

https://doi.org/10.1371/journal.pcbi.1004668.s001

S2 Data. clean.py.

This Python script filters peaks with a fold change cutoff and merges peaks from the different kidney samples. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

https://doi.org/10.1371/journal.pcbi.1004668.s002

S3 Data. analyze.R.

This R script creates diagnostic plots on the length of the peaks and their distribution across the genome. See https://github.com/jdblischak/git-for-science/tree/master/code for instructions on running it.

https://doi.org/10.1371/journal.pcbi.1004668.s003

Enough R to write a thesis

From raw data to finished thesis

August 15, 2024

The biostats books, apps, and tutorials are written to give you enough skills to write a thesis.

The books emphasise reproducible research.

Biostats books

Working in R

Learn how to import, manipulate and visualise data with our working in R book. After an introduction to R, this book has a tidyverse flavour, showing how to manipulate data with dplyr and make publishable plots with ggplot2 . It includes lots of exercises to hone your skills.

The data life-cycle

Your data are precious. Learn how to take care of them with our data life-cycle book.

Statistics in R

Coming soon!

Reproducible documents with R

Learn how to write reproducible documents (anything from a course assignment to a thesis or manuscript) in quarto: no more copy-paste nightmares. Quarto is the successor to R markdown (our earlier R markdown book is here). All the biostats books were written in R markdown or quarto; you can see the source code in our repo on GitHub.

Git and GitHub

Learn how to use version control with our step-by-step guide to setting up and using git and GitHub in RStudio.

Enough targets to Write a Thesis

Learn how to run data analysis pipelines for reproducible and scalable workflows. Our guide to using the targets package will show you how.

Writing an R package

Want to make your code into an R package? Our guide to writing an R package will show you how.

Biostats apps and tutorials

We have developed interactive learnr tutorials for

  • naming objects
  • dates and times
  • string manipulation

and shiny apps to explore some statistical concepts.

These can be installed from biostats apps and learnr tutorials with

The Biostats team

These books are a collaboration between the Department of Biological Sciences, University of Bergen and the Department of Mathematical Sciences, University of Trondheim.

  • Dr Aud Halbritter
  • Dr Josh Lynn
  • Dr Emily G. Simmonds
  • Dr Jonathan Soulé
  • Dr Richard J. Telford

This is a BioCeed product.

How should I reference my Github repository with materials for my paper?

I am writing a paper describing some of the research I have done. As part of my work I have developed an open-source library and made it available on Github.

How should I link to it?

Should I cite it in the bibliography or make a footnote with the link to the software?

  • 2 See this paper ( published full text and arXiv version ) by Kruse & Agol, they linked their repository on GitHub in the acknowledgments section. –  giordano Commented May 7, 2014 at 8:46
  • 6 Related: How do you cite a Github repository? –  E.P. Commented May 7, 2014 at 18:38

5 Answers

Such resources, especially if they are supplementary to the paper, i.e., in some sense a part of it, should be referenced in a footnote and not in the bibliography.

Do include not only the URL but also a short description; and do try to keep that URL valid - once you publish that link, it's frozen forever.

It also helps to include the opposite citation. In the readme file of that library, include full citation information of your paper. This will allow others to gain extra information about the methods, and also give you citations if/when others build upon your work.

Some publishers also support binary attachments for supplementary information. If they do, you should use them: prepare a package of the current stable version and upload it there. This allows for reproducibility, as a specific version is referenced which relates to the actual paper, and not some improvements done in 2020 that change everything; it also keeps all relevant information together with the paper at the publisher's site, which will stay valid even if that GitHub repository goes away for whatever reason.
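One possible way to prepare such a package, sketched in a throwaway repository: tag the state the paper was written against, then export exactly that state as an archive. The library name mylib, the tag v1.0, the README contents, and the commit identity are all placeholders I made up for the illustration.

```shell
set -eu

# Throwaway identity so the demo commit and tag work without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
out=$(mktemp -d)
cd "$repo"
git init -q

# A stand-in for the library's contents.
echo "Hypothetical library README" > README
git add README
git commit -q -m "State of the library used in the paper"

# Mark the exact version the paper refers to.
git tag -a v1.0 -m "Release accompanying the paper"

# Export that tagged version as a single archive for upload.
git archive --format=tar.gz --prefix=mylib-1.0/ -o "$out/mylib-1.0.tar.gz" v1.0
tar -tzf "$out/mylib-1.0.tar.gz"
```

Because the archive is built from the tag rather than from the working directory, later commits to the repository do not change what the paper points to.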

  • 2 Github may disappear in the future, you may close your account. I think it is the journal's job to keep a snapshot of the code for the record, plus a link to the repository for folks to get updates. –  Davidmh Commented May 7, 2014 at 10:20
  • 1 This depends. I develop software as a mathematician and in all the papers I have been a co-author, we have cited the software (it's widely used in our community). I would probably cite the software on your university page, as this would be fairly stable and around, unless you are not tenured, and then on your site, link to the latest version of the library on github (see homepages.math.uic.edu/~jan/download.html ) as an example. The other option is to write a 'software' paper and find an appropriate journal to publish the library, at which point you can cite that paper from then on. –  nagniemerg Commented May 7, 2014 at 18:23
  • 1 @Davidmh In an ideal world it might be the journal's job, but I've definitely had code I wanted to share for an article where the journal wasn't going to or couldn't handle archiving it. –  Fomite Commented May 7, 2014 at 18:57

In previous papers, I've used something like (source code available at github.com/fomite/brilliantwork) when describing the software methods used.

However, Figshare now allows you to directly import a repo from GitHub, which will give you a DOI you can reference as a citation. This also has the benefit of providing a "snapshot" of the repo at the time of publication, for repositories that will continue to be worked on. That, plus the ability for me and other people to cite the repo directly (and thus be able to get some traditional citation metrics to show the impact of the software), is what I'll be doing in the future.

  • 5 +1 for snapshots; I would say in general, when citing, be very sure you give a version. git provides hashes corresponding to particular commits/versions: That's what I'd include in a citation (having not used Figshare). –  Matthew G. Commented May 6, 2014 at 18:40
  • @MatthewG Agreed as to the version. The Figshare upload process will actually create an entirely standalone code base that will be frozen in time. –  Fomite Commented May 6, 2014 at 18:45
  • I was going to post the same thing for the same reasons. –  David Z Commented May 6, 2014 at 20:53
  • 1 An alternative view on Figshare . It doesn't mean that Figshare is bad but one does need to be careful with the fine print when uploading there. –  E.P. Commented May 8, 2014 at 10:12

A citation is a reference to a research object. Prior to the DOI, this reference contained the information somebody would need to physically locate the research object, whether it was a book, journal article, or dissertation. Although the jury is still out on how to provide general references to digital research objects, a Git repository contains a canonical reference, the SHA1 hash of the commit.

If you would like to refer to your Git repository in such a way that it is easy to locate in the future, you should provide not only the URL where it is now, but also the short name for the repository, the lead author[s], and the SHA1 hash of the commit that you produced your results with.

Here is an example URL from GitHub that contains the commit: https://github.com/hashdist/hashstack/commit/4c72950a0f6eb9cc1cf63cd640f3e6b82c9ce9c0
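If it helps, here is a small sketch of pulling those details out of a repository so they can be pasted into a citation. It builds a throwaway repository for the demo; the file name, commit message, identity, and the github.com/username/project URL are placeholders, not a real project.

```shell
set -eu

# Throwaway identity so the demo commit works without any Git configuration.
export GIT_AUTHOR_NAME="Demo User" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo User" GIT_COMMITTER_EMAIL="demo@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo "analysis code" > run.txt
git add run.txt
git commit -q -m "Version used to produce the results"

# Full hash of the commit the results were produced with.
commit_id=$(git rev-parse HEAD)

# Date of that commit in YYYY-MM-DD form.
commit_date=$(git log -1 --format=%cd --date=short)

# Assemble the pieces; the URL is a hypothetical repository location.
echo "Source code: https://github.com/username/project, commit $commit_id ($commit_date)"
```

Quoting the full hash rather than an abbreviated one avoids ambiguity if the repository later grows enough for short prefixes to collide.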

I don't recommend uploading your code to Figshare until they've fixed their ability to accept code licenses.

Update 14 May 2014

GitHub and ZENODO have partnered together to upload code for a DOI under a flexible license. Obtaining a DOI for your code should be as simple as following the instructions in the GitHub Guide to Making Your Code Citable. Here's an overview of the instructions:

  • Choose your repository
  • Login to ZENODO
  • Pick the repository you want to archive
  • Check repository settings
  • Create a new release
  • Check everything has worked

It's really that easy. There's no current reason not to use this as a default approach for citing your software.

The specifics depend on your field, and I think most areas have very loose guidelines as to what to do in these situations. The only hard-and-fast rule is that you need to get it past your reviewers and your editors, who will tell you if the style is inappropriate. Other than that, the most important thing is that you yourself are happy that the citation is getting your GitHub repo the most visibility possible. Finally, you should consider your paper from the perspective of readers that may want to use and cite your code, and who will naturally look to your writing for how to do that.

In general, I would recommend citing the code in the bibliography, as one more reference. The advantage of this is that your article's references will be listed and indexed separately, and (with luck) this will register as links pointing towards your repo, which will become more of an advantage if more people cite it. Such a citation should have

  • the name of the programme,
  • the URL of the repository, and
  • a clear indication of the version cited and its date.

An example citation is

G. A. Worth, M. H. Beck, A. Jackle, and H.-D. Meyer. The MCTDH Package, Version 8.2, (2000), University of Heidelberg, Heidelberg, Germany. H.-D. Meyer, Version 8.3 (2002), Version 8.4 (2007). See http://mctdh.uni-hd.de

You should also describe the code within the text when you first cite the code, and provide a complete enough description within the paper that readers do not need to go read any additional information to continue reading your paper, because it needs to read like a single, coherent piece of work.

Alternatively, you can cite it in a footnote, indicating the name of the code and its location. This would be a good place to include the description if it is brief. Another choice is to do this in the acknowledgements, as giordano points out. However, I think these make your code less discoverable to both humans and search engines.

As I mentioned before, you should mould your citation of the code in the way that you'd like others to cite it. It is also desirable that you include, within the pages describing code online, a description of how you want people to cite the code. Some examples are GAMESS UK, MCTDH and MOLCAS, or the ones in this question. Having such a description will strengthen your position should referees or editors not like your preferred style. You set the terms on which your code gets cited!

Finally, as others have mentioned, you should make sure that the URL you point to is stable, as you will be unable to change the link in the published paper once it goes out. This is a separate question altogether, and there are a number of ways to do this - including supplementary information to the article itself, separate repositories for academic code, and of course GitHub itself - and you should strive for the most stable solution possible. Is it likely that the repository will someday get closed or moved? If so, you should consider alternatives.

In addition to referencing the repository within the paper, you should also make sure that the link between the paper and repository is part of the paper metadata. Specifically, if you are depositing the paper on arXiv , you can use the latest feature that integrates with Papers with Code to record the link between your paper and its implementation.

Projects with this topic

Void / Bachelor Thesis

Empirical analysis of the performance between Node and Deno using cloud computing.

Jesse Knight / ut-thesis

University of Toronto thesis class for LaTeX

Thomas AUZINGER / Thesis Template - TU Wien Informatics

The vutinfth document class is a LaTeX2e-based template for all theses written at TU Wien Informatics. DOWNLOAD with the button on the right. READ the README.txt to start.

sysu-gitlab / thesis-template / better-thesis

A Sun Yat-sen University thesis template based on typst, usable directly from typst.universe

Thomas AUZINGER / Thesis Template - ISTA

The istaustriathesis document class is a LaTeX2e-based template for theses written at the Institute of Science and Technology Austria (ISTA). DOWNLOAD with the button on the right. READ the README.txt to start.

Henrique / CoppeTex

"A LaTeX toolkit for writing thesis and dissertations."

Deleuze Romain / 3IR Graduation Work

3IR Graduation Work repository containing documentation, daily report, Gantt Diagram and thesis.

Nils B. / Thesis Template for Docker

University: Hochschule Darmstadt

Author of the Fork: Nils B.

Original repository: https://github.com/mbredel/thesis-template

A thesis template using Latex in Docker.

Philipp Niedermayer / Thesis Template

LaTeX template for theses

Richard Anyalai / Bachelors thesis

My bachelors thesis at the University of Szeged

Matthias Gehlert / bachelorthesis

This repository is deprecated (as of 2023), please use the following version: https://codeberg.org/matthiasgehlert/bachelorthesis

A bachelor thesis about the aggregating of conditional effects in logistic regression

Alasdair Warwick / oxforddown-gitlab-ci

A thesis template based on bookdown/oxforddown.

Davide Peressoni / UniPD Thesis template

UniPD thesis template in LaTeX

Antal Spector-Zabusky / UPenn PhD thesis skeleton

A LaTeX skeleton for a well-formatted UPenn PhD thesis, pulled from my CS thesis submitted in Spring 2021. Feel free to use it for anything and to contact me with any questions about it. I can't promise it's perfect, but hey, they accepted my thesis!

MSEM / thesis-template

Thesis build system to write your thesis in markdown and generate a pdf. It can run on GitLab CI or locally using pandoc with plantuml and mermaid support.

Tim / PropertyBasedDiffCond Evaluation

Experimental data and artifact data for project on Property Based Difference Verification with Conditions.

Rafał Osadnik / Engineering thesis

My bachelor of engineering (technical physics) thesis LaTeX contents in Polish.

Ulices / UPIITA-LaTeX-formats

This is a compilation of a few LaTeX formats for Report, Slides, Protocol and Thesis for UPIITA

Emanuele Petriglia / Bachelor degree thesis

BSc degree thesis: design and development of APIs in Go for a domain-specific messaging system on SeismoCloud

Pirasbro / Computer Science - Bachelors Thesis / Main Code

Computer Science - Bachelor's Thesis Universidade Federal de Itajubá - UNIFEI

This is the main repository.

Would You Pay Someone To Write Your University Thesis?

In Indonesia, the use of “joki,” or writers-for-hire, is a long-standing – and mostly normalized – practice among university students.

The main gate of Yogyakarta State University in Yogyakarta, Indonesia, January 10, 2022.

Would you pay someone to write your university thesis?

For some, the answer will be an immediate “no” for a range of reasons, either moral, legal or practical. For others, it may be that such a “service” is simply unthinkable, completely unheard of, or prohibitively expensive.

In Indonesia, however, joki is back in the news. The term refers to a thriving, decades-old business of writers-for-hire, in which fellow students or recent graduates write other students’ theses, dissertations, extended essays or classroom assignments for a low fee.

While joki is nothing new in Indonesia, it has become a revived talking point recently following a viral video posted on X in July by the founder of the sociopolitics media platform What Is Up Indonesia, Abigail Limuria.

In the video, Limuria named a number of issues already facing the Indonesian education system, including teacher welfare, the curriculum, and teaching quality, and went on to say that the practice of joki was only adding to the existing problems.

“What makes me mindblown is that so many people don’t realize that this is wrong,” she said. “Come on guys, how can you not be aware that this is deception?”

The video has been viewed some 11 million times, and prompted discussion from students, academics and even some of the joki themselves, defending the practice and lamenting the lack of other jobs in Indonesia.

Undoubtedly, the concept of writers-for-hire comes from a confluence of factors.

One is the aforementioned saturated job market in Indonesia, which means that students and fresh graduates need to find creative ways to make money. According to Indonesia’s Central Bureau of Statistics, the unemployment rate in February 2024 stood at 4.82 percent.

Another issue is obviously overstretched and underpaid university lecturers, who often struggle with large class sizes of hundreds of students, and overwhelming administrative duties that leave them with little time to clamp down on issues like the increasing use of AI and plagiarism.

Thirdly, university students in Indonesia have become accustomed to the concept of joki, with many failing to see it as a dishonest practice, but rather a shortcut that everyone takes.

Why write your own thesis, when all your friends have hired someone else to write theirs?

Students on X, and some academics, also blamed the decision to simply pay someone else to do the work on the lack of support for students.

In particular, some highlighted the failure of many Indonesian institutions to teach students how to accurately research a topic and structure an academic thesis to reflect their findings – again because lecturers are so overburdened by teaching and marking that they only have time to teach their core syllabus.

Perhaps one of the main reasons the practice flourishes is that there is really no stigma attached to it – something demonstrated by the way in which scribes-for-hire openly promote their services on social media platforms and e-commerce marketplaces.

Yet not only is joki unethical, it is also against the law – a fact confirmed by the Indonesian Ministry of Education’s X account, which replied to Limuria’s viral video.

“The academic community is prohibited from using jockeys (other people’s services) to complete assignments and scientific work because it violates ethics and the law. This is a form of plagiarism which is prohibited in Law No. 20/2003 concerning the National Education System,” the tweet said.

Yet clamping down on such a widespread practice is difficult, as universities are hardly likely to report students to the authorities for plagiarism, even if they know that it exists.

There is also scant research on the practice of joki, making it difficult to assess how widespread it actually is, and how universities could tackle something so deeply embedded in Indonesian academia.

Some lecturers on X suggested that another issue is that Indonesian academic institutions only offer the option of a thesis in order to graduate, and that this could be changed to a personal essay or another form of exam that would be more difficult for students to plagiarize.

Certainly, this is something that could be considered as a way of overturning a decades-long practice of cheating.

At a deeper level, however, academic institutions need to provide more guidance and support for students and be vigilant about the practice of joki – rather than practicing a “see-no-evil” approach that does nothing to address the issue.

If Indonesian institutions turn a blind eye to cheating before students even enter the workforce and public life, those students will likely continue to learn the wrong lessons from the very academic powers that should be setting an example.

Rachael 'Raygun' Gunn says criticism over Paris Olympics performance is 'pretty devastating'

Australian Olympic breakdancer Rachael Gunn has made her first public statement since the Paris Olympic Games.

Gunn was criticised for her performance in breaking, with allegations she was mocking the sport.

What's next?

The Australian Olympic Committee demanded an online petition calling for accountability over Gunn's selection for Paris 2024 be removed.

Australian Olympic breakdancer Rachael Gunn has called for an end to the "pretty devastating" reaction to her performance at the Paris 2024 Olympic Games.

Gunn, an Australian academic and B-Girl who competes under the breaking name Raygun, was maligned for her performance last week during breaking's Olympic debut.

There have been allegations that the Australian was mocking the sport, while footage of her moves has gained notoriety on social media.

In her first statement since the Olympics, Gunn said she gave her all in the Olympics and was shattered by the backlash she has received.

"I just want to start by thanking all the people who have supported me, I really appreciate the positivity and I was glad I was able to bring some joy into your lives — that's what I hoped," she said in a video posted to her Instagram page.

"I didn't realise that that would also open the door to so much hate which has frankly been pretty devastating.

"While I went out there and had fun, I did take it very seriously. I worked my butt off preparing for the Olympics and I gave my all, truly.

"I'm honoured to have been a part of the Australian Olympic team and to be part of Breaking's Olympic debut."

Raygun was defeated in all three bouts at the Olympic Games, failing to secure a single vote from any of the judges.

On Thursday, the Australian Olympic Committee (AOC) demanded the removal of an online petition that called for "immediate accountability and transparency" over Gunn's selection.

The AOC wrote to change.org, stating the petition had "stirred up public hatred without any factual basis".

The petition has reportedly been removed.

One suggestion made in the petition was that more talented breakdancers were overlooked for the spot on the Olympic team. The petition was also critical of Australian team chef de mission Anna Meares.

"In regards to the allegations and misinformation floating around, I'd like to ask everyone to please refer to the recent statement made by the AOC as well as the posts on the Ausbreaking Instagram page as well as the WDSF Breaking for Gold page," Gunn said.

"I'd really like to ask the press to please stop harassing my family, my friends, the Australian breaking community, and the broader street dance community.

"Everyone has been through a lot as a result of this so I ask you to please respect their privacy."

AOC chief executive Matt Carroll said those who signed the online petition should apologise to Gunn and Meares.

"Take care what you sign up [to] … because it was totally factually incorrect," Carroll told ABC News Breakfast.

"Maybe have a think. There is always opportunity to use social media for good and say sorry to Rachael and Anna."

Carroll said the AOC had been in touch with Gunn to ensure she was coping.

"Some of my crew — because I've been on the plane coming home — have been in contact with her … in the early hours of this morning," he said.

"We are providing support, both personal and in how to manage the PR situation."

COMMENTS

  1. version control

    Git is a very good revision control system; it (like Mercurial and SVN) isn't strictly for use with software development. Since you're using LaTeX, which is textual, Git will be useful if you want to keep revisions of your thesis, and then compare revisions or get back an old revision. There are a lot of really cool features in Git ...

  2. cltl/ThesisTips: A collection of tips for writing a PhD thesis

    Regularly check that it's up-to-date. I write my thesis using LaTeX and use Git to commit my changes to Overleaf. Here's a small bash script that I use (the commit message doesn't matter for Overleaf). The date command reminds me when I last committed my work.

  3. How to Git your PhD thesis on GitHub

    Set up a private repository on GitHub. Add a .gitignore file that tells git to exclude a set of files from version control. Set the .gitignore file type to TeX. Create a private repository. Clone the remote repository. Now that a remote repository is created on GitHub, we need to clone it on a local computer to work with it.

  4. Is it advisable to put entire source of my thesis up on GitHub?

    Yes, it's a very good idea to use an online repository with a versioning system to write your Masters thesis. It offers a nice automatic backup, you can easily sync from different locations (office, home), and (this is mostly true for papers rather than a thesis) you can easily collaborate with people outside of your university (i.e. who wouldn ...

  5. How To Use Git to Manage Your Writing Project

    Step 2 — Saving Your Initial Draft. Git only knows about files you tell it about. Just because a file exists in the directory holding the repository doesn't mean Git will track its changes. You have to add a file to the repository and then commit the changes. Create a new Markdown file called article.md:

  6. Template for writing a PhD thesis in Markdown

    Quickstart for Mac Users. If you're a Mac user and you have conda and brew installed, run the following in your terminal to install pandoc and TeX packages (steps 1 & 3):
        # get texlive
        brew install --cask mactex
        # update tlmgr and packages
        sudo tlmgr update --self
        # make python venv and install pandoc

  7. A Quick Introduction to Version Control with Git and GitHub

    Initialized empty Git repository in ~/thesis/.git/ Now you are ready to start versioning your code (Fig 1). Conceptually, Git saves snapshots of the changes you make to your files whenever you instruct it to. ... While Git is best suited for collaboratively writing small text files, nonetheless, collaboratively working on projects in the ...

  8. How I wrote my Thesis in Markdown using Ulysses, Pandoc, LaTeX, Zotero

    Project Structure and Version Management with Git. As a developer, I'm used to using Git in my projects, committing changes, and being able to see a log of everything I did. As I wanted to open-source my thesis anyway, I decided to use Git with GitHub for it as well. My project structure is fairly simple.

  9. I wrote my thesis in Markdown, here's how it went

    Go to the Terminal and get to the directory where you have your .bib and .md file. In Mac terminal you simply use cd command to get there. Let's say I want to get to /Users/user/Desktop ...

  10. Planning to use latex with git to write my thesis. Is this a ...

    configure git to push to multiple servers at once, giving you a great backup solution. use github actions to compile your LaTeX document on the repo so you always have a fresh PDF. sync with overleaf for additional backups. One of my collaborators prefers overleaf for editing, so this worked for us.

  11. Using git to write a thesis

    Run git tag -a ch2rev1 -m "Chapter 2 Revision 1" and then git push origin ch2rev1. Note that you can also tag older commits. Since you write your thesis alone you don't even need to branch: just make sure you have a commit before you make your change. If you are not satisfied, revert to the commit from before you started your rewrite.

  12. How writers can get work done better with Git

    The best way to learn it is to step through it, so here's how to use Git within the Atom interface from the beginning to the end of a writing project. First thing first: Reveal the Git panel by selecting View > Toggle Git Tab. This causes a new tab to open on the right side of Atom's interface.

  13. Quickstart for writing on GitHub

    Markdown is an easy-to-read, easy-to-write language for formatting plain text. You can use Markdown syntax, along with some additional HTML tags, to format your writing on GitHub, in places like repository READMEs and comments on pull requests and issues. In this guide, you'll learn some advanced formatting features by creating or editing a ...

  14. A Quick Introduction to Version Control with Git and GitHub

    Working with both a local and remote repository as a single user. (A) On your computer, you commit to a Git repository (commit d75es). (B) On GitHub, you create a new repository called thesis.

  15. Git for Writers

    A Kanban board is a method for organizing a project into columns, such as To Do, In Process, and Done, for example. Each column contains individual cards associated with project tasks. Kanban boards can be organized in a wide variety of ways and used to manage personal and work projects. Cards can be moved from column to column as the status of ...

  16. Writing Academic Papers with Org-mode

    Here are the settings I have used for writing academic papers. I write my papers in orgmode, then export them to PDF via LaTeX. This is one of the most fleshed out areas of my dotfiles, in large part because 90% of my Emacs time for the last 6 months has been related to my Master's Thesis in some way.

  17. A Quick Introduction to Version Control with Git and GitHub

    Box 1. Definitions. Version Control System (VCS): (noun) a program that tracks changes to specified files over time and maintains a library of all past versions of those files. Git: (noun) a version control system. repository (repo): (noun) folder containing all tracked files as well as the version control history.

  18. Enough R to write a thesis

    Learn how to write reproducible documents (anything from a course assignment to a thesis or manuscript) in quarto: no more copy-paste nightmares. Quarto is the successor to R markdown (our earlier R markdown book is here). All the biostats books were written in R markdown or quarto; you can see the source code in our repo on GitHub.

  19. thesis · GitHub Topics · GitHub

    Add this topic to your repo. To associate your repository with the thesis topic, visit your repo's landing page and select "manage topics." GitHub is where people build software. More than 100 million people use GitHub to discover, fork, and contribute to over 420 million projects.

  20. citations

    The other option is to write a 'software' paper and find an appropriate journal to publish the library, at which point you can cite that paper from then on. ... git provides hashes corresponding to particular commits/versions: ... Using code from github for own thesis. 4.

  21. thesis · Topics · GitLab

    Thesis build system to write your thesis in markdown and generate a pdf. It can run on GitLab CI or locally using pandoc with plantuml and mermaid support.

  22. GitHub

    This repository contains the skeleton of a Ph.D. thesis written in Org Mode. It does not aim to be an authoritative guide on writing thesis with Org Mode, i.e., it only represents the solution that I found most convenient within the time frame of my Ph.D. The goal of this setup was to allow the seamless inclusion of research chapters into the ...

  23. Would You Pay Someone To Write Your University Thesis?

    In Indonesia, the use of "joki," or writers-for-hire, is a long-standing - and mostly normalized - practice among university students.

  24. Rachael 'Raygun' Gunn says criticism over Paris Olympics performance is

    Australian Olympic breakdancer Rachael Gunn has asked for the "hate" towards her family and friends to stop in the aftermath of a media furore over her performance in Paris.

  25. How Would I Use Obsidian To Write A Thesis

    Use Obsidian to get a bullet point outline of the thesis, with the # headers for the chapters/sections of the thesis and the bullets for the ideas and points, all in one document for outlining purposes. As the file grows in length and you flesh out the thesis, use the note-refactor plugin to pull out the heading and content ...
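
Several of the entries above (items 5, 7 and 11) describe the same basic Git loop for a solo writing project: initialize a repository, add and commit a file, tag a revision, and go back to that revision if a rewrite does not work out. A minimal sketch of that loop follows. The directory name, file name, author identity and commit messages are assumptions made up for illustration; only the tag command is taken from item 11. The git push step from that item is left out because the sketch has no remote.

```shell
set -eu

# Work in a throwaway directory so nothing outside it is touched.
workdir=$(mktemp -d)
cd "$workdir"

git init -q thesis
cd thesis

# Hypothetical identity, configured for this repository only.
git config user.name "Thesis Author"
git config user.email "author@example.com"

# Git only tracks files you explicitly add.
printf '%s\n' '\documentclass{report}' '\begin{document}' \
  'Chapter 2, first draft.' '\end{document}' > thesis.tex
git add thesis.tex
git commit -q -m "Chapter 2: first draft"

# Mark the state you may want to return to (command from item 11).
git tag -a ch2rev1 -m "Chapter 2 Revision 1"

# Rewrite the chapter and commit the rewrite.
printf '%s\n' '\documentclass{report}' '\begin{document}' \
  'Chapter 2, rewritten.' '\end{document}' > thesis.tex
git commit -q -am "Chapter 2: rewrite"

# Not satisfied: restore the file as it was at the tag and record
# that as a new commit, so the rewrite stays in the history.
git checkout -q ch2rev1 -- thesis.tex
git commit -q -m "Chapter 2: back to revision 1"

commit_count=$(git rev-list --count HEAD)
restored_text=$(grep 'Chapter 2' thesis.tex)
```

Restoring the file from the tag and committing again is one of several ways to go back. It is used here because it keeps the abandoned rewrite in the log rather than discarding it.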