GitHub: An essential file repository in the scientific community

The fast-growing technological change and the need to innovate quickly has yielded collaboration requirements among groups coming from different places of the world.  For instance, in large-scale worldwide scientific research projects, it is important to share all the information and keep it updated. The know-how relies on cloud-based systems, not only to coordinate and manage the information, but also to handle the computational efforts at different strategic peaks during the process of the project.   “Ancient” resources such as offline-working cannot maintain this kind of development.

Google Drive and Dropbox are both currently the bestl-known cloud file storage services. The use of these platforms has heavily increased but, in the scientific field, another file repository has stepped forward: GitHub. This file hosting platform provides several collaboration features such as task management and features requests for every project, among others. It has become the main platform of the open-source development community that many scientists have begun considering it for a conventional file repository.

At first glance, it seems difficult to use – maybe it is the first time you have heard about GitHub, or never had the chance to try it!- but once you start using it, you will not replace it by any other file storage. Following carefully the basic steps, you will master it:


– Steps needed before getting started

1. Download the Git bash freeware application. Using few command lines, this application helps you to download/upload files from your working directory located in your desktop to the corresponding directory in the cloud.

2. If you do not have a github account (for example, username: freshbiostats), create it here. This site will be your working place in the cloud. The https addres of your site will be (for example): https://github.com/freshbiostats.


– Steps needed during the working process


3. Once you have a github profile, it is highly recommended to create a new repository (i.e., new folder) for each project you are working on. Choosing an adequate name (for example: fresh) describing the content of it, your files will be uploaded there so that you can share all the information with other people working on the same work assignment. These repositories are public by default, but you have the choice by making them private.


4. In your computer,  make a directory, for example, named fresh. Using Git bash and via cd command, go to the created folder.  For simplicity, this directory will be called from now as master.  After it, type

git init

This initializes a git repository in your project. It means that you have entered in the github world and you are ready to work with this system.


5. It is time to create files and upload them to the cloud. In case there are no files in the master folder, and you have to create a document to upload to the web-storage system. To do that:

git status          this command lists out the content of the folder


In case the master directory is empty, and you want to create a file “README.doc”, with a sentence “Hello World”:

 touch README.doc “Hello World”   


Now, once the file is created, the next step is to upload to the web. To this end, execute:

git add README.doc                          it stages the document to the directory

     git commit  -m  “Readme document”        it adds a label to the document.

     git remote add origin https://github.com/freshbiostats/fresh.git


The last command line assigns the destination folder where the file is to be  uploaded. It creates a remote connection, named “origin”, pointing at the GitHub repository just created.

6. After that, typing:

git push -u origin master

With this command line, you send all the files added to the master (called as fresh) to the repository located in the web (https://github.com/freshbiostats/fresh.git). AND…THAT’S ALL! The file is already in the cloud!.



Now, it is your turn, it is worth a try: GitHub, commit and push… !!


‘Diss’ and tell!

Prior to our presentation at the JEDE III conference on “assessment of impact metrics and social network analysis”, we’d like to get a flavour of the research dissemination that’s going on out there so we can catalogue some of the resources used by researchers in Biostatistics. As Pilar mentioned in previous posts (read here and here), we’re active users of online tools such as PubMed and Stackoverflow, and although we’re relatively new to aggregators such as ResearchBlogging, we really believe in their value as disseminators in the current Biostatistics arena.

Still, we feel we must be missing many others…So we’re interested both in what you use to advance your research but also how you promote it.

We’d be very grateful for your answers to the following questions (we’ll show a summary of the responses at the end of the month):

We welcome your further comments on this topic below and look forward to reading your answers.  

Hopefully see you in Pamplona!