Data Scientist's Toolbox



Data Science Specialisation : Johns Hopkins University

Part 1  -  Data Scientist's Toolbox

Aaditree Jaisswal

LINK FOR COURSE : 



When we talk about data science, we mean a field that draws on statistics, data cleaning, computer science and related areas. Data, in turn, is any information that helps in decision making: values, numbers, facts and so on. There is no fixed limit on what counts as data.

There are six main types of analysis in data science :

1. Descriptive : summarises the data.

2. Exploratory : examines the data to find relationships.
    Note: Correlation does not imply causation.

3. Inferential : uses a small sample of data to draw conclusions about a larger population.

4. Predictive : uses current and historical data to make predictions about the future.

5. Causal : finds out what happens to one variable when we manipulate another variable.

6. Mechanistic : understands the exact changes in one variable that lead to exact changes in another variable.
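The first four types can be made concrete with R's built-in mtcars dataset. This is a minimal sketch, not part of the course material: the dataset and variable choices are illustrative assumptions, and causal and mechanistic analyses are left out because they usually need controlled experiments rather than a ready-made dataset.

```r
# Illustrative sketch: four analysis types applied to the built-in mtcars data.
data(mtcars)

# 1. Descriptive: summarise a single variable (miles per gallon)
mpg_summary <- summary(mtcars$mpg)

# 2. Exploratory: look for a relationship between weight and mpg
#    (a strong correlation still does not prove causation)
wt_mpg_cor <- cor(mtcars$wt, mtcars$mpg)

# 3. Inferential: use this sample to test a claim about cars in general
#    (do automatic (am = 0) and manual (am = 1) cars differ in mean mpg?)
am_test <- t.test(mpg ~ am, data = mtcars)

# 4. Predictive: fit a linear model and predict mpg for a new car
fit <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3.0)  # weight in 1000 lbs
predicted_mpg <- predict(fit, newdata = new_car)

print(mpg_summary)
print(wt_mpg_cor)
print(am_test$p.value)
print(predicted_mpg)
```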


In this course, the language used is R, the platform is RStudio, and version control is handled with Git and GitHub.

R is mainly used for two things : statistical analysis and graphing.

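A few R functions for installing packages:

1. install.packages("ggplot2") - to install a single package called "ggplot2".

2. install.packages(c("ggplot2", "devtools")) - to install multiple packages together (the names go inside a vector built with c()).

3. biocLite() - to get Bioconductor packages (current Bioconductor releases use BiocManager::install() instead).

4. install_github("author/package") - from the devtools package, to install a package directly from GitHub.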
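A common pattern is to install only the packages that are missing. The sketch below assumes a hypothetical helper called missing_packages(); it checks installed.packages() and then passes the missing names to install.packages() as one vector. The install step is guarded by interactive() (an assumption of this sketch) so that sourcing the script in batch mode never starts an unexpected download.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  pkgs[!pkgs %in% installed]
}

wanted <- c("ggplot2", "devtools")
to_install <- missing_packages(wanted)

# Several packages are installed in one call by passing a character vector.
# Only run this in an interactive session, where a CRAN mirror can be chosen.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}
```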


Installing a package is not enough. To use it, we also need to load it into the R session:
Install package -> load it with library(package_name)
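Loading can be checked from R itself. This minimal sketch uses the splines package only because it ships with R, so nothing needs to be downloaded; the same call works for ggplot2 once it is installed.

```r
# library() attaches an installed package to the current R session.
# "splines" is bundled with base R, so this runs without installing anything.
library(splines)

# Attached packages appear on the search path as "package:<name>".
is_attached <- "package:splines" %in% search()
print(is_attached)
```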

To see which packages are installed, we use either installed.packages() or library() with no arguments.

To update packages, we use update.packages().

To see which installed packages have newer versions available : old.packages().

Consider a hypothetical situation where the updated ggplot2 package doesn't do what we want it to do, and we need to go back to the older version. First, we unload the current version from the session with: detach("package:ggplot2", unload = TRUE)

To remove a package completely, we use : remove.packages("package_name")

To check the version of R : version

To see information about your session (R version, operating system, loaded packages) : sessionInfo()

To get information about a certain package : help(package = ggplot2)
Note: the package name can be written without inverted commas here.

To read long-form guides (vignettes) that show how to use a package's functions : browseVignettes()
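Several of these checks can be combined when recording the environment an analysis ran in. This is a sketch under stated assumptions: stats and splines stand in for any package, because both ship with R.

```r
# Check which version of R is running.
r_version <- R.version.string
print(r_version)

# Check the installed version of a single package (stats ships with R).
stats_version <- packageVersion("stats")
print(stats_version)

# Record details of the current session: R version, OS, loaded packages.
info <- sessionInfo()
print(info$R.version$version.string)

# Attach a package, then detach and unload it again, as you would
# before reinstalling an older version of a package such as ggplot2.
library(splines)
detach("package:splines", unload = TRUE)
still_attached <- "package:splines" %in% search()
print(still_attached)
```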




PROJECTS IN R:

A good way to organise an R project is with 3 main folders that store all of its files:

1. Data
2. Scripts
3. Output
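The folder layout can also be created from R. A minimal sketch: create_project_folders() is a hypothetical helper name, and the example runs inside a temporary directory so no real project is touched.

```r
# Hypothetical helper: create the Data / Scripts / Output folders in `project_dir`.
create_project_folders <- function(project_dir) {
  folders <- file.path(project_dir, c("Data", "Scripts", "Output"))
  for (folder in folders) {
    # recursive = TRUE also creates project_dir itself if it does not exist
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(folders)
}

# Example run in a temporary directory.
project <- file.path(tempdir(), "my_project")
create_project_folders(project)
print(list.files(project))
```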




VERSION CONTROL


Version control records every change we make to a project, along with a timestamp. This helps when we don't understand or remember a change we made in the past and want to go back to it.

Now that we know about R, RStudio and projects, there is one very important thing to do: upload the code to GitHub. This stores the code remotely and makes it easy to share.

On GitHub, a repository is the equivalent of a project.

A few key Git/GitHub terms:

Commit : saves a snapshot of the staged changes to the local repository.
Push : sends local commits to the repository on GitHub.
Pull : updates the local version of the repository with the latest changes from GitHub.
Staging : the act of preparing a file for a commit.
Branch : a separate line of development that starts as a copy of the main repository.

Three practices that help when updating a repository :
1. Make single-issue commits (one change per commit).
2. Write informative commit messages.
3. Push and pull often, to keep the repository up to date.


CONFIGURING GIT WITH R

1. Terminal :

Step 1: git config --global user.name "NAME"

Step 2: git config --global user.email "xxx@gmail.com"

Then type exit to close the terminal.

Note : Both of these should match the information on your GitHub account.
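One way to confirm from R that the two settings took effect is to read them back with git config --get. A minimal sketch, assuming Git is on the PATH; read_git_setting() is a hypothetical helper name, and it returns NA when Git is missing or a setting is not configured.

```r
# Hypothetical helper: read one Git setting, returning NA if it is unavailable.
read_git_setting <- function(key) {
  if (!nzchar(Sys.which("git"))) return(NA_character_)  # Git not installed
  # `git config --get` exits with status 1 when the key is unset;
  # suppressWarnings() hides R's warning about that non-zero status.
  value <- suppressWarnings(
    system2("git", c("config", "--get", key), stdout = TRUE, stderr = FALSE)
  )
  if (length(value) == 0) NA_character_ else value[1]
}

git_name  <- read_git_setting("user.name")
git_email <- read_git_setting("user.email")
print(c(name = git_name, email = git_email))
```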

2. R STUDIO :

Step 1 : Open RStudio.

Step 2 : Go to Tools.

Step 3 : Go to Global Options.

Step 4 : Select Git/SVN.

Step 5 : Create RSA Key.

Step 6 : View public key and copy it.

Step 7 : Close.

3. GITHUB ACCOUNT :

Step 1 : Open your account Settings.

Step 2 : Go to SSH and GPG keys.

Step 3 : Add a new SSH key and paste the public key here.

Step 4 : Create a repository.

4. R STUDIO:

Step 1 : Create a new project and select Version Control, then Git.

Step 2 : Paste the repository URL.

Step 3 : Create Project.

Step 4 : Create a new file.

Step 5 : Make some changes. For example : print("HELLO WORLD")

Step 6 : Stage the file and commit it with a message.

Step 7 : Push.

5. TO PUT AN EXISTING PROJECT UNDER VERSION CONTROL (run these in the terminal, inside the project folder)

Step 1 : git init

Step 2 : git add .

Step 3 : git status (shows the state of the working directory, i.e. which files have been staged and which have not)

Step 4 : git commit -m "Initial Commit"
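The same four steps can be run from R with system2(), which can help when scripting project setup. This is a sketch, not the course's method: it assumes Git is on the PATH, run_git() is a hypothetical helper, and the demo works in a throwaway temporary folder with a placeholder identity (Demo / demo@example.com) so it runs even before git config has been set.

```r
# Hypothetical helper: run one git command inside `dir` and capture its output.
# system2() does not quote arguments, so shQuote() protects values with spaces.
run_git <- function(args, dir) {
  system2("git", c("-C", shQuote(dir), args), stdout = TRUE, stderr = TRUE)
}

if (nzchar(Sys.which("git"))) {
  repo <- tempfile("vc_demo")  # throwaway folder, so no real project is touched
  dir.create(repo)
  writeLines('print("HELLO WORLD")', file.path(repo, "hello.R"))

  run_git("init", repo)                            # Step 1
  run_git(c("add", "."), repo)                     # Step 2
  status <- run_git(c("status", "--short"), repo)  # Step 3: "A  hello.R" = staged
  # Step 4: the -c options supply a temporary identity for this one commit.
  run_git(c("-c", "user.name=Demo", "-c", "user.email=demo@example.com",
            "commit", "-m", shQuote("Initial Commit")), repo)
  history <- run_git(c("log", "--oneline"), repo)

  print(status)
  print(history)
}
```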




R MARKDOWN

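R Markdown combines text, R code and the code's output in one document, which helps in reproducibility.

To install it in RStudio : install.packages("rmarkdown")

Tips :

To make text bold : **text**

To make text italic : *text*

Code chunks : ```{r} ... ``` (R code between the opening and closing backticks runs when the document is knitted)

Comments : <!-- text -->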
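To see the pieces together, the sketch below writes a small .Rmd file from R. The file name and contents are illustrative assumptions; the triple backticks are built with strrep() so the chunk fences stay visible as ordinary strings.

```r
# Sketch: write a minimal R Markdown document to a temporary file.
fence <- strrep("`", 3)  # three backticks open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"Toolbox notes\"",
  "output: html_document",
  "---",
  "",
  "This is **bold** and this is *italic*.",
  "",
  "<!-- This comment will not appear in the knitted output -->",
  "",
  paste0(fence, "{r}"),  # start of an R code chunk
  "summary(cars)",       # R code that runs when the file is knitted
  fence                  # end of the chunk
)

rmd_file <- file.path(tempdir(), "toolbox_notes.Rmd")
writeLines(rmd_lines, rmd_file)
print(rmd_file)
```

To knit the file to HTML, call rmarkdown::render(rmd_file); this needs the rmarkdown package and Pandoc, which RStudio bundles.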


