Doing linear regression in Python begins with looking at the correlation between two variables, such as alcohol and tobacco purchases in different regions. Copy the data and paste it between a pair of triple quotes in the IPython Notebook. Then wrap the data in a pandas DataFrame, which lets you access each variable as a whole column.
- Find the correlation between two variables
Start with a small data set on the relationship between two products, such as purchases of tobacco and alcohol across several regions. Use this data to illustrate the tools for calculating a linear regression: scikit-learn and statsmodels can fit the regression, and pandas can manage the data.
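As a rough sketch of what this step computes, the snippet below measures the correlation between two columns and fits a straight line through them. The numbers are hypothetical placeholders, not real purchase data, and the fit uses `numpy.polyfit` as a lightweight stand-in for scikit-learn or statsmodels.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder values, not real survey data.
df = pd.DataFrame({
    "Alcohol": [6.5, 6.2, 5.9, 4.0, 5.1],
    "Tobacco": [4.0, 3.8, 3.4, 4.5, 3.1],
})

# Pearson correlation coefficient between the two columns.
r = df["Alcohol"].corr(df["Tobacco"])

# Least-squares line: Alcohol is modeled as slope * Tobacco + intercept.
slope, intercept = np.polyfit(df["Tobacco"], df["Alcohol"], deg=1)

print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The correlation coefficient always falls between -1 and 1, and the sign of the fitted slope matches the sign of the correlation.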
- Import the modules
Copy the data and paste it between a pair of triple quotes in the IPython Notebook, so that it is stored as a single multi-line string. Each row should end with a newline, and each datum within a row should be delimited by a tab. Split the string on newlines first, then split each row on tabs. Convert the numeric fields to numbers, and leave only the region names as strings.
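The parsing steps above might look like the following minimal sketch. The region names and values are made up for illustration, and the variable names (`data_str`, `header`, `records`) are assumptions rather than names from the original data.

```python
# Hypothetical placeholder rows; in the notebook you would paste the real
# data between the triple quotes. "\t" stands for a literal tab character.
data_str = """Region\tAlcohol\tTobacco
Region A\t6.5\t4.0
Region B\t6.2\t3.8
Region C\t5.9\t3.4
Region D\t4.0\t4.5"""

# Split the string into rows on newlines, then split each row on tabs.
rows = [line.split("\t") for line in data_str.split("\n")]

# The first row holds the column labels; the rest hold the data.
header, raw_records = rows[0], rows[1:]

# Convert the numeric fields to floats and keep the region name as a string.
records = [
    [region, float(alcohol), float(tobacco)]
    for region, alcohol, tobacco in raw_records
]

print(header)
print(records[0])
```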
- Wrap the data in a pandas DataFrame
Wrapping the data in a DataFrame lets you access each variable as a whole column by its label. pandas has excellent support for missing values and dates, and it is useful for plotting. Pass two arguments to the DataFrame constructor, the data and the column labels, so that you can refer to the two variables by their labels.
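A minimal sketch of this step, assuming the data has already been parsed into a list of rows: the values and the column labels below are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical parsed rows: region name, alcohol spending, tobacco spending.
records = [
    ["Region A", 6.5, 4.0],
    ["Region B", 6.2, 3.8],
    ["Region C", 5.9, 3.4],
    ["Region D", 4.0, 4.5],
]

# Two arguments: the data and the column labels.
df = pd.DataFrame(records, columns=["Region", "Alcohol", "Tobacco"])

# Each variable is now available as a whole column by its label.
alcohol = df["Alcohol"]
tobacco = df["Tobacco"]

print(df)
```

Because the numeric fields were stored as floats, pandas gives both numeric columns a floating-point dtype, which is what later correlation and regression calls expect.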