Prediction of Boston Housing Prices Project

Problem

Housing prices in Boston can be unpredictable, expensive, and dependent on many different features. A model that predicts housing prices in Boston suburbs can show which features correlate with price and which feature has the largest impact on prices in the Boston real estate market.

Citations:

Boston Is 2nd Most Expensive US City for Renters, New Report Shows

‘A housing market for almost no one’: Rising prices and interest rates have made home buying feel impossible

'We walked into a buzzsaw': This spring, Greater Boston's housing market is tougher than ever.

Solution

This project analyzes the Boston Housing Prices dataset, distributed through StatLib at Carnegie Mellon University, to understand which factors influence housing prices in Boston. The dataset records features such as crime rate, number of rooms, distance to employment centers, and accessibility to highways, which can be used to predict housing prices.

Impact

If successful, this project will produce a model that accurately predicts housing prices in Boston and identifies the features that most strongly influence them. The findings could also be useful to real estate agents and homeowners who want to understand what drives housing prices in Boston.

Dataset

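We will use the Boston Housing Prices dataset, which records the following variables for 506 census tracts in Boston and its suburbs. The first 13 are candidate predictors, and MEDV is the value to be predicted.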

1) CRIM: per capita crime rate by town

2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.

3) INDUS: proportion of non-retail business acres per town

4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)

5) NOX: nitric oxides concentration (parts per 10 million)

6) RM: average number of rooms per dwelling

7) AGE: proportion of owner-occupied units built prior to 1940

8) DIS: weighted distances to five Boston employment centres

9) RAD: index of accessibility to radial highways

10) TAX: full-value property-tax rate per $10,000

11) PTRATIO: pupil-teacher ratio by town

12) B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town

13) LSTAT: % lower status of the population

14) MEDV: median value of owner-occupied homes in $1000s

Methodology

First, the dataset will be cleaned for analysis by handling missing values and addressing variance in the data (for example, outliers or features on very different scales). The dataset will then be explored with visualizations (scatter plots, box plots, etc.) to understand the relationships between features and to determine which feature correlates most strongly with price. Lastly, predictive models will be built using linear regression (line of best fit) and k-nearest neighbors to determine which features are most significant in predicting housing prices in Boston.
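
The cleaning and exploration steps could look roughly like the sketch below. It is only an illustration under stated assumptions: the small `sample` frame holds a few columns from the first five rows of the dataset, typed in by hand so the cell runs without the CSV, and the choice to drop incomplete rows and to rank features by Pearson correlation is one reasonable option, not necessarily the approach used in the final analysis.

```python
import pandas as pd

# A few columns from the first five rows of the dataset, typed in by hand
# so this sketch runs without the CSV file. In the notebook, the full
# DataFrame loaded from disk would be used instead.
sample = pd.DataFrame({
    "CRIM": [0.00632, 0.02731, 0.02729, 0.03237, 0.06905],
    "RM": [6.575, 6.421, 7.185, 6.998, 7.147],
    "LSTAT": [4.98, 9.14, 4.03, 2.94, 5.33],
    "MEDV": [24.0, 21.6, 34.7, 33.4, 36.2],
})

# Cleaning: count missing values per column, then drop incomplete rows.
# (Dropping is an assumption; imputing values is another option.)
missing_per_column = sample.isna().sum()
cleaned = sample.dropna()

# Exploration: Pearson correlation of each feature with the target (MEDV),
# ordered from strongest to weakest absolute correlation.
correlations = cleaned.corr()["MEDV"].drop("MEDV")
order = correlations.abs().sort_values(ascending=False).index
ranked = correlations.reindex(order)

print(missing_per_column)
print(ranked)
```

With only five rows the correlations are not meaningful; the same calls applied to the full dataset would give the ranking that the exploration step is after.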

In [1]:

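import pandas as pd

# Load the dataset from a local CSV copy (adjust the path for your machine).
spreadsheet = pd.read_csv(r'C:\Users\check\Downloads\DS Jupyter Notebook\boston.csv')
# Preview the first five rows.
spreadsheet.head()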

Out[1]:

	CRIM	ZN	INDUS	NOX	RM	AGE	DIS	RAD	TAX	PTRATIO	B	LSTAT	MEDV
0	0.00632	18.0	2.31	0.538	6.575	65.2	4.0900	1	296.0	15.3	396.90	4.98	24.0
1	0.02731	0.0	7.07	0.469	6.421	78.9	4.9671	2	242.0	17.8	396.90	9.14	21.6
2	0.02729	0.0	7.07	0.469	7.185	61.1	4.9671	2	242.0	17.8	392.83	4.03	34.7
3	0.03237	0.0	2.18	0.458	6.998	45.8	6.0622	3	222.0	18.7	394.63	2.94	33.4
4	0.06905	0.0	2.18	0.458	7.147	54.2	6.0622	3	222.0	18.7	396.90	5.33	36.2
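
The preview shows 13 columns, while the feature list in the Dataset section documents 14. A quick check like the sketch below can confirm which documented column is absent from the loaded file. The two lists are copied by hand from the Dataset section and from the preview header; in the notebook, `list(spreadsheet.columns)` could take the place of `loaded`.

```python
# Columns documented in the Dataset section.
documented = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
              "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Columns that appear in the preview of the loaded CSV.
loaded = ["CRIM", "ZN", "INDUS", "NOX", "RM", "AGE",
          "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Documented columns that the file lacks, and file columns that are undocumented.
missing = [name for name in documented if name not in loaded]
extra = [name for name in loaded if name not in documented]

print("missing from file:", missing)  # -> ['CHAS']
print("not documented:", extra)       # -> []
```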

Using select features, such as crime rate, distance to employment centers, pupil-teacher ratio, and accessibility to highways, this dataset can be used to evaluate how reliably a prediction model estimates housing prices for different populations within the area, based on these numerical statistics.
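
One way to measure that reliability is to hold out part of the data and compare each model's error on rows it has not seen. The sketch below is a minimal illustration, not the project's implementation. The data is synthetic (randomly generated stand-ins for four features such as CRIM, DIS, PTRATIO, and RAD), both models are written directly in NumPy because no modeling library is named here, and the 75/25 split, `k=5`, and the RMSE metric are all assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# SYNTHETIC stand-in data, not the Boston dataset: four feature columns
# playing the roles of CRIM, DIS, PTRATIO, and RAD, plus a made-up target.
n_rows = 200
X = rng.normal(size=(n_rows, 4))
true_weights = np.array([-2.0, 1.0, -1.5, 0.5])  # arbitrary, illustration only
y = 22.0 + X @ true_weights + rng.normal(scale=1.0, size=n_rows)

# Hold out the last 25% of rows to estimate how each model generalizes.
split = int(0.75 * n_rows)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# Standardize with training statistics so no feature dominates k-NN distances.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train_s, X_test_s = (X_train - mean) / std, (X_test - mean) / std


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted targets."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def fit_linear(X, y):
    """Ordinary least squares with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def predict_knn(X_train, y_train, X_query, k=5):
    """Average the targets of the k nearest training rows (Euclidean distance)."""
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


coef = fit_linear(X_train_s, y_train)
linear_rmse = rmse(y_test, predict_linear(coef, X_test_s))
knn_rmse = rmse(y_test, predict_knn(X_train_s, y_train, X_test_s, k=5))
print(f"linear RMSE: {linear_rmse:.3f}, k-NN RMSE: {knn_rmse:.3f}")
```

On the real data, `X` would hold the selected feature columns and `y` the MEDV column. The magnitudes of the fitted linear coefficients on standardized features give a rough indication of which features matter most, which ties back to the project's goal.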