Housing in Boston can be unpredictable and expensive, and prices depend on many different features. A model that predicts housing prices in Boston suburbs can help show which features correlate with price and which feature has the greatest impact on prices in the Boston housing market.
Boston Is 2nd Most Expensive US City for Renters, New Report Shows
'We walked into a buzzsaw': This spring, Greater Boston's housing market is tougher than ever.
This project aims to analyze the Boston Housing Prices dataset provided by StatLib - Carnegie Mellon University to understand the factors that influence housing prices in Boston. The dataset contains information about various features such as crime rate, number of rooms, distance to employment centers, accessibility to highways, and others, which can be used to predict housing prices in Boston.
If successful, this project will produce a model that accurately predicts housing prices in Boston and identifies the features that most strongly influence those prices and the Boston housing market. These findings could also help real estate agents and homeowners understand which factors drive housing prices in Boston.
We will use the Boston Housing Prices dataset, which records the following 14 variables for 506 census tracts in the Boston area. Variables 1 to 13 are the features, and MEDV is the target to predict.
1) CRIM: per capita crime rate by town
2) ZN: proportion of residential land zoned for lots over 25,000 sq.ft.
3) INDUS: proportion of non-retail business acres per town
4) CHAS: Charles River dummy variable (1 if tract bounds river; 0 otherwise)
5) NOX: nitric oxides concentration (parts per 10 million)
6) RM: average number of rooms per dwelling
7) AGE: proportion of owner-occupied units built prior to 1940
8) DIS: weighted distances to five Boston employment centres
9) RAD: index of accessibility to radial highways
10) TAX: full-value property-tax rate per $10,000
11) PTRATIO: pupil-teacher ratio by town
12) B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town
13) LSTAT: % lower status of the population
14) MEDV: median value of owner-occupied homes in $1000s
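As a sketch of how these columns divide into predictors and target, the snippet below separates the 13 features from MEDV. The helper name `split_features_target` is hypothetical, and the demo frame reuses only the five rows shown in this project's `head()` preview of boston.csv.

```python
import pandas as pd

# Predictor columns 1-13 from the list above; MEDV (14) is the value to predict.
FEATURES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
            "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]
TARGET = "MEDV"


def split_features_target(df):
    """Hypothetical helper: return (X, y), the 13 feature columns and the MEDV column."""
    missing = [col for col in FEATURES + [TARGET] if col not in df.columns]
    if missing:
        raise KeyError(f"Columns not found in the data: {missing}")
    return df[FEATURES], df[TARGET]


# Demo frame built from the first five rows of boston.csv shown in this project.
rows = [
    [0.00632, 18.0, 2.31, 0, 0.538, 6.575, 65.2, 4.0900, 1, 296.0, 15.3, 396.90, 4.98, 24.0],
    [0.02731, 0.0, 7.07, 0, 0.469, 6.421, 78.9, 4.9671, 2, 242.0, 17.8, 396.90, 9.14, 21.6],
    [0.02729, 0.0, 7.07, 0, 0.469, 7.185, 61.1, 4.9671, 2, 242.0, 17.8, 392.83, 4.03, 34.7],
    [0.03237, 0.0, 2.18, 0, 0.458, 6.998, 45.8, 6.0622, 3, 222.0, 18.7, 394.63, 2.94, 33.4],
    [0.06905, 0.0, 2.18, 0, 0.458, 7.147, 54.2, 6.0622, 3, 222.0, 18.7, 396.90, 5.33, 36.2],
]
head = pd.DataFrame(rows, columns=FEATURES + [TARGET])
X, y = split_features_target(head)
print(X.shape)  # (5, 13)
```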
First, the dataset will be cleaned for analysis by handling missing values and variance within the dataset. The dataset will then be explored using data visualizations (graphs, box plots, etc.) to gain insight into the relationships between features and to determine which feature correlates most strongly with price. Lastly, predictive models will be developed using linear regression (line of best fit) and k-nearest neighbors (k-NN) to determine which features are most significant in predicting housing prices in Boston.
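For the step of finding which feature correlates most strongly with price, a numeric complement to the plots is to rank features by the absolute value of their Pearson correlation with MEDV. This is a minimal sketch assuming the data sit in a pandas DataFrame with the column names listed above; `rank_correlations` is a hypothetical helper name.

```python
import pandas as pd


def rank_correlations(df, target="MEDV"):
    """Hypothetical helper: Pearson correlation of each numeric column with the
    target, ordered from strongest to weakest by absolute value."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Rank by magnitude so strong negative relationships are not pushed to the bottom.
    return corr.reindex(corr.abs().sort_values(ascending=False).index)
```

Applied to the DataFrame loaded in the next step, `rank_correlations(spreadsheet)` would list the features in order of correlation strength with MEDV. Correlation measures only pairwise linear association, so it complements rather than replaces what the fitted models show.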
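```python
import pandas as pd

# Load the Boston housing data; adjust this path to where boston.csv is saved locally.
spreadsheet = pd.read_csv(r'C:\Users\check\Downloads\DS Jupyter Notebook\boston.csv')

# Preview the first five rows.
spreadsheet.head()
```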
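The cleaning step described earlier can start by counting missing values per column. The sketch below assumes median imputation for numeric columns, which is one common choice and not necessarily the method this project uses; `fill_missing_with_median` is a hypothetical name.

```python
import pandas as pd


def fill_missing_with_median(df):
    """Hypothetical helper: report missing values per column, then return a copy
    with numeric gaps filled by each column's median (one common imputation choice)."""
    counts = df.isna().sum()
    print("Missing values per column:")
    print(counts[counts > 0] if counts.any() else "none")
    numeric_cols = df.select_dtypes(include="number").columns
    filled = df.copy()  # leave the caller's DataFrame untouched
    filled[numeric_cols] = filled[numeric_cols].fillna(filled[numeric_cols].median())
    return filled
```

Calling `spreadsheet = fill_missing_with_median(spreadsheet)` would print the per-column counts and return a copy with numeric gaps filled; dropping incomplete rows with `dropna()` is the simpler alternative when only a few rows are affected.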
| | CRIM | ZN | INDUS | CHAS | NOX | RM | AGE | DIS | RAD | TAX | PTRATIO | B | LSTAT | MEDV |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.00632 | 18.0 | 2.31 | 0 | 0.538 | 6.575 | 65.2 | 4.0900 | 1 | 296.0 | 15.3 | 396.90 | 4.98 | 24.0 |
| 1 | 0.02731 | 0.0 | 7.07 | 0 | 0.469 | 6.421 | 78.9 | 4.9671 | 2 | 242.0 | 17.8 | 396.90 | 9.14 | 21.6 |
| 2 | 0.02729 | 0.0 | 7.07 | 0 | 0.469 | 7.185 | 61.1 | 4.9671 | 2 | 242.0 | 17.8 | 392.83 | 4.03 | 34.7 |
| 3 | 0.03237 | 0.0 | 2.18 | 0 | 0.458 | 6.998 | 45.8 | 6.0622 | 3 | 222.0 | 18.7 | 394.63 | 2.94 | 33.4 |
| 4 | 0.06905 | 0.0 | 2.18 | 0 | 0.458 | 7.147 | 54.2 | 6.0622 | 3 | 222.0 | 18.7 | 396.90 | 5.33 | 36.2 |
Using select features, such as crime rate, distance to employment centers, pupil-teacher ratio, and accessibility to highways, this dataset can be used to measure how reliably a prediction model estimates median home values across different communities in the area.
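To make "reliability" concrete, the sketch below fits the two models named in the plan, ordinary least-squares linear regression and k-nearest-neighbors regression, on the four features mentioned above (assumed to map to CRIM, DIS, PTRATIO, and RAD) and reports root-mean-squared error (RMSE) on a held-out test split. It uses NumPy only so each step stays visible; the 80/20 split, k = 5, standardizing features for k-NN, and all function names are assumptions for illustration, not this project's settings. scikit-learn's `LinearRegression` and `KNeighborsRegressor` provide the same two models ready-made.

```python
import numpy as np
import pandas as pd

# Assumed feature subset: crime rate, employment-center distance,
# pupil-teacher ratio, and highway accessibility.
SELECTED = ["CRIM", "DIS", "PTRATIO", "RAD"]


def train_test_split_df(df, test_frac=0.2, seed=0):
    """Shuffle rows with a fixed seed and hold out test_frac of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(df))
    n_test = max(1, int(round(len(df) * test_frac)))
    return df.iloc[idx[n_test:]], df.iloc[idx[:n_test]]


def fit_linear(X, y):
    """Ordinary least squares with an intercept column; returns [b0, b1, ..., bp]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef


def predict_knn(X_train, y_train, X_test, k=5):
    """Predict each test row as the mean target of its k nearest training rows."""
    # Standardize with training statistics so no single feature dominates distances.
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    sigma[sigma == 0] = 1.0
    tr, te = (X_train - mu) / sigma, (X_test - mu) / sigma
    dists = np.linalg.norm(te[:, None, :] - tr[None, :, :], axis=2)  # Euclidean
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def compare_models(df, features=SELECTED, target="MEDV", k=5):
    """Return held-out RMSE for linear regression and k-NN on the chosen features."""
    cols = list(features)
    train, test = train_test_split_df(df)
    X_tr, y_tr = train[cols].to_numpy(dtype=float), train[target].to_numpy(dtype=float)
    X_te, y_te = test[cols].to_numpy(dtype=float), test[target].to_numpy(dtype=float)
    coef = fit_linear(X_tr, y_tr)
    return {
        "linear_rmse": rmse(y_te, predict_linear(coef, X_te)),
        "knn_rmse": rmse(y_te, predict_knn(X_tr, y_tr, X_te, k=min(k, len(X_tr)))),
    }
```

For example, `compare_models(spreadsheet)` would return the two test-set RMSE values in thousands of dollars, since MEDV is in $1000s. Standardizing matters for k-NN because its distances are scale-sensitive (in the sample rows, CRIM is below 0.1 while PTRATIO is between 15 and 19), whereas for OLS it would only rescale the coefficients.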