Welcome to My Project Proposal!¶

Introduction¶

I am very eager to continue to explore the applications of Data Science and I hope you enjoy some of the possibilities I have put forth.

A Real World Problem¶

One of the most pressing issues of the current day is the housing crisis sweeping the nation. There are a number of factors that have led to this presently including inflation and high interest rates, however, is there a possibility we can look at what really drives home prices? This article shows us how a lot of the things that can drive a home's value are quantifiable. It is these factors which we will use to determine and possibly predict home values.

In [1]:
from sklearn.datasets import fetch_california_housing
import pandas as pd

# Load the dataset
california = fetch_california_housing()

# Create a Pandas DataFrame from the data and target variables
df = pd.DataFrame(california.data, columns=california.feature_names)
df['MedHouseVal'] = california.target

# data dictionary representing descriptions
data_dict = {
    'MedInc': 'median income for households within a block of houses (measured in tens of thousands of US Dollars)',
    'HouseAge': 'median age of a house within a block; a lower number is a newer building',
    'AveRooms': 'average number of rooms per dwelling',
    'AveBedrms': 'average number of bedrooms per dwelling',
    'Population': 'total number of people residing within a block',
    'AveOccup': 'average household occupancy',
    'Latitude': ' measure of how far north/south a house is',
    'Longitude': 'measure of how far west/east a house is',
    'MedHouseVal': 'median house value for households within a block (measured in hundreds of thousands of US Dollars)'
}

# rename to better understand
# df.rename(columns=data_dict, inplace=True)

# Print the first few rows of the DataFrame
print(df.head())
   MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  \
0  8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88   
1  8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86   
2  7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85   
3  5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85   
4  3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85   

   Longitude  MedHouseVal  
0    -122.23        4.526  
1    -122.22        3.585  
2    -122.24        3.521  
3    -122.25        3.413  
4    -122.25        3.422  
In [2]:
import matplotlib.pyplot as plt

# Plot Median House Value vs. Median Income
plt.scatter(df['MedHouseVal'], df['MedInc'], alpha=0.1)
plt.title('Median House Value vs. Median Income')
plt.xlabel('Median Income')
plt.ylabel('Median House Value')
plt.show()

What will we use this dataset for?¶

With this data we will be attempting to build a model which will help us predict home values. We will be using the attributes from the dataset to predict the prices. By making these prediction accurate, it will be helpful to buyers and those interested in real estate.