Optimizing US Real Estate Investments

Analyzing U.S. real estate listings to uncover market trends, pricing patterns, and investment opportunities through data visualization and statistical modeling.

Data
U.S. real estate listings sourced from Realtor.com via Kaggle

Audience
Real-estate investors and home buyers

Original Dataset:

Total Listings: 2,226,382 real estate listings

Geographic Scope: Included properties from the U.S., Guam, Puerto Rico, the District of Columbia, and New Brunswick

Refinement: Removed all listings outside the 50 U.S. states to focus on relevant real estate markets

Outlier Removal: To ensure data accuracy and relevance, the following listings were removed:

  • Low-Price Listings: 22,070 listings priced below $50,000 were removed, as such low prices are typically unrealistic for solid real estate investments.

  • Extreme Bedroom/Bathroom Counts: Removed listings with more than 20 bedrooms or 20 bathrooms (e.g., one listing had 830 bathrooms, likely a hotel, not a home).

  • Zero Bed/Bath Listings: Excluded properties with 0 bedrooms and 0 bathrooms, as our focus is home sales, not land purchases.

  • Oversized Properties: Removed listings over 25,000 sq. ft., as these are either data errors or beyond the scope of average investors and home buyers.
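
The filters above can be sketched in pandas as follows. This is a minimal illustration, not the project's actual cleaning code: the column names (`price`, `bed`, `bath`, `house_size`, `state`) and the helper `clean_listings` are assumptions, and the excluded regions are simply the ones named in this write-up.

```python
import pandas as pd

# Regions this write-up names as falling outside the 50 states.
NON_STATE_REGIONS = {"Guam", "Puerto Rico", "District of Columbia", "New Brunswick"}


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that pass the filters described above.

    Column names are hypothetical; rename them to match the real dataset.
    Rows with missing values in the numeric columns fail the comparisons
    and are dropped as well, which may differ from the original cleaning.
    """
    keep = (
        ~df["state"].isin(NON_STATE_REGIONS)          # keep the 50 states only
        & (df["price"] >= 50_000)                     # drop low-price listings
        & (df["bed"] <= 20)                           # drop extreme bedroom counts
        & (df["bath"] <= 20)                          # drop extreme bathroom counts
        & ~((df["bed"] == 0) & (df["bath"] == 0))     # drop zero bed/bath listings
        & (df["house_size"] <= 25_000)                # drop oversized properties
    )
    return df.loc[keep].reset_index(drop=True)
```

Calling `clean_listings(raw_df)` on the raw listings would return a new DataFrame and leave the original untouched.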

Final Dataset:

Total After Cleaning: 1,530,447 real estate listings across all 50 U.S. states

New Feature: Price Range Categorization

Low Range: Less than $300,000

Mid Range: Between $300,000 and $700,000

High Range: Greater than $700,000

This segmentation helps investors at different financial levels identify suitable opportunities.
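
One way to derive this feature is sketched below. The column names `price` and `price_range`, the helper `add_price_range`, and the "Unknown" label for missing prices are assumptions made for illustration. `numpy.select` is used so that the boundaries match the definitions above exactly: $300,000 and $700,000 both fall in the mid range.

```python
import numpy as np
import pandas as pd


def add_price_range(df: pd.DataFrame, price_col: str = "price") -> pd.DataFrame:
    """Return a copy of df with a price_range column (Low / Mid / High)."""
    price = df[price_col]
    conditions = [
        price < 300_000,    # Low: less than $300,000
        price <= 700_000,   # Mid: $300,000 through $700,000 (first match wins)
        price > 700_000,    # High: greater than $700,000
    ]
    labels = ["Low", "Mid", "High"]
    out = df.copy()
    # Missing prices match no condition and receive the default label.
    out["price_range"] = np.select(conditions, labels, default="Unknown")
    return out
```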

Techniques used for the project:

  • Data Cleaning & Preprocessing

  • Exploratory Data Analysis

  • Data Visualization with Python and Tableau

  • Time-Series Analysis

  • Statistical Modeling

  • Hypothesis Testing

Exploratory Data Analysis (EDA) and Insights for Real Estate Pricing

To better understand the factors influencing property prices, we conducted an Exploratory Data Analysis (EDA) on the real estate dataset. Our primary goal was to identify relationships between price and key property characteristics, including the number of bedrooms, bathrooms, and house size (measured in square feet). The EDA process involved the following steps:

Univariate and Bivariate Analysis

  • Analyzed distributions of bedrooms, bathrooms, house sizes, and prices.

  • Created scatterplots to visualize relationships between these features and property prices.
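
As a numeric companion to the scatterplots, the strength of each relationship can be summarized with a correlation coefficient. The sketch below assumes hypothetical column names (`bed`, `bath`, `house_size`, `price`) and uses Pearson correlation, the pandas default; the write-up does not state which measure the analysis used.

```python
import pandas as pd


def price_correlations(
    df: pd.DataFrame,
    features=("bed", "bath", "house_size"),
    target: str = "price",
) -> pd.Series:
    """Pearson correlation of each feature with price, strongest first."""
    correlations = df[list(features)].corrwith(df[target])
    return correlations.sort_values(ascending=False)
```

Pearson correlation only captures linear association, so a rank-based measure (for example `method="spearman"` in `corrwith`) may be a reasonable alternative for skewed price data.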

Key Scatterplot Insights

Bedrooms vs. Price:

  • A positive relationship was observed; however, the increase in price slowed after four bedrooms, suggesting diminishing returns for larger homes.

Bathrooms vs. Price:

  • The number of bathrooms showed a stronger correlation with price than bedrooms, highlighting the added value of additional bathrooms, especially when increasing from one to two.

House Size vs. Price:

  • House size had the most direct and consistent relationship with price, with larger properties commanding significantly higher values.
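
Patterns such as the slowdown after four bedrooms or the jump from one to two bathrooms can be checked by comparing the median price at each count. The following is a rough sketch under assumed column names; the helper `median_price_by` is hypothetical and not part of the original analysis.

```python
import pandas as pd


def median_price_by(df: pd.DataFrame, feature: str, price_col: str = "price") -> pd.DataFrame:
    """Median price per feature value, plus the change from the previous value."""
    medians = df.groupby(feature)[price_col].median().sort_index()
    return pd.DataFrame({
        "median_price": medians,
        # Shrinking differences at higher counts suggest diminishing returns.
        "change_from_previous": medians.diff(),
    })
```

For example, `median_price_by(df, "bed")` would produce one row per bedroom count, and `median_price_by(df, "bath")` the same for bathrooms.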

Insights for Investors:

  • House size is the most reliable predictor of price, making it a critical factor in evaluating investment opportunities.

  • Additional bathrooms tend to increase property value more significantly than additional bedrooms.

  • Investors should prioritize listings with potential for bathroom additions or properties with above-average square footage for the neighborhood.

Real Estate Market Trends Since 2016

Steady Price Growth: Real estate listing prices have shown a consistent upward trend since 2016.

2020 Decline: A noticeable drop in listings occurred in 2020, likely influenced by the COVID-19 pandemic.

Peak in 2021: Listing prices reached an all-time high in 2021.

2022 Adjustment: The average price experienced a slight decline in 2022.

2023 Rebound: Prices saw a modest increase again in 2023.
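
A yearly summary like the one behind these trends could be computed as sketched below. The write-up does not say which date field the analysis used, so `date_col` is left as a required argument, and the helper `yearly_summary` and the `price` column name are assumptions.

```python
import pandas as pd


def yearly_summary(
    df: pd.DataFrame,
    date_col: str,
    price_col: str = "price",
    start_year: int = 2016,
) -> pd.DataFrame:
    """Listing count and mean/median price per year from start_year onward."""
    # Unparseable dates become NaT and are dropped below.
    dates = pd.to_datetime(df[date_col], errors="coerce")
    data = df.assign(year=dates.dt.year).dropna(subset=["year", price_col])
    data = data[data["year"] >= start_year]
    summary = data.groupby("year")[price_col].agg(
        listings="count",
        mean_price="mean",
        median_price="median",
    )
    summary.index = summary.index.astype(int)
    return summary
```

Reporting the median alongside the mean helps show whether a yearly change reflects the typical listing or a handful of very expensive ones.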

Results and Recommendations

Optimal Timing for Investment:

Listing prices spiked in 2021, dipped slightly in 2022, and rose modestly in 2023. With prices still below the 2021 peak and the long-term trend pointing upward, now may be a strategic time to invest in real estate.

Key Property Features Affecting Price:

House Size: The most significant factor influencing property prices.

Number of Bathrooms: A strong correlation with higher property values, particularly when increasing from one to two bathrooms.

Number of Bedrooms: While impactful, the effect plateaus after four bedrooms.

Price Ranges with the Most Listings:

Mid-Range ($300,000 - $700,000): The majority of listings fall within this category.

Low-Range (Less than $300,000): A significant number of properties are still available at this price point, making it a good fit for first-time investors and home buyers.

High-Range (Over $700,000): Fewer listings, but substantial investment opportunities.

Top States for Listings by Price Range:

Mid-Range: Florida, California, Texas.

Low-Range: Texas, Florida, Illinois, Pennsylvania.

High-Range: California, Florida, New York.
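
A ranking like the one above can be reproduced by counting listings per state within each price range. This sketch assumes a `state` column and a `price_range` column holding the Low/Mid/High labels; both names and the helper `top_states_by_range` are hypothetical.

```python
import pandas as pd


def top_states_by_range(
    df: pd.DataFrame,
    n: int = 3,
    range_col: str = "price_range",
    state_col: str = "state",
) -> pd.DataFrame:
    """Top n states by listing count within each price range."""
    counts = (
        df.groupby([range_col, state_col])
        .size()
        .rename("listings")
        .reset_index()
    )
    # Sort so the largest counts come first inside each price range.
    counts = counts.sort_values([range_col, "listings"], ascending=[True, False])
    return counts.groupby(range_col).head(n).reset_index(drop=True)
```

Raw listing counts favor populous states, so a per-capita or share-of-state-listings view may tell a different story.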

These insights provide a foundation for data-driven investment decisions, helping investors focus on high-impact features and regions with promising opportunities.

🔍 Next Steps for Analysis

To deepen insights, further analysis can explore:

🏙️ Regional & Urban vs. Rural Trends

Analyze state- and city-level data to compare urban vs. rural pricing.

Identify top cities for investment and assess long-term rural property appreciation.

📈 Market Conditions & Economic Factors

Examine interest rates, inflation, and COVID-19’s impact on housing prices.

🏡 Property Features & Value Drivers

Assess price differences by property type (single-family, condo, multi-unit).

Analyze the impact of square footage, lot size, and amenities on home value.