Correlation With RapidMiner

Sarah is a regional sales manager for a nationwide supplier of fossil fuels for home heating. Recent volatility in market prices for heating oil specifically, coupled with wide variability in the size of each order for home heating oil, has Sarah concerned.

She needs to understand which factors are related to heating oil usage, and how she might use that knowledge to better manage her inventory and anticipate demand.

Correlation is a statistical measure of how strongly two attributes in a dataset are related, and in which direction.
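Correlation matrices such as the one RapidMiner produces usually report Pearson's correlation coefficient, r. As a minimal sketch of what that number measures, the hypothetical helper `pearson_r` below computes r by hand for two attributes. The sample values are made up for illustration and are not Sarah's data.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences.

    r = sum((x - mean_x) * (y - mean_y))
        / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: how the two attributes vary together around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: the spread of each attribute around its own mean.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant attribute")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for illustration only (not from the real dataset).
avg_age = [25.0, 34.0, 41.0, 52.0, 63.0]
heating_oil = [120.0, 150.0, 170.0, 210.0, 240.0]
print(round(pearson_r(avg_age, heating_oil), 3))
```

Dividing the shared variation by the product of the individual spreads is what keeps r between -1 and 1, regardless of the units each attribute is measured in.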

The dataset consists of the following attributes:

Insulation

Temperature

Heating Oil

Number Of Occupants

Average Age

Home Size

Figure 1: Our dataset

Figure 2: Applying the Correlation Matrix operator

Figure 3: Correlation results

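Correlation coefficients range from -1 to 1. Coefficients between 0 and 1 indicate positive correlations, while coefficients between -1 and 0 indicate negative correlations. The further a coefficient is from 0, in either direction, the stronger the relationship.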
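Reading a coefficient therefore comes down to its sign (direction) and its absolute value (strength). The hypothetical helper below is a small sketch of that reading; the 0.3 and 0.7 cut-offs for "weak", "moderate" and "strong" are a common rule of thumb assumed here, not values defined by RapidMiner.

```python
def describe_correlation(r: float) -> str:
    """Describe a correlation coefficient by direction and rule-of-thumb strength."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    if r == 0:
        return "no linear correlation"
    direction = "positive" if r > 0 else "negative"
    # Assumed thresholds: |r| >= 0.7 strong, |r| >= 0.3 moderate, otherwise weak.
    magnitude = abs(r)
    if magnitude >= 0.7:
        strength = "strong"
    elif magnitude >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.85))   # strong positive correlation
print(describe_correlation(-0.45))  # moderate negative correlation
```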

There is a positive correlation between Heating Oil and Insulation rating, which means that as heating oil usage rises, insulation rating tends to rise as well, and vice versa.

There is a negative correlation between Temperature and Insulation rating, which means that as temperature rises, insulation rating tends to fall, and vice versa.

Conclusion

The two most strongly correlated attributes in our data are Heating Oil and Average Age.
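Finding the most strongly correlated pair means comparing absolute values, since a coefficient near -1 is as strong as one near +1, and ignoring the diagonal of the matrix. A sketch, assuming the coefficients are held in a dict keyed by attribute pairs; the numbers are made-up placeholders, not the results shown in Figure 3.

```python
def strongest_pair(coefficients: dict[tuple[str, str], float]) -> tuple[tuple[str, str], float]:
    """Return the attribute pair whose coefficient has the largest absolute value."""
    # Skip the diagonal, where each attribute is trivially correlated with itself.
    off_diagonal = {pair: r for pair, r in coefficients.items() if pair[0] != pair[1]}
    pair = max(off_diagonal, key=lambda p: abs(off_diagonal[p]))
    return pair, off_diagonal[pair]


# Placeholder coefficients for illustration only.
coefficients = {
    ("Heating Oil", "Average Age"): 0.85,
    ("Temperature", "Insulation"): -0.8,
    ("Heating Oil", "Insulation"): 0.7,
    ("Heating Oil", "Heating Oil"): 1.0,
}
print(strongest_pair(coefficients))  # (('Heating Oil', 'Average Age'), 0.85)
```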

Here are a few actions Sarah can take to inform her decisions:

  1. Drop the Number Of Occupants attribute
  2. Investigate the role of home insulation
  3. Add greater granularity to the dataset
  4. Add additional attributes to the dataset