Understanding the relationship between variables

When analyzing many variables, scatter plots and correlation coefficients can quickly uncover patterns and reduce a large amount of data to a subset of interesting relationships.

Correlation describes the strength and direction of the linear relationship between two variables. A correlation coefficient ranges from -1 to +1. A value of +1 indicates a perfect positive linear relationship, and -1 indicates a perfect negative linear relationship. A value of zero indicates the variables are uncorrelated, that is, there is no linear relationship between them. In practice, the correlation coefficient usually lies somewhere between these extremes.
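
As a rough illustration of what the coefficient measures, the following Python sketch computes a Pearson correlation coefficient by hand. It assumes the Pearson coefficient is the one meant here; the function name `pearson` and the sample values are made up for illustration and are not taken from the New York Neighborhoods dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)

# Made-up scores (hypothetical, for illustration only).
x = [1, 2, 3, 4, 5]
print(pearson(x, [2, 4, 6, 8, 10]))  # perfect positive linear relationship
print(pearson(x, [10, 8, 6, 4, 2]))  # perfect negative linear relationship
print(pearson(x, [2, 1, 3, 1, 2]))   # no linear relationship
```

The three calls correspond to the three reference points described above: a coefficient at +1, at -1, and at (approximately, given floating-point rounding) zero.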

  1. Open the file New York Neighborhoods.xlsx.
  2. Click a cell in the dataset.
  3. On the Analyse-it ribbon tab, in the Statistical Analyses group, click Multivariate, and then click Correlation Matrix.
    The analysis task pane opens.
  4. In the Model drop-down list, select Multivariate.
  5. In the Y variables list box, select Affordability, Transit, Shopping & Services, Crime, Food, Schools, Diversity, Creative, Housing Quality, Green Space, Wellness, Nightlife.
  6. Select Correlation and Color maps.
  7. On the Analyse-it ribbon tab, in the Correlation group, click Scatter matrix, and then click Scatter Plot with Density Ellipses and Histograms.
  8. In the Density ellipse edit box, enter 75%.
  9. Click Calculate.
    The results are calculated and the analysis report opens.

The scatter plot matrix shows plots for all of the pairs of variables, and each plot shows the relationship between a pair of variables. The red ellipse contains the middle 75% of the neighborhoods and indicates whether the two variables are positively, negatively, or not correlated.

Scatter plot matrix
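
One common way to construct a density ellipse is to assume the two variables are roughly bivariate normal and draw the contour of constant Mahalanobis distance that would enclose the requested share of the distribution. The sketch below follows that assumption; it is not necessarily how Analyse-it computes its ellipses, and the function names and data values are hypothetical.

```python
import math

def ellipse_threshold(coverage):
    """Squared Mahalanobis radius enclosing `coverage` of a bivariate normal.

    Uses the chi-square distribution with 2 degrees of freedom, whose
    CDF is 1 - exp(-c / 2), so c = -2 * ln(1 - coverage).
    """
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must be strictly between 0 and 1")
    return -2.0 * math.log(1.0 - coverage)

def mahalanobis_sq(px, py, xs, ys):
    """Squared Mahalanobis distance of (px, py) from the sample centroid."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    det = var_x * var_y - cov_xy ** 2
    if det <= 0:
        raise ValueError("the covariance matrix must be positive definite")
    dx, dy = px - mean_x, py - mean_y
    # d^T * inverse(covariance) * d, written out for the 2x2 case.
    return (var_y * dx * dx - 2.0 * cov_xy * dx * dy + var_x * dy * dy) / det

# Made-up scores for two variables (hypothetical, for illustration only).
xs = [2, 4, 5, 7, 8, 10]
ys = [9, 7, 8, 4, 5, 2]
limit = ellipse_threshold(0.75)
inside = [(x, y) for x, y in zip(xs, ys) if mahalanobis_sq(x, y, xs, ys) <= limit]
print(f"threshold = {limit:.3f}, points inside: {len(inside)} of {len(xs)}")
```

Under this assumption, a long, narrow ellipse tilted upward suggests a positive correlation, one tilted downward suggests a negative correlation, and a nearly circular ellipse suggests little or no correlation.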

The correlation matrix shows the correlation coefficient for each pair of variables. Positively correlated variables are blue and negatively correlated variables are red, with the intensity dependent on the magnitude of the correlation.

Correlation matrix
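
The same idea can be sketched outside the add-in: compute the coefficient for every pair of variables, then derive a hue from its sign and an intensity from its magnitude. The code below is a minimal sketch assuming Pearson coefficients; the column names, values, and helper functions are hypothetical, and the color rule only mirrors the blue/red description above rather than Analyse-it's actual palette.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Nested dict of coefficients for every pair of named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

def color_for(r):
    """Return (hue, intensity): blue if positive, red if negative."""
    hue = "blue" if r > 0 else "red" if r < 0 else "none"
    return hue, abs(r)

# Made-up columns (hypothetical, not the New York Neighborhoods data).
columns = {
    "var_a": [1, 2, 3, 4, 5],
    "var_b": [2, 4, 5, 4, 5],
    "var_c": [9, 7, 6, 4, 1],
}
matrix = correlation_matrix(columns)
for a in columns:
    for b in columns:
        hue, intensity = color_for(matrix[a][b])
        print(f"{a} vs {b}: r = {matrix[a][b]:+.2f} ({hue}, intensity {intensity:.2f})")
```

The matrix is symmetric and has ones on its diagonal, so only the cells on one side of the diagonal carry distinct information.
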
Based on the scatter plot matrix and the correlation matrix, a few relationships are obvious:
  • Neighborhoods with more affordable housing tend to have poorer transit.
  • Neighborhoods with better shopping also tend to have more restaurants.
  • Neighborhoods with high diversity tend to have less creative capital.
  • Wellness seems almost completely unrelated to the other factors.