
Commit fa33d0a

Merge pull request #70 from ContextualAI/add-aieng
adding files for ai engineering
2 parents 0296000 + a7de89e commit fa33d0a


5 files changed (+63 -0 lines changed)

737 KB
Binary file not shown.
737 KB
Binary file not shown.

08-ai-workshop/data/C_Neptune.pdf

1.18 MB
Binary file not shown.

08-ai-workshop/data/D_Unilever.pdf

1.16 MB
Binary file not shown.
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+prompt,response
+What was NVIDIA's Data Center revenue in Q4 FY25?,"NVIDIA's Data Center revenue for Q4 FY25 was $35,580 million.[1]()
+
+This represents an increase of $4,809 million (about 15.6%) over the previous quarter (Q3 FY25), when Data Center revenue was $30,771 million.[1]()
+
+The full quarterly trend for Data Center revenue in FY25 was:
+- Q4 FY25: $35,580 million
+- Q3 FY25: $30,771 million
+- Q2 FY25: $26,272 million
+- Q1 FY25: $22,563 million[1]()"
+What is the correlation coefficient between Neptune's distance from the Sun and US burglary rates?,"Based on the analysis of data from 1985-2022, I can provide the following information about the correlation between Neptune's distance from the Sun and US burglary rates:
+
+The correlation coefficient (r) between Neptune's distance from the Sun and US burglary rates is 0.969, with a coefficient of determination (r²) of 0.940 and a p-value less than 0.01.[1]()
+
+To help understand what this correlation coefficient means:
+
+Correlation coefficients work on the following scale:
+- A value of 0.99 indicates a very strong positive relationship (when one variable increases, the other also tends to increase)
+- A value of 0.02 indicates a very weak or non-existent connection
+- A value of -0.99 indicates a strong negative relationship (when one increases, the other tends to decrease)[3]()
+
+However, this correlation needs important context.
+
+The analysis behind it has several significant limitations:
+- It involves data dredging, where 25,153 variables were compared against each other to find matching patterns[1]()
+- There is likely no direct causal connection between these variables[1]()
+- The observations are not independent, as sequential years are connected[1]()
+- The y-axis scaling has been manipulated to make the relationship appear more dramatic than it actually is[1]()
+
+Given these limitations, the statistical correlation is mathematically valid, but it should be interpreted with extreme caution and not taken as evidence of any causal relationship between Neptune's position and burglary rates."
+How did NVIDIA's total revenue change from Q1 FY22 to Q4 FY25?,"I can provide the total revenue comparison between Q1 FY22 and Q4 FY25 based on the available data:
+
+In Q1 FY22, NVIDIA's total revenue was $5,661 million.[2]()
+
+By Q4 FY25, total revenue had increased to $39,331 million.[1]()
+
+This is an increase of $33,670 million. Q4 FY25 revenue was roughly 6.9 times the Q1 FY22 figure, across the four fiscal years from Q1 FY22 through Q4 FY25. If you'd like to see the specific quarterly progression, I can provide those details as well."
+"What are the four main reasons why spurious correlations work, according to the Tyler Vigen documents?","Based on the provided documentation, here are the four main reasons why spurious correlations can occur:
+
+1. Data Dredging: This involves running a massive number of correlation calculations (in this case, 632,673,409 calculations across 25,153 variables). Any sufficiently large dataset will yield strong correlations completely at random, even when there's no actual relationship.[1]()
+
+2. Lack of Causal Connection: There is typically no direct connection between the variables being compared. This is particularly problematic when years are the base variable, since many unrelated things happen within a year.[1]()
+
+3. Observations Not Independent: In sequential data such as yearly observations, the data points aren't independent of each other. Human behavior tends to be continuous, so there's no reason to expect sudden changes just because the calendar rolls over to a new year. Standard statistical calculations don't account for this, which makes the correlations appear more significant than they actually are.[1]()
+
+4. Manipulated Visual Presentation: Graphs can be intentionally misleading, especially when the y-axis doesn't start at zero. Combined with the use of line graphs, this can make correlations appear more dramatic than they actually are.[1]()
+
+These factors work together to create correlations that look impressive but are actually meaningless, as the provided examples demonstrate."
+Why should we be skeptical of the correlation between Unilever's revenue and Google searches for 'lost my wallet'?,"There are several compelling reasons to be skeptical of this correlation between Unilever's revenue and Google searches for 'lost my wallet'. Here are the key concerns:
+
+1. Data Dredging: This correlation emerged from analyzing 25,153 variables against each other, resulting in 632,673,409 correlation calculations. This approach, known as data dredging, can produce strong correlations completely at random in large datasets.[1]()
+
+2. Lack of Causal Connection: There is likely no direct connection between these variables. Using years as the base variable complicates this further, as many unrelated things happen within a year.[1]()
+
+3. Non-Independent Observations: The sequential yearly observations are not independent of each other. Human behavior doesn't suddenly change on January 1, but standard p-value calculations don't account for this.[1]()
+
+4. Visual Manipulation: The graph's y-axis doesn't start at zero, and the line graph format makes the visual connection appear more significant than it actually is. This presentation is mathematically true but intentionally misleading.[1]()
+
+While the statistical measures might look impressive at first glance, these methodological concerns significantly undermine the validity of the correlation. The relationship appears to be what's known as a 'spurious correlation': a statistical relationship that doesn't reflect any actual causal connection between the variables.
+
+The correlation coefficient (r) is 0.921 with a p-value less than 0.01. Taken on its own, that looks significant: a correlation this strong (0.92) would be found in only 4.005E-5% of random cases.[5]() Across 632,673,409 comparisons, however, even a match that rare would be expected to turn up by chance many times.
+
+Given these substantial methodological concerns, it would be inappropriate to draw any meaningful conclusions about a relationship between Unilever's revenue and people losing their wallets based on this correlation alone."
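
The first response in the CSV gives the Q4 FY25 Data Center figure and the FY25 quarterly trend, but it does not show the quarter-to-quarter growth. The sketch below does that arithmetic in Python. It uses only the four figures quoted in the response. The dictionary name and the `qoq_growth` helper are illustrative and are not part of the workshop files.

```python
# Quarterly Data Center revenue in $ millions, copied from the first
# response in the CSV (Q1 FY25 through Q4 FY25, oldest first).
DATA_CENTER_FY25 = {
    "Q1 FY25": 22_563,
    "Q2 FY25": 26_272,
    "Q3 FY25": 30_771,
    "Q4 FY25": 35_580,
}


def qoq_growth(series: dict[str, int]) -> list[tuple[str, int, float]]:
    """Return (quarter, change vs. prior quarter, percent change) for every quarter after the first."""
    quarters = list(series)  # dicts keep insertion order, so this is oldest-first
    rows = []
    for prev, cur in zip(quarters, quarters[1:]):
        delta = series[cur] - series[prev]
        rows.append((cur, delta, 100 * delta / series[prev]))
    return rows


if __name__ == "__main__":
    for quarter, delta, pct in qoq_growth(DATA_CENTER_FY25):
        print(f"{quarter}: +${delta:,}M ({pct:.1f}%)")
```

Run as a script, it prints +$3,709M (16.4%) for Q2, +$4,499M (17.1%) for Q3 and +$4,809M (15.6%) for Q4 FY25.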
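
The third response compares total revenue in Q1 FY22 with Q4 FY25. The Python sketch below works through the growth arithmetic for that comparison. The per-quarter compound growth rate is an extra derived figure that neither the response nor its cited sources state. `growth_summary` is a hypothetical helper name.

```python
# Total revenue in $ millions, copied from the third response in the CSV.
Q1_FY22_REVENUE = 5_661
Q4_FY25_REVENUE = 39_331

# Q1 FY22 through Q4 FY25 covers 16 fiscal quarters, i.e. 15 quarter-to-quarter steps.
QUARTER_STEPS = 15


def growth_summary(start: float, end: float, steps: int) -> tuple[float, float, float]:
    """Return (absolute change, end/start multiple, compound growth rate per step)."""
    multiple = end / start
    per_step = multiple ** (1 / steps) - 1  # rate r with start * (1 + r) ** steps == end
    return end - start, multiple, per_step


if __name__ == "__main__":
    change, multiple, per_quarter = growth_summary(Q1_FY22_REVENUE, Q4_FY25_REVENUE, QUARTER_STEPS)
    print(f"change: +${change:,.0f}M, multiple: {multiple:.2f}x, "
          f"compound growth: {per_quarter:.1%} per quarter")
```

With the quoted figures, the sketch reports a change of $33,670 million, a multiple of about 6.95x, and roughly 13.8% compound growth per quarter.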
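
The Neptune response reports r = 0.969 and r² = 0.940, and it describes the scale from -1 to +1. Below is a minimal Python sketch of the Pearson coefficient, computed from its definition: the covariance of the two series divided by the product of their spreads. The toy series are hypothetical and are not the Neptune or burglary data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]                           # toy data, not from the source
    print(pearson_r(xs, [2 * x + 1 for x in xs]))  # rises with x: r = +1
    print(pearson_r(xs, [10 - x for x in xs]))     # falls as x rises: r = -1
    r = 0.969                                      # value reported in the Neptune response
    print(f"r^2 = {r ** 2:.3f}")                   # coefficient of determination from the rounded r
```

Squaring the rounded r = 0.969 gives 0.939. The reported 0.940 is consistent with an unrounded r slightly above 0.969.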
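
The first and third reasons in the Tyler Vigen response interact. Dredging through many pairs finds strong correlations by chance, and this gets much easier when each series depends on its own previous year. The small Python simulation below compares independent series against random walks. It rests on stated assumptions: Gaussian noise, 38 yearly points to mirror 1985-2022, and an arbitrary 100 variables instead of 25,153. All function names are illustrative.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    num = sum(a * b for a, b in zip(dx, dy))
    return num / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def white_noise(rng, n):
    """Independent yearly values: each draw ignores the previous year."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def random_walk(rng, n):
    """Dependent yearly values: each year is the previous year plus fresh noise."""
    value, out = 0.0, []
    for _ in range(n):
        value += rng.gauss(0.0, 1.0)
        out.append(value)
    return out


def strongest_pair(series):
    """Dredge every distinct pair of series and return the largest |r| found."""
    best = 0.0
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            best = max(best, abs(pearson_r(series[i], series[j])))
    return best


if __name__ == "__main__":
    rng = random.Random(0)      # fixed seed so reruns match
    n_years, n_vars = 38, 100   # 38 points mirror 1985-2022; 100 variables is an arbitrary stand-in
    noise = [white_noise(rng, n_years) for _ in range(n_vars)]
    walks = [random_walk(rng, n_years) for _ in range(n_vars)]
    print(f"max |r|, independent series: {strongest_pair(noise):.2f}")
    print(f"max |r|, random walks:       {strongest_pair(walks):.2f}")
    # 25,153 squared (every ordered pair, self-pairs included) vs. distinct pairs
    print(25_153 ** 2, 25_153 * 25_152 // 2)
```

The independent series should top out well below the random walks, because year-to-year dependence produces many high |r| values among unrelated walks. The last line shows that 632,673,409 equals 25,153 squared. That suggests the quoted count includes every ordered pair of variables, self-pairs included. The number of distinct pairs is 316,324,128.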
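
The Unilever response sets a rare-looking correlation (4.005E-5% of random cases) against 632,673,409 calculations. The Python sketch below works through that multiple-comparisons arithmetic. It treats each calculation as an independent test, which is a simplifying assumption, since the real comparisons share variables. The constant and function names are hypothetical.

```python
import math

COMPARISONS = 632_673_409     # correlation calculations quoted in the responses
P_AS_STRONG = 4.005e-5 / 100  # "4.005E-5% of random cases", converted from percent to a probability


def expected_chance_hits(n_tests: int, p: float) -> float:
    """Expected number of tests that clear the bar by chance alone (n * p)."""
    return n_tests * p


def prob_at_least_one(n_tests: int, p: float) -> float:
    """P(at least one chance hit) = 1 - (1 - p) ** n, assuming independent tests."""
    # log1p/expm1 keep precision when p is tiny and n is huge
    return -math.expm1(n_tests * math.log1p(-p))


if __name__ == "__main__":
    print(f"expected chance hits: {expected_chance_hits(COMPARISONS, P_AS_STRONG):.0f}")
    print(f"P(at least one):      {prob_at_least_one(COMPARISONS, P_AS_STRONG):.6f}")
```

With these inputs, about 253 chance hits are expected, and the probability of at least one is indistinguishable from 1. That is why a single strong correlation pulled out of such a search carries little evidential weight.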
