Commit 615bb16

committed
source commit: 57a6826
0 parents  commit 615bb16

37 files changed

+2045
-0
lines changed

01-welcome.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
---
2+
title: Welcome
3+
teaching: 5
4+
exercises: 0
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Recognise what the intended target audience is for this lesson
10+
- Recognise previous skills required to undertake this lesson
11+
12+
::::::::::::::::::::::::::::::::::::::::::::::::::
13+
14+
:::::::::::::::::::::::::::::::::::::::: questions
15+
16+
- Who is this lesson for?
17+
- What will we be covering in this lesson?
18+
- What will we not be covering in this lesson?
19+
20+
::::::::::::::::::::::::::::::::::::::::::::::::::
21+
22+
23+
24+
## Who is this lesson for?
25+
26+
There is a growing interest in the application of Artificial Intelligence (AI) and Machine Learning in the GLAM (Galleries, Libraries, Archives, and Museums) sector. These technologies offer many potential benefits to the cultural sector but also raise challenges and difficult questions.
27+
28+
Our aim with this lesson is to empower GLAM staff with the foundation to support, participate in, and begin to undertake in their own right machine-learning-based research and projects with heritage collections.
29+
30+
Library Carpentry lessons are for people working in library- and information-related roles. See [Our Audience](https://librarycarpentry.org/audience/) and [Our Learner Profiles (Draft)](https://github.com/LibraryCarpentry/lc-overview/blob/gh-pages/files/learner-profiles.md) for more information.
31+
32+
## What will we be covering in this lesson?
33+
34+
This lesson will provide a high-level conceptual introduction to AI and machine learning with particular emphasis on applications in, and implications for, the GLAM context.
35+
36+
After following this lesson, learners will be able to:
37+
38+
- Explain and differentiate key terms, phrases, and concepts associated with AI and Machine Learning in GLAM
39+
- Describe ways in which AI is being innovatively used in the cultural heritage context today
40+
- Identify what kinds of tasks machine learning models excel at in GLAM applications
41+
- Reflect on ethical implications of applying machine learning to cultural heritage collections and discuss potential mitigation strategies
42+
- Summarise the practical, technical steps involved in undertaking machine learning projects
43+
- Identify additional resources on AI and Machine Learning in GLAM
44+
45+
## What will we not be covering in this lesson?
46+
47+
- This lesson will not require coding, statistics or maths
48+
49+
:::::::::::::::::::::::::::::::::::::::: keypoints
50+
51+
- Intro to AI for GLAM is for staff working in the GLAM (Galleries, Libraries, Archives, and Museums) sector.
52+
- The lesson is a high-level conceptual introduction to AI and machine learning that will empower GLAM staff to apply those technologies within their own institutions and collections.
53+
- This lesson will not cover coding, statistics or maths.
54+
55+
::::::::::::::::::::::::::::::::::::::::::::::::::
56+
57+

02-AI-in-a-Nutshell.md

Lines changed: 185 additions & 0 deletions
@@ -0,0 +1,185 @@
1+
---
2+
title: Artificial Intelligence (AI) and Machine Learning (ML) in a nutshell
3+
teaching: 0
4+
exercises: 0
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Understand Machine Learning as a subfield of AI and name two others
10+
- Name four types of machine learning and describe the difference between supervised and unsupervised learning
11+
- Describe the difference between a model of data, and a model trained on data
12+
13+
::::::::::::::::::::::::::::::::::::::::::::::::::
14+
15+
:::::::::::::::::::::::::::::::::::::::: questions
16+
17+
- What is a brief history of the field of Artificial Intelligence (AI) and Machine Learning (ML)?
18+
- What do we mean by Artificial Intelligence and Machine Learning? How are they defined?
19+
20+
::::::::::::::::::::::::::::::::::::::::::::::::::
21+
22+
## Artificial Intelligence and Machine Learning
23+
24+
The field of Artificial Intelligence has been around since the 1950s ([Dartmouth Conference, 1956](https://jmc.stanford.edu/articles/dartmouth/dartmouth.pdf)). It is a broad topic which encompasses a number of sub-fields including but not limited to: Logic, Probability, Knowledge Representation, and Machine Learning.
25+
26+
This figure shows how a system based on Artificial Intelligence might function:
27+
28+
![](fig/ep-02-ai-graph.png){alt='AI process'}
29+
30+
The system accepts an input, performs some inference about what the input represents, performs some reasoning about the inferences made based on prior knowledge, and finally decides on an action.
31+
32+
### From logic to learning
33+
34+
The history of Logic stretches all the way back to The Organon of Aristotle ([Organon](https://en.wikipedia.org/wiki/Organon)) and was formalised as a mathematical discipline by George Boole in the 19th century (hence the name Boolean Logic). With logic we can write rules to reason about data or make decisions.
35+
36+
- "**It is** raining therefore I will carry an umbrella".
37+
38+
Logical rules are based on things being True or False but the world is not so clear cut. Probability lets us add doubt and uncertainty:
39+
40+
- "It **might** rain today, should I take an umbrella?".
41+
42+
With logical rules and probability we can solve quite complex tasks, but there are limits:
43+
44+
- "Which paintings in our collection have umbrellas in them?".
45+
46+
Take a minute to think of how you would describe an umbrella using a set of rules:
47+
48+
- What colour is it?
49+
- What shape is it?
50+
- How big is it?
51+
- What difference does it make if it is up or down?
52+
- Is a parasol an umbrella?
53+
54+
Describing something as intuitively simple as an umbrella is difficult because although we have a rough conceptual idea there isn't a fixed physical description. Even if you break it into component parts you still need to define them - what is a handle? what is a canopy?
55+
It becomes even more complicated if you try to specify the description in a way that a computer can interpret.
56+
Thankfully Machine Learning can come to our rescue.
57+
58+
## What is Machine learning?
59+
60+
Machine Learning is a set of technologies and methods for finding rules when they are too complex to define by hand. Machine learning systems find rules, learn, and make predictions from data without being explicitly programmed to do so.
61+
62+
::::::::::::::::::::::::::::::::::::::: challenge
63+
64+
## Activity
65+
66+
Which of the following do you think would use Machine Learning?
67+
68+
- a) Counting the number of people in a museum using information from entry and exit barriers.
69+
- b) A search system that looks for images similar to a user submitted sketch.
70+
- c) A system that recommends library books based on what other users have ordered.
71+
- d) A queueing system that spreads people evenly between 5 ticket booths
72+
- e) A program which extracts names from documents by finding all capitalised words and checking them against a list of known names
73+
- f) A system which turns digitised handwritten documents into searchable text
74+
- g) A robot which cleans the vases in a museum without bumping into them or breaking them
75+
76+
::::::::::::::: solution
77+
78+
## Solution
79+
80+
- b, c, f, and g are all examples where machine learning would be needed. The others could all be achieved through simple and easily defined rules.
81+
82+
:::::::::::::::::::::::::
83+
84+
::::::::::::::::::::::::::::::::::::::::::::::::::
85+
86+
There are four types of machine learning, and in this lesson we will focus on the first two:
87+
88+
- Supervised - the system is given data that is categorised and labeled and asked to learn by example to make predictions on totally new data it has never seen before
89+
- Unsupervised - given data that has not been categorised and labeled and asked to put it into groups (find patterns) without guidance
90+
- Semi-supervised - a combination of supervised and unsupervised
91+
- Reinforcement - learns about the world by interacting with its environment (example: self-driving cars and [AlphaGo](https://deepmind.com/research/case-studies/alphago-the-story-so-far))
92+
93+
The primary task of Machine Learning is prediction.
94+
95+
A prediction may be a numerical value: how much will temperature control in the archive cost if we have a hot summer? how many days will library borrowers keep books for?
96+
Or it may be a classification or label: which paintings are of animals/architecture/people? which documents should be classified as sensitive?
97+
98+
## Note
99+
100+
::::::::::::::::::::::::::::::::::::::::: callout
101+
102+
Predicting a numerical value is known as Regression. The term Regression was coined by Francis Galton in the 19th century to describe a biological phenomenon. The phenomenon was that the heights of descendants of tall ancestors tend to regress down towards a normal average (a phenomenon also known as regression toward the mean). ([Regression Analysis](https://en.wikipedia.org/wiki/Regression_analysis#:~:text=The%20term%20%22regression%22%20was%20coined,as%20regression%20toward%20the%20mean)).
103+
104+
105+
::::::::::::::::::::::::::::::::::::::::::::::::::
106+
107+
Imagine you want to go on holiday next month. Imagine! You would like to know what the temperature will be on a small island that has no weather information. To do this you find the following information about other countries around the world: latitude, longitude, month, average temperature. Now you can use a machine learning technique called Regression to predict the temperature at your potential destination using its lat/lon and the month. This is a prediction task.
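
This lesson does not require any coding, so the following Python sketch is optional. It illustrates the regression idea described above in a deliberately simplified form: it assumes a single month and uses latitude as the only feature, and the six data points are invented for illustration rather than taken from real weather records. The function names `fit_line` and `predict_temperature` are hypothetical.

```python
# Invented example data (an assumption, not real measurements):
# latitude in degrees north and average temperature in degrees Celsius
# for six made-up places, all for the same month.
latitudes = [0, 10, 20, 30, 40, 50]
temperatures = [29, 27, 22, 18, 13, 11]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares straight line."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How x and y vary together, and how much x varies on its own.
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum_xy / sum_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn the relationship from the known places.
slope, intercept = fit_line(latitudes, temperatures)


def predict_temperature(latitude):
    """Predict the temperature for a place the model has never seen."""
    return slope * latitude + intercept


# Prediction for a hypothetical island at 35 degrees north.
print(round(predict_temperature(35), 1))
```

The learned relationship here is simply a straight line: temperature falls as latitude rises. A fuller version would also use longitude and month as features, as described in the text.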
108+
109+
An alternative approach would be to make a list of destinations you like, and those you don't like. Gather the same information of lat/lon, month and temperature for a sample of countries, but this time add a 'Yes' or 'No' for whether you like them or not. You can now choose one of many machine learning classification algorithms to label the rest of the world's countries Yes or No. This is a binary (there are two categories) classification task.
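
As an optional illustration of the classification task just described, the sketch below labels a new destination by copying the label of the most similar known destination (a nearest-neighbour classifier, which is only one of many possible classification algorithms). The destinations, their features (latitude and average temperature), and the 'Yes'/'No' labels are all invented, and `classify` is a hypothetical helper name.

```python
import math

# Invented labelled examples: (latitude, average temperature) and whether
# we liked the destination. These values are assumptions for illustration.
labelled_destinations = [
    ((10.0, 28.0), "Yes"),
    ((35.0, 24.0), "Yes"),
    ((60.0, 5.0), "No"),
    ((55.0, 9.0), "No"),
]


def classify(features, examples):
    """Return the label of the example whose features are closest."""
    nearest = min(examples, key=lambda example: math.dist(features, example[0]))
    return nearest[1]


# Two destinations we have never visited.
print(classify((15.0, 27.0), labelled_destinations))
print(classify((58.0, 6.0), labelled_destinations))
```

Because there are only two possible labels, this is a binary classification task. The quality of the predictions depends entirely on how well the labelled examples represent our preferences.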
110+
111+
## Supervised vs unsupervised learning
112+
113+
The previous examples were of supervised learning. In the supervised scenario a set of labelled examples is passed to a machine learning algorithm, which learns to identify relationships between features of the data and the output: either a label or a numerical value. The table shows some examples of features and outputs for some supervised learning tasks:
114+
115+
| Task | Application | Features | Output | Example learned relationship |
116+
| -------------- | --------------------- | ------------------- | ----------------------------------- | --------------------------------------------------- |
117+
| Classification | Sentiment analysis | Words in a sentence | 'Positive' or 'Negative' | 'glad' or 'happy' weighted towards 'Positive' label |
118+
| Classification | Seasonal paintings | Images of paintings | 'Spring','Summer','Autumn','Winter' | Red leaves more predictive of Autumn label |
119+
| Prediction | Child height estimate | Age of child | number in centimetres | Height increases as age increases |
120+
121+
Unsupervised learning is not given any labelled examples. Instead a target is suggested and the algorithm groups the data based on that target. The target is usually the number of groups wanted, and the algorithm will place data points into each group in order to maximise the similarity of group members. The following activity aims to give you an intuition for clustering, a commonly used form of unsupervised learning.
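
For readers who would like to see the grouping idea in code (this is optional, as the lesson requires no coding), here is a minimal sketch of k-means, one common clustering algorithm, applied to a single numerical feature. The visit durations are invented, the function name `k_means_1d` is hypothetical, and the way the starting centres are chosen is a simplifying assumption that requires at least two groups.

```python
def k_means_1d(values, k=2, iterations=10):
    """Split numbers into k groups of similar values (requires k >= 2)."""
    ordered = sorted(values)
    # Simplifying assumption: start with centres spread evenly across the
    # sorted values, so the result is deterministic.
    centres = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        # Step 1: put every value in the group with the nearest centre.
        groups = [[] for _ in range(k)]
        for value in values:
            nearest = min(range(k), key=lambda i: abs(value - centres[i]))
            groups[nearest].append(value)
        # Step 2: move each centre to the average of its group.
        centres = [
            sum(group) / len(group) if group else centre
            for group, centre in zip(groups, centres)
        ]
    return groups


# Invented data: length of six museum visits in minutes.
visit_minutes = [12, 15, 14, 95, 100, 110]
short_visits, long_visits = k_means_1d(visit_minutes, k=2)
print(short_visits, long_visits)
```

Note that we only told the algorithm how many groups we wanted. It was never told what a "short" or "long" visit is; those names are our own interpretation of the groups it found.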
122+
123+
::::::::::::::::::::::::::::::::::::::: challenge
124+
125+
## Activity
126+
127+
Imagine there are 6 people in a workshop and you need to split them evenly between 2 tables based on the similarity of their interests. Their interests are listed below in order of preference. How would you divide them into 2 tables with 3 people on each table?
128+
129+
- Person A - politics, sport, nature
130+
- Person B - walking, cooking, quiz shows
131+
- Person C - baking, sewing, athletics
132+
- Person D - newspapers, biographies, history
133+
- Person E - football, rugby, cricket
134+
- Person F - fine dining, pub quizzes, bird watching
135+
136+
::::::::::::::: solution
137+
138+
## Solution
139+
140+
There isn't a right answer to this challenge. In fact an algorithm with no information other than the words above would probably distribute them randomly. To perform the task it would need further information that could provide semantic relationships between the words. That way it could establish that sport is similar to football, rugby and cricket, and that fine dining and cooking are related. Without that information they are just meaningless strings to a computer. You should have seen that there are multiple solutions and that, when using unsupervised methods, you have little influence over which is chosen.
141+
142+
:::::::::::::::::::::::::
143+
144+
::::::::::::::::::::::::::::::::::::::::::::::::::
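
The point that the interests are "just meaningless strings to a computer" can be checked with a short, optional Python sketch. It compares every pair of people from the activity by looking for interests that are written identically. The variable names are hypothetical; the interests are copied from the activity.

```python
from itertools import combinations

# Interests exactly as listed in the activity above.
interests = {
    "A": {"politics", "sport", "nature"},
    "B": {"walking", "cooking", "quiz shows"},
    "C": {"baking", "sewing", "athletics"},
    "D": {"newspapers", "biographies", "history"},
    "E": {"football", "rugby", "cricket"},
    "F": {"fine dining", "pub quizzes", "bird watching"},
}

# For every pair of people, find the interests that match as exact strings.
shared = {
    (first, second): interests[first] & interests[second]
    for first, second in combinations(sorted(interests), 2)
}

pairs_with_overlap = [pair for pair, common in shared.items() if common]
print(len(shared), "pairs compared;", len(pairs_with_overlap), "share an interest")
```

No pair shares an identically written interest, so a method that only compares the strings has nothing to group on. A human sees that "sport" and "football" are related; the program would need extra semantic information to do the same.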
145+
146+
Each paradigm has its advantages and disadvantages. Unsupervised learning is a straightforward way of identifying clusters of similar records in a set of data making it ideal for gaining a high level view of a new dataset. However, choosing the right number of clusters can be difficult and there is no way to control the criteria for how clusters are formed. In the above exercise we saw a number of types of activity (physical, food related, current affairs, natural world), with some possibly fitting into two categories.
147+
In supervised learning we define the categories in advance giving us control over the outputs. The downside is the cost of labelling our data. Consider the effort involved in the following tasks:
148+
149+
- Transcribing 100 pages of handwritten medieval documents
150+
- Tagging each of 10,000 images according to whether it contains an umbrella
151+
- Linking together daily visitor data with weather data currently held on someone else's website
152+
153+
::::::::::::::::::::::::::::::::::::::: challenge
154+
155+
## Activity
156+
157+
Fill in the blanks with either "Supervised Learning", "Unsupervised Learning", "Prediction" or "Classification"
158+
159+
- Estimating how much money a customer will spend in the museum shop is a \_\_\_\_\_ task
160+
- A program to decide if a customer is a 'big spender' or a 'browser' would use a \_\_\_\_\_ algorithm
161+
- Identifying four types of library visitor is an example of \_\_\_\_\_
162+
- \_\_\_\_\_ requires labelled examples
163+
164+
::::::::::::::: solution
165+
166+
## Solution
167+
168+
- Estimating how much money a customer will spend in the museum shop is a Prediction task
169+
- A program to decide if a customer is a 'big spender' or a 'browser' would use a Classification algorithm
170+
- Identifying four types of library visitor is an example of Unsupervised Learning
171+
- Supervised Learning requires labelled examples
172+
173+
:::::::::::::::::::::::::
174+
175+
::::::::::::::::::::::::::::::::::::::::::::::::::
176+
177+
:::::::::::::::::::::::::::::::::::::::: keypoints
178+
179+
- Machine Learning is a subfield of AI which identifies patterns in data
180+
- Supervised learning algorithms learn by example
181+
- Unsupervised learning algorithms put data into groups of similar objects or records
182+
183+
::::::::::::::::::::::::::::::::::::::::::::::::::
184+
185+
