The textbook used in this course is Artificial Intelligence: A Modern Approach. These notes follow the book's definitions of variables and terms directly, so they are not re-explained here.
Lecture 18-19
- Bayes’ rule: P(A|B) = (P(B|A) · P(A)) / P(B).
- Normalization & marginalization: P(X|e) = P(X, e) / P(e) = α · P(X, e) = α · Σ_y P(X, e, y).
- In the alarm network, the calls are conditionally independent of burglaries and earthquakes given the alarm, but they are not marginally independent of them (see the enumeration sketch below).
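A minimal inference-by-enumeration sketch for the alarm network, showing the marginalization over hidden variables and the normalization constant α. The CPT values are the ones commonly quoted for the textbook's burglary example; treat them as illustrative rather than authoritative.

```python
from itertools import product

# CPTs for the burglary/earthquake alarm network (values as in the textbook example).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                        # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                        # P(M=true | A)

def joint(b, e, a, j, m):
    """Full joint probability of one assignment to (B, E, A, J, M)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def query_burglary(j, m):
    """P(B | J=j, M=m): sum out the hidden variables E and A, then normalize (the α step)."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
        for b in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}

print(query_burglary(j=True, m=True))   # P(Burglary | both neighbors call) ≈ 0.284
```

Note that joint() conditions each call only on the alarm; that factorization is exactly the conditional-independence structure mentioned above.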
Lecture 20
- Entropy: -Σ_i P(v_i) · ln(P(v_i)):
- The more uniform the probability distribution, the greater its information content (entropy).
- ID3 algorithm: top-down construction of a decision tree by recursively selecting the “best attribute” to use at the current node of the tree.
- “best attribute”: choose the attribute with the largest expected information gain (see the sketch after this list).
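A small sketch of the entropy and information-gain computations behind ID3's attribute choice. It uses log base 2 (bits), which differs from the ln above only by a constant factor; the dict-of-examples layout and the toy data are my own assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -Σ_i P(v_i) · log2(P(v_i)) over the observed label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy from splitting the examples on `attribute`."""
    base = entropy([ex[target] for ex in examples])
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return base - remainder

def best_attribute(examples, attributes, target):
    """ID3's choice at the current node: the attribute with the largest expected gain."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))

# Example: each example is a dict of attribute values plus the target label.
examples = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "rain",  "windy": False, "play": "yes"},
    {"outlook": "rain",  "windy": True,  "play": "yes"},
]
print(best_attribute(examples, ["outlook", "windy"], "play"))   # "outlook" separates the labels perfectly
```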
Lecture 20-21
- Back-propagation is synchronous while the Hopfield net is asynchronous; back-propagation tries to minimize the error while the Hopfield net tries to minimize the energy (see the sketch below).
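A toy Hopfield-network sketch (my own illustration, not from the lecture) showing the asynchronous single-unit updates and the energy E = -½ · sᵀ·W·s that they never increase; the two stored patterns are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns; symmetric, no self-connections."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """E = -1/2 · s^T W s; asynchronous sign updates never increase this."""
    return -0.5 * s @ W @ s

def recall(W, s, steps=200):
    """Asynchronously set one randomly chosen unit at a time to the sign of its local field."""
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = patterns[0].copy()
noisy[-1] *= -1                                  # corrupt one unit of the first pattern
print("energy before:", energy(W, noisy))
restored = recall(W, noisy)
print("energy after: ", energy(W, restored))     # lower (or equal) energy after settling
```

The contrast with back-propagation is the objective: here each accepted flip descends the energy, whereas back-propagation descends the training error.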
Lecture 22
- Markov Decision Process (MDP):
- The problem is to find the optimal policy, which maximizes the expected utility at each state;
- Bellman equation: U(s) = R(s) + γ · max_a Σ_s' (P(s' | s, a) · U(s')). Iterating this update (value iteration) converges to the optimal utilities, from which the optimal policy follows (see the sketch after this list).
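A minimal value-iteration sketch of the Bellman update above. The data layout is assumed: P maps (state, action) to a {next_state: probability} dict, and R maps each state to its reward; the two-state example at the end is hypothetical.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Apply U(s) <- R(s) + γ · max_a Σ_s' P(s'|s, a) · U(s') until the utilities stabilize."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {
            s: R[s] + gamma * max(
                sum(prob * U[s2] for s2, prob in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(U_new[s] - U[s]) for s in states) < tol:
            return U_new
        U = U_new

def extract_policy(states, actions, P, U):
    """The optimal policy picks, in each state, the action with the highest expected utility."""
    return {
        s: max(actions, key=lambda a: sum(prob * U[s2] for s2, prob in P[(s, a)].items()))
        for s in states
    }

# Tiny example: "stay" is safe, "go" is risky but can reach the rewarding goal state.
states, actions = ["s0", "goal"], ["stay", "go"]
P = {("s0", "stay"):   {"s0": 1.0},
     ("s0", "go"):     {"goal": 0.8, "s0": 0.2},
     ("goal", "stay"): {"goal": 1.0},
     ("goal", "go"):   {"goal": 1.0}}
R = {"s0": -0.04, "goal": 1.0}
U = value_iteration(states, actions, P, R)
print(extract_policy(states, actions, P, U))   # "go" is optimal in s0
```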
Lecture 23
- Reinforcement Learning:
- Utility-based agent: if it already knows the transition model, then use an MDP algorithm to solve for the Maximum Expected Utility (MEU) actions.
- Q-learning agent: if it doesn't know the transition model, learn action-utility values Q(s, a) and pick the action with the highest Q-value in the current state.
- Q(s, a) ← Q(s, a) + α · (R(s) + γ · max_a' Q(s', a') − Q(s, a)) →
Q(s, a) = (1 − α) · Q(s, a) + α · (R(s) + γ · max_a' Q(s', a')). Converges to the correct values if α decays over time (see the sketch after this list).
- Reflex agent: learn a policy π(s) directly, then pick the action the policy prescribes.
- Can start right away and then learn as it goes.
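A minimal tabular Q-learning sketch of the update above. The environment loop is omitted, and the ε-greedy exploration policy is my addition, not something stated in the note.

```python
import random
from collections import defaultdict

Q = defaultdict(float)   # unknown (s, a) pairs default to 0, so the agent can start right away

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q(s, a) <- (1 - α) · Q(s, a) + α · (r + γ · max_a' Q(s', a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

def choose_action(s, actions, epsilon=0.1):
    """Pick the action with the highest Q-value in the current state, exploring occasionally."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

In practice α (and ε) are decayed over the run, which is the convergence condition stated in the note.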
Lecture 24-25
Guest speaker.
Lecture 26
🍺