The textbook used in this course is Artificial Intelligence: A Modern Approach. These notes follow the book's definitions of variables and terms directly, so they are not re-explained here.
Lecture 18-19
- Bayes’ rule: P(A|B) = (P(B|A) · P(A)) / P(B).
- Normalization & marginalization: P(X|e) = P(X, e) / P(e) = α · P(X, e) = α · Σ_y P(X, e, y).
- In the alarm network, the calls are conditionally independent of burglaries and earthquakes given the alarm, but they are not marginally independent of them (see the enumeration sketch below).
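A minimal inference-by-enumeration sketch for the alarm network, showing the marginalization over hidden variables and the normalization constant α. The CPT values are the ones commonly quoted for the textbook's burglary example; treat them as illustrative rather than authoritative.

```python
from itertools import product

# CPTs for the burglary/earthquake alarm network (values as in the textbook example).
P_B = {True: 0.001, False: 0.999}
P_E = {True: 0.002, False: 0.998}
P_A = {(True, True): 0.95, (True, False): 0.94,       # P(A=true | B, E)
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}                        # P(J=true | A)
P_M = {True: 0.70, False: 0.01}                        # P(M=true | A)

def joint(b, e, a, j, m):
    """Full joint probability of one assignment to (B, E, A, J, M)."""
    pa = P_A[(b, e)] if a else 1 - P_A[(b, e)]
    pj = P_J[a] if j else 1 - P_J[a]
    pm = P_M[a] if m else 1 - P_M[a]
    return P_B[b] * P_E[e] * pa * pj * pm

def query_burglary(j, m):
    """P(B | J=j, M=m): sum out the hidden variables E and A, then normalize (the α step)."""
    unnormalized = {
        b: sum(joint(b, e, a, j, m) for e, a in product((True, False), repeat=2))
        for b in (True, False)
    }
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}

print(query_burglary(j=True, m=True))   # P(Burglary | both neighbors call) ≈ 0.284
```

Note that joint() conditions each call only on the alarm; that factorization is exactly the conditional-independence structure mentioned above.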
Lecture 20
- Entropy: -Σ_i P(v_i) · ln(P(v_i)):
- The more uniform the probability distribution, the greater its information content (entropy).
- ID3 algorithm: top-down construction of a decision tree by recursively selecting the “best attribute” to use at the current node of the tree.
- “best attribute”: choose the attribute with the largest expected information gain (see the sketch after this list).
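A small sketch of the entropy and information-gain computations behind ID3's attribute choice. It uses log base 2 (bits), which differs from the ln above only by a constant factor; the dict-of-examples layout and the toy data are my own assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """H = -Σ_i P(v_i) · log2(P(v_i)) over the observed label distribution."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    """Expected reduction in entropy from splitting the examples on `attribute`."""
    base = entropy([ex[target] for ex in examples])
    remainder = 0.0
    for value in {ex[attribute] for ex in examples}:
        subset = [ex[target] for ex in examples if ex[attribute] == value]
        remainder += (len(subset) / len(examples)) * entropy(subset)
    return base - remainder

def best_attribute(examples, attributes, target):
    """ID3's choice at the current node: the attribute with the largest expected gain."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))

# Example: each example is a dict of attribute values plus the target label.
examples = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True,  "play": "no"},
    {"outlook": "rain",  "windy": False, "play": "yes"},
    {"outlook": "rain",  "windy": True,  "play": "yes"},
]
print(best_attribute(examples, ["outlook", "windy"], "play"))   # "outlook" separates the labels perfectly
```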
Lecture 20-21
- Back-propagation is synchronous while the Hopfield net is asynchronous; back-propagation tries to minimize the error while the Hopfield net tries to minimize the energy (see the sketch below).
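A toy Hopfield-network sketch (my own illustration, not from the lecture) showing the asynchronous single-unit updates and the energy E = -½ · sᵀ·W·s that they never increase; the two stored patterns are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_hopfield(patterns):
    """Hebbian weights for bipolar (+1/-1) patterns; symmetric, no self-connections."""
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0.0)
    return W

def energy(W, s):
    """E = -1/2 · s^T W s; asynchronous sign updates never increase this."""
    return -0.5 * s @ W @ s

def recall(W, s, steps=200):
    """Asynchronously set one randomly chosen unit at a time to the sign of its local field."""
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1 if W[i] @ s >= 0 else -1
    return s

patterns = np.array([[1, -1, 1, -1, 1, -1],
                     [1, 1, 1, -1, -1, -1]])
W = train_hopfield(patterns)
noisy = patterns[0].copy()
noisy[-1] *= -1                                  # corrupt one unit of the first pattern
print("energy before:", energy(W, noisy))
restored = recall(W, noisy)
print("energy after: ", energy(W, restored))     # lower (or equal) energy after settling
```

The contrast with back-propagation is the objective: here each accepted flip descends the energy, whereas back-propagation descends the training error.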
Lecture 22
- Markov Decision Process (MDP):
- The problem is to find the optimal policy, which maximizes the expected utility at each state;
- Bellman equation: U(s) = R(s) + γ · max_a Σ_s' (P(s' | s, a) · U(s')). Iterating this update (value iteration) converges to the optimal utilities, from which the optimal policy follows (see the sketch after this list).
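A minimal value-iteration sketch of the Bellman update above. The data layout is assumed: P maps (state, action) to a {next_state: probability} dict, and R maps each state to its reward; the two-state example at the end is hypothetical.

```python
def value_iteration(states, actions, P, R, gamma=0.9, tol=1e-6):
    """Apply U(s) <- R(s) + γ · max_a Σ_s' P(s'|s, a) · U(s') until the utilities stabilize."""
    U = {s: 0.0 for s in states}
    while True:
        U_new = {
            s: R[s] + gamma * max(
                sum(prob * U[s2] for s2, prob in P[(s, a)].items())
                for a in actions
            )
            for s in states
        }
        if max(abs(U_new[s] - U[s]) for s in states) < tol:
            return U_new
        U = U_new

def extract_policy(states, actions, P, U):
    """The optimal policy picks, in each state, the action with the highest expected utility."""
    return {
        s: max(actions, key=lambda a: sum(prob * U[s2] for s2, prob in P[(s, a)].items()))
        for s in states
    }

# Tiny example: "stay" is safe, "go" is risky but can reach the rewarding goal state.
states, actions = ["s0", "goal"], ["stay", "go"]
P = {("s0", "stay"):   {"s0": 1.0},
     ("s0", "go"):     {"goal": 0.8, "s0": 0.2},
     ("goal", "stay"): {"goal": 1.0},
     ("goal", "go"):   {"goal": 1.0}}
R = {"s0": -0.04, "goal": 1.0}
U = value_iteration(states, actions, P, R)
print(extract_policy(states, actions, P, U))   # "go" is optimal in s0
```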
Lecture 23
- Reinforcement Learning:
- Utility-based agent: if it already knows the transition model, then use an MDP algorithm to solve for the Maximum Expected Utility (MEU) actions.
- Q-learning agent: if it doesn't know the transition model, learn action-utility values Q(s, a) and pick the action with the highest Q-value in the current state.
- Q(s, a) ← Q(s, a) + α · (R(s) + γ · max_a' Q(s', a') − Q(s, a)) →
Q(s, a) = (1 − α) · Q(s, a) + α · (R(s) + γ · max_a' Q(s', a')). Converges to the correct values if α decays over time (see the sketch after this list).
- Reflex agent: learn a policy π(s) directly, then pick the action the policy prescribes.
- Can start right away and then learn as it goes.
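A minimal tabular Q-learning sketch of the update above. The environment loop is omitted, and the ε-greedy exploration policy is my addition, not something stated in the note.

```python
import random
from collections import defaultdict

Q = defaultdict(float)   # unknown (s, a) pairs default to 0, so the agent can start right away

def q_update(s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """Q(s, a) <- (1 - α) · Q(s, a) + α · (r + γ · max_a' Q(s', a'))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)

def choose_action(s, actions, epsilon=0.1):
    """Pick the action with the highest Q-value in the current state, exploring occasionally."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])
```

In practice α (and ε) are decayed over the run, which is the convergence condition stated in the note.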
Lecture 24-25
Guest speaker.
Lecture 26
🍺