Review Notes for AI Part III: Machine Learning

This course uses Artificial Intelligence: A Modern Approach as its textbook. These notes follow the book's definitions of variables and terms directly and do not re-explain them here.

Lecture 18-19

  1. Bayes’ rule: P(A|B) = (P(B|A) · P(A)) / P(B).
  2. Normalization & marginalization: P(X | e) = P(X, e) / P(e) = α · P(X, e) = α · Σ_y P(X, e, y) (see the sketch after this list).
  3. In the alarm network, the calls are conditionally independent of burglaries and earthquakes given the alarm, but they are not (unconditionally) independent of them.
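A minimal Python sketch of point 2, computing P(X | e) by marginalizing out a hidden variable and then normalizing; the joint probabilities below are made-up toy numbers, not values from the book.

```python
# Sketch of P(X | e) = alpha * sum_y P(X, e, y) on a toy joint distribution.
# The numbers are hypothetical and only illustrate the mechanics.

# Joint P(X, e, y) with the evidence e already fixed, indexed as joint[x][y].
joint = {
    True:  {True: 0.020, False: 0.060},   # P(X=true,  e, y)
    False: {True: 0.180, False: 0.240},   # P(X=false, e, y)
}

# Marginalize out the hidden variable y: P(X, e) = sum_y P(X, e, y)
unnormalized = {x: sum(row.values()) for x, row in joint.items()}

# Normalize: alpha = 1 / P(e), so the entries sum to 1.
alpha = 1.0 / sum(unnormalized.values())
posterior = {x: alpha * p for x, p in unnormalized.items()}

print(posterior)  # {True: 0.16, False: 0.84}
```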

Lecture 20

  1. Entropy: H = −Σ_i P(v_i) · ln P(v_i):
    • The more uniform the probability distribution, the greater its entropy, i.e. the more information is needed to specify the outcome.
  2. ID3 algorithm: build the decision tree top-down by recursively selecting the “best attribute” to test at the current node.
    • “Best attribute”: the attribute with the largest expected information gain, i.e. the largest expected reduction in entropy (see the sketch after this list).
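A small Python sketch of the entropy and information-gain computations ID3 relies on. It uses base-2 logarithms (bits); any fixed base gives the same attribute ranking. The list-of-dicts `examples` representation and the helper names are assumptions for illustration.

```python
import math

def entropy(probs):
    """H = -sum_i p_i * log2(p_i); a uniform distribution has maximal entropy."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def information_gain(examples, attribute, target="label"):
    """Expected reduction in label entropy from splitting `examples` on `attribute`."""
    def label_entropy(rows):
        counts = {}
        for r in rows:
            counts[r[target]] = counts.get(r[target], 0) + 1
        return entropy([c / len(rows) for c in counts.values()])

    before = label_entropy(examples)
    remainder = 0.0
    for value in {r[attribute] for r in examples}:
        subset = [r for r in examples if r[attribute] == value]
        remainder += len(subset) / len(examples) * label_entropy(subset)
    return before - remainder

# ID3 picks the attribute with the largest information gain at each node, e.g.:
# best = max(attributes, key=lambda a: information_gain(examples, a))
```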

Lecture 20-21

  1. Back-propagation is synchronous, while a Hopfield net is asynchronous; back-propagation minimizes the error, while a Hopfield net minimizes the energy (see the sketch below).
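A toy NumPy sketch of the contrast: a synchronous gradient step that reduces a squared error (a single linear layer stands in here for a full back-propagation network), versus an asynchronous one-unit Hopfield update that never increases the energy E(s) = −½ sᵀWs. All sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Back-propagation style: synchronous gradient step on the squared error ---
W = rng.normal(size=(2, 3))            # toy model y_hat = W x
x = rng.normal(size=3)
y = np.array([1.0, -1.0])

error = W @ x - y                      # minimize E = 0.5 * ||W x - y||^2
W -= 0.1 * np.outer(error, x)          # every weight is updated at once

# --- Hopfield style: asynchronous single-unit update that lowers the energy ---
Wh = rng.normal(size=(4, 4))
Wh = (Wh + Wh.T) / 2                   # symmetric weights,
np.fill_diagonal(Wh, 0)                # zero diagonal
s = rng.choice([-1, 1], size=4).astype(float)

i = rng.integers(4)                    # pick ONE unit at a time
s[i] = 1.0 if Wh[i] @ s >= 0 else -1.0 # set it to its locally preferred sign

print("energy:", -0.5 * s @ Wh @ s)    # E(s) = -0.5 * s^T W s never increases
```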

Lecture 22

  1. Markov Decision Process (MDP):
    • The problem is to find the optimal policy, i.e. the policy that maximizes the expected utility;
    • Bellman equation: U(s) = R(s) + γ · max_a Σ_{s′} P(s′ | s, a) · U(s′). Iterating this update (value iteration) converges to the optimal utilities, from which the optimal policy follows (see the sketch after this list).
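A minimal value-iteration sketch of the Bellman update; the two-state MDP below (states, actions, rewards, transition probabilities) is a made-up example, not one from the book.

```python
# Value iteration: U(s) <- R(s) + gamma * max_a sum_{s'} P(s' | s, a) * U(s')

states = ["s1", "s2"]
actions = ["stay", "move"]
R = {"s1": 0.0, "s2": 1.0}
gamma = 0.9

# P[(s, a)] = list of (probability, next_state) pairs
P = {
    ("s1", "stay"): [(1.0, "s1")],
    ("s1", "move"): [(0.8, "s2"), (0.2, "s1")],
    ("s2", "stay"): [(1.0, "s2")],
    ("s2", "move"): [(1.0, "s1")],
}

U = {s: 0.0 for s in states}
for _ in range(100):                       # repeat until (approximately) converged
    U = {
        s: R[s] + gamma * max(
            sum(p * U[s2] for p, s2 in P[(s, a)]) for a in actions
        )
        for s in states
    }

# The optimal policy is the argmax action at each state.
policy = {
    s: max(actions, key=lambda a: sum(p * U[s2] for p, s2 in P[(s, a)]))
    for s in states
}
print(U, policy)
```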

Lecture 23

  1. Reinforcement Learning:
    • Utility-based agent: if it already knows the transition model, it uses the MDP algorithms to compute Maximum Expected Utility (MEU) actions.
    • Q-learning agent: if it does not know the transition model, it learns Q(s, a) values and picks the action with the highest Q-value in the current state (see the sketch after this list).
      • Q(s, a) ← Q(s, a) + α · (R(s) + γ · max_{a′} Q(s′, a′) − Q(s, a)) →
        Q(s, a) ← (1 − α) · Q(s, a) + α · (R(s) + γ · max_{a′} Q(s′, a′))
      • Converges to the correct Q-values if α decays over time.
    • Reflex agent: learns a policy directly, then picks the action the policy prescribes.
      • Can start right away and then learn as it goes.
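A minimal tabular Q-learning sketch of the update above, with an ε-greedy action choice and a per-pair learning rate α that decays as 1/N(s, a). The environment interface (`env.reset()`, `env.step(a)` returning `(next_state, reward, done)`) and all hyperparameters are assumptions; the reward used is the one observed on the transition rather than R(s).

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=500, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy policy and decaying alpha."""
    Q = defaultdict(float)        # Q[(state, action)], defaults to 0
    visits = defaultdict(int)     # N(s, a), used to decay the learning rate

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Mostly pick the action with the highest Q-value in the current state.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])

            s2, r, done = env.step(a)

            # Decaying learning rate: alpha = 1 / N(s, a).
            visits[(s, a)] += 1
            alpha = 1.0 / visits[(s, a)]

            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            target = r if done else r + gamma * max(Q[(s2, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

            s = s2
    return Q
```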

Lecture 24-25

Guest speaker.

Lecture 26

🍺
