A New Surrogate Risk for Learning from Weakly Labeled Data


# {*no-status title-slide custom-title} // comment

- Authors: Valentina Zantedeschi, Rémi Emonet, Marc Sebban
- Date: - Lille - Magnet Team

## Soft-Margin SVM
For a sample $S$ of $m$ instances $(x_i,y_i)$:

$\arg\min_{\theta,b}\:\: \frac{1}{2} \left\| \theta \right\|_2^2 + c \sum_i \xi_i$

$s.t. \:\:\: y_i \left( \theta^T \mu(x_i) + b \right) \geq 1- \xi_i \:,\: \xi_i \geq 0$

- $\theta$, $b$: the parameters of the linear separator
- $\mu$: a mapping function such that $\mu(x_i)^T\mu(x_j) = K(x_i,x_j)$

## Supervised Learning

In Brief

- Learning a classifier from a fully labeled set

Issues

- Label assignment is difficult and expensive:
  - difficult: labels must be unique and reliable
  - expensive: large amounts of data, need for experts
- Datasets are generally noisy

How to handle the confidence in the labels?

## Weak-Label Learning
Labels are incorrect, missing or not unique.

Sub-problems

- Semi-Supervised Learning
- Unsupervised Learning
- Label Proportions Learning
- Multi-Instance Learning
- Multi-Expert Learning
- Noise-Tolerant Learning

## Empirical Surrogate $\beta$-Risk
For any margin-based loss function $F_{\phi}$:

$R_{\phi}^{\beta}(X,h) = \frac{b_{\phi}}{m} \sum_{i=1}^{m} \sum_{\sigma \in \{-1,1\}} \beta_i^{\sigma} F_{\phi}(\sigma h(x_i))$

$\beta$: degree of confidence / probability of labels

$\beta_i^{\text{-}1} \in [0,1]$, $\beta_i^{\text{+}1} \in [0,1]$, $\beta_i^{\text{-}1} + \beta_i^{\text{+}1} = 1$
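The definition above can be evaluated directly. The following is a minimal sketch, not the authors' implementation: it assumes the hinge loss as the margin-based loss $F_{\phi}$ and treats $b_{\phi}$ as a plain parameter defaulting to 1. The names `hinge` and `beta_risk` are hypothetical.

```python
def hinge(z):
    """Hinge loss, used here as one example of a margin-based loss (assumption)."""
    return max(0.0, 1.0 - z)


def beta_risk(scores, betas, loss=hinge, b_phi=1.0):
    """Empirical surrogate beta-risk.

    scores: list of classifier outputs h(x_i)
    betas:  list of (beta_minus, beta_plus) pairs, one per instance,
            each pair in [0, 1] and summing to 1
    b_phi:  loss-dependent constant; 1.0 is an assumption for this sketch
    """
    m = len(scores)
    total = 0.0
    for h_x, (beta_minus, beta_plus) in zip(scores, betas):
        if abs(beta_minus + beta_plus - 1.0) > 1e-9:
            raise ValueError("each beta pair must sum to 1")
        # Sum over sigma in {-1, +1} of beta_i^sigma * F(sigma * h(x_i)).
        total += beta_minus * loss(-h_x) + beta_plus * loss(h_x)
    return b_phi * total / m


# Two confidently labeled instances and one fully uncertain instance.
scores = [2.0, -0.5, 0.3]
betas = [(0.0, 1.0), (1.0, 0.0), (0.5, 0.5)]
risk = beta_risk(scores, betas)
print(round(risk, 6))  # 0.5
```

With one-hot $\beta$ the inner sum keeps only the loss of the known label; with $\beta_i^{\sigma} = 0.5$ the instance is charged half of the loss for each possible label.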

Margin-based loss functions

## Soft-Margin $\beta$-SVM
Primal problem:

$\arg\min_{\theta,b}\:\: \frac{1}{2} \left\| \theta \right\|_2^2 + c \sum_i \left(\beta_i^{\text{-}1}\xi_i^{\text{-}1}+\beta_i^{\text{+}1}\xi_i^{\text{+}1} \right)$

$s.t. \:\:\: \sigma (\theta^T \mu(x_i) + b) \geq 1- \xi_i^{\sigma} \:,\: \xi_i^{\sigma} \geq 0$

Lagrangian dual problem:

$\max_{\alpha} \:\: -\frac{1}{2} \sum_{i,j} \sum_{\sigma,{\sigma}'} \alpha_i^{\sigma} \sigma \alpha_j^{{\sigma}'} {\sigma}' K(x_i,x_j) + \sum_i \sum_{\sigma} \alpha_i^\sigma$

$s.t. \:\: 0 \leq \alpha_i^\sigma \leq c \beta_i^\sigma \:,\: \sum_{i=1}^m \sum_{\sigma} \alpha_i^\sigma \sigma = 0$

## How the margin is affected
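One way to see the effect of $\beta$ on the soft-margin $\beta$-SVM primal is to evaluate its objective for a fixed separator. The sketch below is illustrative only and makes several assumptions: a linear kernel (so $\mu$ is the identity), slacks set to their optimal values $\xi_i^{\sigma} = \max(0, 1 - \sigma(\theta^T x_i + b))$, and hypothetical helper names (`dot`, `beta_svm_objective`, `svm_objective`).

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def beta_svm_objective(theta, b, xs, betas, c):
    """Primal objective of the soft-margin beta-SVM (linear kernel assumed).

    betas holds one (beta_minus, beta_plus) pair per instance.
    """
    penalty = 0.0
    for x, (beta_minus, beta_plus) in zip(xs, betas):
        score = dot(theta, x) + b
        xi_minus = max(0.0, 1.0 + score)  # slack for sigma = -1
        xi_plus = max(0.0, 1.0 - score)   # slack for sigma = +1
        penalty += beta_minus * xi_minus + beta_plus * xi_plus
    return 0.5 * dot(theta, theta) + c * penalty


def svm_objective(theta, b, xs, ys, c):
    """Primal objective of the classical soft-margin SVM, for comparison."""
    penalty = sum(max(0.0, 1.0 - y * (dot(theta, x) + b)) for x, y in zip(xs, ys))
    return 0.5 * dot(theta, theta) + c * penalty


xs = [(2.0, 0.0), (-1.0, 0.5), (0.2, -0.1)]
ys = [1, -1, 1]
theta, b, c = (1.0, 0.0), 0.0, 1.0

# One-hot beta (full confidence in y_i) recovers the classical objective.
one_hot = [(0.0, 1.0) if y == 1 else (1.0, 0.0) for y in ys]
classical = svm_objective(theta, b, xs, ys, c)
confident = beta_svm_objective(theta, b, xs, one_hot, c)

# Making the third instance fully uncertain charges it on both sides.
uncertain = beta_svm_objective(theta, b, xs, one_hot[:2] + [(0.5, 0.5)], c)
```

In the dual, the same weighting appears as the box constraint $0 \leq \alpha_i^\sigma \leq c \beta_i^\sigma$: a label with low confidence can contribute only a small multiplier.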

## Relation with the Classical Risk
Let's rewrite the classical risk: $R_{\phi}(X,Y,h) = R_{\phi}^{\beta}(X,h) - \frac{1}{m} \sum_i \beta_i^{-y_i} y_i h(x_i)$
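The decomposition can be checked numerically. This is a sketch under stated assumptions, not taken from the slides: it uses the logistic loss $F_{\phi}(z) = \log(1 + e^{-z})$, for which $F_{\phi}(-z) - F_{\phi}(z) = z$ and hence $b_{\phi} = 1$, and it takes the classical risk to be $\frac{b_{\phi}}{m} \sum_i F_{\phi}(y_i h(x_i))$. All function names are hypothetical.

```python
import math


def logistic(z):
    """Logistic loss log(1 + exp(-z)); satisfies F(-z) - F(z) = z."""
    return math.log1p(math.exp(-z))


def classical_risk(scores, labels, b_phi=1.0):
    """Assumed form of the classical risk: (b_phi / m) * sum_i F(y_i * h(x_i))."""
    m = len(scores)
    return b_phi / m * sum(logistic(y * s) for s, y in zip(scores, labels))


def beta_risk(scores, beta_plus, b_phi=1.0):
    """Beta-risk, with beta_minus recovered as 1 - beta_plus."""
    m = len(scores)
    return b_phi / m * sum(
        bp * logistic(s) + (1.0 - bp) * logistic(-s)
        for s, bp in zip(scores, beta_plus)
    )


def penalty(scores, labels, beta_plus):
    """(1 / m) * sum_i beta_i^{-y_i} * y_i * h(x_i)."""
    m = len(scores)
    total = 0.0
    for s, y, bp in zip(scores, labels, beta_plus):
        # beta_i^{-y_i} is the confidence put on the label opposite to y_i.
        beta_opposite = (1.0 - bp) if y == 1 else bp
        total += beta_opposite * y * s
    return total / m


scores = [1.7, -0.4, 0.9, -2.2]
labels = [1, 1, -1, -1]
beta_plus = [0.9, 0.3, 0.6, 0.1]  # arbitrary confidences in the +1 label

lhs = classical_risk(scores, labels)
rhs = beta_risk(scores, beta_plus) - penalty(scores, labels, beta_plus)
print(abs(lhs - rhs) < 1e-12)
```

The two sides agree for any choice of $\beta$ here because the logistic loss turns the difference $F_{\phi}(-z) - F_{\phi}(z)$ into a term that is linear in the score.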
- $R_{\phi}^{\beta}(X,h)$ is the $\beta$-risk
- $\frac{1}{m} \sum_i \beta_i^{-y_i} y_i h(x_i)$ is a penalty term on the misclassified instances

## Iterative Algorithm
1. Learn $h$: $\:\:\arg\min_h \:\: N(h) + c R_{\phi}^{\beta}(X,h)$
2. Learn $\beta$: $\:\:\arg\min_{\beta} R_{\phi}^{\beta}(X,h)$

   $s.t.\: \sum_{i=1}^{m}\beta_i^{\text{-}y_i} y_i h(x_i) = 0$

   $\beta_i^{\text{-}1} + \beta_i^{\text{+}1} = 1$
3. Estimate $y$: $\forall i=1..m,\:\:\:y_i = \operatorname{sign} \left(\beta_i^{\text{+}1}-\frac{1}{2} \right)$

## Semi-Supervised Learning
With $m_l$ labeled instances and $m_u$ unlabeled instances:

1. Initialization of $\beta$
   - $\forall i=1..m_l \:\: \beta_i^{\sigma} = 1 \:\text{if}\: \sigma = y_i, 0 \:\text{otherwise}$
   - $\forall i=m_l+1..m_l+m_u \:\: \beta_i^{\sigma} = 0.5$
2. Iterative Algorithm
   - Learn $\beta$ for the unlabeled set only

## Results

WellSVM: Li Yu-Feng, Tsang Ivor W., Kwok James T., Zhou Zhi-Hua. Convex and scalable weakly labeled SVMs. The Journal of Machine Learning Research, 2013.

## Perspectives: Differential Privacy
How to learn accurately while preserving user privacy?

Learn on bags of instances:

- the labels of each single instance are unknown
- we have access to the proportions of the classes per bag

# Thanks for your attention
