Roc-star: An objective function for ROC-AUC that actually works.

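For binary classification, everybody loves the Area Under the Curve (AUC) metric, but nobody directly targets it in their loss function. Instead, folks use a proxy function such as Binary Cross Entropy (BCE).

This works fairly well most of the time. But we're left with a nagging question: could we get a higher score with a loss function that is closer in nature to AUC?

It seems likely, since BCE bears very little relation to AUC. There have been many attempts to find a loss function that targets AUC more directly. (One common tactic is some form of rank-loss function, such as hinge rank loss.) In practice, however, no clear winner has ever emerged. There has been no serious challenge to BCE.

There are also considerations beyond performance. Since BCE is essentially different from AUC, BCE tends to misbehave in the final stretch of training, where we are trying to steer it toward the highest AUC score.

A good deal of the AUC optimization actually ends up happening in the tuning of hyperparameters. Early stopping becomes an uncomfortable necessity, as the model may diverge sharply from its high score at any time.

We'd like a loss function that gives us higher scores and less trouble.

We present such a function here.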

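The problem: AUC is bumpy

My favorite working definition of AUC is this: let's call the binary class labels "Black" (0) and "White" (1). Pick one black element at random and let x be its predicted value. Now pick a random white element with value y. Then,

AUC = the probability that the elements are in the right order. That is, x < y.

That's it. For any given set of points, such as the training set, we can get this probability by doing a brute-force calculation: scan the set of all possible black/white pairs and count the portion that are right-ordered.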
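As a rough illustration of that brute-force count, here is a minimal sketch in plain Python. The function name `brute_force_auc` and the sample predictions are made up for this example. Ties are counted as not right-ordered here, which follows the strict x < y definition above (some AUC implementations count a tie as half a pair).

```python
from itertools import product

def brute_force_auc(black, white):
    """Fraction of (black, white) pairs where the black prediction x
    is strictly below the white prediction y."""
    pairs = list(product(black, white))
    right_ordered = sum(1 for x, y in pairs if x < y)
    return right_ordered / len(pairs)

# Hypothetical predictions, for illustration only.
black = [0.1, 0.4, 0.35]   # predictions for label 0
white = [0.8, 0.3, 0.9]    # predictions for label 1

auc = brute_force_auc(black, white)
print(auc)  # 7 of the 9 pairs are right-ordered
```

The cost is |B| * |W| comparisons, which is fine for a toy set but is the reason the later sections resort to sub-sampling.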

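We can see that the AUC score is not differentiable (it is not a smooth curve with respect to any single x or y). Take an element of either color and move it slightly, just little enough that it doesn't cross a neighboring element. The AUC stays the same. Once the point does cross a neighbor, we have a chance of flipping one of the x < y comparisons, which changes the AUC. So the AUC makes no smooth transitions.

That's a problem for neural nets, where we need a differentiable loss function.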
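To make the bumpiness concrete, the sketch below nudges a single black prediction and recomputes the AUC by brute force. The helper `auc` and the numbers are illustrative assumptions, not part of the original code.

```python
def auc(black, white):
    """Brute-force AUC: share of (x, y) pairs with x < y."""
    right = sum(1 for x in black for y in white if x < y)
    return right / (len(black) * len(white))

white = [0.30, 0.90]

base = auc([0.10, 0.40], white)      # starting point
nudged = auc([0.10, 0.35], white)    # 0.40 -> 0.35: no neighbor crossed
crossed = auc([0.10, 0.25], white)   # 0.40 -> 0.25: passes the white 0.30

print(base, nudged, crossed)
```

The small move leaves the score untouched (zero gradient), and the move that crosses a neighbor makes it jump. Gradient descent gets nothing useful from either case.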

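The search: ancients and artifacts

So we set out to find a differentiable function that is as close as possible to AUC.

I dug back through the existing literature and found nothing that worked in practice. Finally, I came across a curious piece of code that somebody had checked into the TFLearn codebase.

Without fanfare, it promised differentiable deliverance from BCE in the form of a new loss function.

(Don't try it, it blows up.): http://tflearn.org/objectives/#roc-auc-score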

def roc_auc_score(y_pred, y_true):
    """Bad code, do not use.
    ROC AUC Score.
    Approximates the Area Under Curve score, using approximation based on
    the Wilcoxon-Mann-Whitney U statistic.
    Yan, L., Dodier, R., Mozer, M. C., & Wolniewicz, R. (2003).
    Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic.
    Measures overall performance for a full range of threshold levels.
    Arguments:
        y_pred: `Tensor`. Predicted values.
        y_true: `Tensor` . Targets (labels), a probability distribution.
    """
    with tf.name_scope("RocAucScore"):
        pos = tf.boolean_mask(y_pred, tf.cast(y_true, tf.bool))
        neg = tf.boolean_mask(y_pred, ~tf.cast(y_true, tf.bool))
        # .
        # .
        # (more bad code)

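It doesn't work at all. (Blowing up is actually the least of its problems.) But it does mention the paper it was based on.

Even though the paper is ancient, dating back to 2003, I found that with a little work - some extension of the math and careful coding - it actually works. It's uber-fast, with speed comparable to BCE (and just as vectorizable for a GPU/MPP). In my tests, it gives higher AUC scores than BCE, is less sensitive to learning rate (avoiding the need for a scheduler in my tests), and entirely eliminates the need for early stopping.

OK, let's turn to the original paper: Optimizing Classifier Performance via an Approximation to the Wilcoxon-Mann-Whitney Statistic.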

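The paper

The authors, Yan et al., motivate the discussion by writing the AUC score in a particular form. Recall our example, where we calculate AUC by doing a brute-force count over the set of possible black/white pairs to find the portion that are right-ordered. Let B be the set of black values and W the set of white values. All possible pairs are given by the Cartesian product B x W. To count the right-ordered pairs we write:

$$\sum_{x \in B} \sum_{y \in W} \mathbf{1}_{x < y}$$

This is really just mathematical notation for "count the right-ordered pairs." If we divide this sum by the total number of pairs, |B| * |W|, we get exactly the AUC metric. (Historically, this is called the normalized Wilcoxon-Mann-Whitney (WMW) statistic.)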

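To make a loss function from this, we could just flip the x < y comparison to x > y in order to penalize wrong-ordered pairs. The problem, of course, is the discontinuous jump when x crosses y.

Yan et al. survey - and then reject - past workarounds that use continuous approximations to the step (Heaviside) function, such as a sigmoid curve. Then they pull this out of a hat:

$$\mathrm{Loss} = \sum_{x \in B} \sum_{y \in W} \max(0,\; x - y + \gamma)^{p}$$

Yan got this formula by applying a series of changes to the WMW: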

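1. x < y has been flipped to y < x, to make it a loss (higher is worse). So the loss focuses on wrong-ordered pairs.
2. Instead of treating all pairs equally, weight is given to how far apart the pair is.
3. That weight is raised to the power p.
4. We add a padding γ to that distance.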

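We'll go through these in turn. 1 is clear enough. There's a little more to 2 than meets the eye. It makes intuitive sense that wrong-ordered pairs with wide separation should be given a bigger loss. But something interesting also happens as that separation approaches 0. The loss goes to zero linearly, rather than as a step function. So we've gotten rid of the discontinuity.

In fact, if p were 1 and γ were 0, the loss would simply be our old friend ReLU(x - y). But in that case we notice a hiccup, which reveals the need for the exponent p. ReLU is not differentiable at 0. That's not much of a problem in ReLU's more accustomed role as an activation function, but for our purposes the singularity at 0 lands directly on the thing we are most interested in: the points where white and black elements pass each other.

Fortunately, raising ReLU to a power fixes this. ReLU^p with p > 1 is differentiable everywhere. OK, so p > 1.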
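A quick numerical check of that claim, as a sketch: we compare the one-sided slopes of ReLU^p at 0 for p = 1 and p = 2 using finite differences. The helper names `relu_pow` and `one_sided_slopes` and the step size `h` are assumptions made for this illustration.

```python
def relu_pow(z, p):
    """ReLU raised to the power p: max(z, 0)^p."""
    return max(z, 0.0) ** p

def one_sided_slopes(f, h=1e-6):
    """Finite-difference slopes of f just left and just right of 0."""
    left = (f(0.0) - f(-h)) / h
    right = (f(h) - f(0.0)) / h
    return left, right

left1, right1 = one_sided_slopes(lambda z: relu_pow(z, 1))
left2, right2 = one_sided_slopes(lambda z: relu_pow(z, 2))

print(left1, right1)  # slopes disagree for p = 1: a kink at 0
print(left2, right2)  # slopes both shrink toward 0 for p = 2
```

For p = 1 the left slope is 0 and the right slope is 1, so there is no single derivative at 0. For p = 2 both slopes tend to 0 as `h` shrinks, which is the smooth behavior we want at the crossing point.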

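Now back to γ: γ provides a "padding" that is enforced between two points. We penalize not only wrong-ordered pairs, but also right-ordered pairs that are too close. If a right-ordered pair is too close, its elements are at risk of getting swapped in the future by the random jiggling of a stochastic neural net. The idea is to keep them moving apart until they reach a comfortable distance.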
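Putting the four changes together, a per-pair loss might look like the sketch below. It is a plain-Python reading of the description above, with x a black prediction and y a white one. The names `pair_loss` and `total_loss` are hypothetical, and the numbers are only for illustration.

```python
def pair_loss(x, y, gamma=0.0, p=2):
    """Loss for one (black x, white y) pair: positive when the pair is
    wrong-ordered, or right-ordered but closer than gamma."""
    return max(0.0, x - y + gamma) ** p

def total_loss(black, white, gamma=0.0, p=2):
    """Sum of pair losses over the Cartesian product B x W."""
    return sum(pair_loss(x, y, gamma, p) for x in black for y in white)

wrong = pair_loss(0.6, 0.4)                    # wrong-ordered: penalized
far = pair_loss(0.2, 0.9, gamma=0.2)           # right-ordered, far apart: 0
close = pair_loss(0.4, 0.5, gamma=0.2)         # right-ordered, too close: penalized
close_no_pad = pair_loss(0.4, 0.5, gamma=0.0)  # same pair without padding: 0

print(wrong, far, close, close_no_pad)
```

With γ = 0 only wrong-ordered pairs contribute. A positive γ also sets in motion the right-ordered pairs whose gap is smaller than γ.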

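And that's the basic idea as outlined in the paper. We now make some refinements regarding γ and p.

About that γ and p

Here we break a bit with the paper. Yan et al. seem a little squeamish on the topic of choosing γ and p, offering only that p = 2 or p = 3 seems good and that γ should be somewhere between 0.10 and 0.70. Yan essentially wishes us luck with these parameters and bows out.

First, we permanently fix p = 2, because any self-respecting loss function should be a sum of squares. (One reason for this is that it ensures the loss function is not only differentiable, but also convex.)

Second and more importantly, let's take a look at γ. The heuristic of "somewhere from 0.10 to 0.70" looks strange on the face of it; even if the predictions were normalized to 0 < x < 1, this guidance seems overbroad, indifferent to the underlying distributions, and just weird.

We're going to derive γ from the training set.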

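Consider the training set and its black/white pairs, B x W. There are |B| |W| pairs in this set. Of these, |B| |W| AUC are right-ordered. So the number of wrong-ordered pairs is (1 - AUC) |B| |W|.

When γ is zero, only these wrong-ordered pairs are in motion (have positive loss). A positive γ expands the set of moving pairs to include some pairs that are right-ordered but too close. Instead of worrying about γ's numeric value, we'll specify just how many too-close pairs we want to set in motion:

We define a constant δ that fixes the proportion of too-close pairs to wrong-ordered pairs.

|too_close_pairs| = δ |wrong_ordered_pairs|

We fix this δ throughout training and update γ to conform to it. For a given δ, we find γ such that

|pairs where x < y < x + γ| = δ |pairs where x > y|

In our experiments we found that δ can range from 0.5 to 2.0, and that 1.0 is a good default choice.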
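One way to read the rule above: sort the gaps y - x of the right-ordered pairs and pick the gap that leaves δ times the wrong-ordered count below it. The sketch below does this by brute force in plain Python. The function name `gamma_from_delta`, the fallback value, and the sample data are assumptions for illustration, and it ignores ties between gaps.

```python
def gamma_from_delta(black, white, delta=1.0, default_gamma=0.2):
    """Pick gamma so that (number of right-ordered pairs with gap < gamma)
    is roughly delta * (number of wrong-ordered pairs)."""
    gaps = [y - x for x in black for y in white]
    n_wrong = sum(1 for g in gaps if g < 0)
    right_gaps = sorted(g for g in gaps if g > 0)
    if not right_gaps:
        return default_gamma  # no right-ordered pairs to measure against
    k = min(int(delta * n_wrong), len(right_gaps) - 1)
    return right_gaps[k]

# Hypothetical predictions, for illustration only.
black = [0.1, 0.4, 0.35, 0.7]
white = [0.8, 0.3, 0.9]

gamma = gamma_from_delta(black, white, delta=1.0)
n_wrong = sum(1 for x in black for y in white if x > y)
n_too_close = sum(1 for x in black for y in white if 0 < y - x < gamma)
print(gamma, n_wrong, n_too_close)
```

Here 3 of the 12 pairs are wrong-ordered, so with δ = 1 the chosen γ puts 3 right-ordered pairs inside the padding.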

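So we set δ to 1 and p to 2, and forget about γ altogether.

Let's make code

Our loss function (1) looks dismally expensive to compute. It requires that we scan the entire training set for each individual prediction.

We bypass this problem with a performance tweak:

Suppose we are calculating the loss function for a given white data point, y. To calculate (3), we need to compare y against the entire training set of black predictions. We take a shortcut and use a random sub-sample of the black data points. If we set the size of the sub-sample to, say, 1000, we get a very (very) close approximation to the true loss function. [1]

Similar reasoning applies to the loss function of a black data point; we use a random sub-sample of all the white training elements.

In this way, the white and black sub-samples fit easily into GPU memory. By reusing the same sub-sample throughout a given batch, we can parallelize the operation in batches. We end up with a loss function that's about as fast as BCE.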
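To get a feel for the sub-sampling shortcut, the sketch below compares the loss of one white point against all black predictions with the loss against a random sub-sample of 1000. Everything here is an assumption for illustration: the synthetic beta-distributed predictions, the helper `mean_sq_hinge`, and the use of a mean (rather than a sum) so that the two values are on the same scale.

```python
import random

def mean_sq_hinge(y, blacks, gamma):
    """Average of max(0, x - y + gamma)^2 over the given black predictions."""
    return sum(max(0.0, x - y + gamma) ** 2 for x in blacks) / len(blacks)

rng = random.Random(0)

# Synthetic black predictions standing in for a large training set.
blacks = [rng.betavariate(2, 5) for _ in range(50_000)]

y = 0.45      # one white prediction
gamma = 0.2   # padding

full = mean_sq_hinge(y, blacks, gamma)    # compare against every black point
sample = rng.sample(blacks, 1000)         # random sub-sample of 1000
approx = mean_sq_hinge(y, sample, gamma)  # compare against the sub-sample only

print(full, approx)
```

The sub-sampled value costs 50 times fewer comparisons in this toy setup and tracks the full value up to sampling noise.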

Here's the batch-loss function in PyTorch:

import torch

def roc_star_loss(_y_true, y_pred, gamma, _epoch_true, epoch_pred):
    """
    Nearly direct loss function for AUC.
    See article,
    C. Reiss, "Roc-star : An objective function for ROC-AUC that actually works."
    https://github.com/iridiumblue/articles/blob/master/roc_star.md
        _y_true: `Tensor`. Targets (labels).  Float either 0.0 or 1.0 .
        y_pred: `Tensor` . Predictions.
        gamma  : `Float` Gamma, as derived from last epoch.
        _epoch_true: `Tensor`.  Targets (labels) from last epoch.
        epoch_pred : `Tensor`.  Predictions from last epoch.
    """
    # convert labels to boolean
    y_true = (_y_true >= 0.50)
    epoch_true = (_epoch_true >= 0.50)

    # if batch is either all true or all false, return a tiny stub value
    # that still depends on y_pred.
    if torch.sum(y_true) == 0 or torch.sum(y_true) == y_true.shape[0]:
        return torch.sum(y_pred) * 1e-8

    pos = y_pred[y_true]
    neg = y_pred[~y_true]

    epoch_pos = epoch_pred[epoch_true]
    epoch_neg = epoch_pred[~epoch_true]

    # Take random subsamples of the training set, both positive and negative.
    max_pos = 1000  # Max number of positive training samples
    max_neg = 1000  # Max number of negative training samples
    cap_pos = epoch_pos.shape[0]
    cap_neg = epoch_neg.shape[0]
    epoch_pos = epoch_pos[torch.rand_like(epoch_pos) < max_pos/cap_pos]
    epoch_neg = epoch_neg[torch.rand_like(epoch_neg) < max_neg/cap_neg]

    ln_pos = pos.shape[0]
    ln_neg = neg.shape[0]

    # sum positive batch elements against (subsampled) negative elements
    if ln_pos > 0:
        pos_expand = pos.view(-1,1).expand(-1,epoch_neg.shape[0]).reshape(-1)
        neg_expand = epoch_neg.repeat(ln_pos)

        diff2 = neg_expand - pos_expand + gamma
        l2 = diff2[diff2>0]   # keep wrong-ordered and too-close pairs only
        m2 = l2 * l2          # p = 2
        len2 = l2.shape[0]
    else:
        m2 = torch.zeros(1, dtype=torch.float, device=y_pred.device)
        len2 = 0

    # Similarly, compare negative batch elements against (subsampled) positive elements
    if ln_neg > 0:
        pos_expand = epoch_pos.view(-1,1).expand(-1, ln_neg).reshape(-1)
        neg_expand = neg.repeat(epoch_pos.shape[0])

        diff3 = neg_expand - pos_expand + gamma
        l3 = diff3[diff3>0]
        m3 = l3*l3
        len3 = l3.shape[0]
    else:
        m3 = torch.zeros(1, dtype=torch.float, device=y_pred.device)
        len3 = 0

    if (torch.sum(m2)+torch.sum(m3)) != 0:
        res2 = torch.sum(m2)/max_pos + torch.sum(m3)/max_neg
    else:
        res2 = torch.sum(m2)+torch.sum(m3)

    # guard against NaN
    res2 = torch.where(torch.isnan(res2), torch.zeros_like(res2), res2)

    return res2

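Note that there are some extra parameters. We are passing in the training set from the last epoch. Since the entire training set doesn't change much from one epoch to the next, the loss function can compare each prediction against a slightly out-of-date training set. This simplifies debugging, and appears to benefit performance, as the "background" epoch doesn't change from one batch to the next.

Similarly, γ is an expensive calculation. We again use the sub-sampling trick, but increase the size of the sub-samples to ~10,000 to ensure an accurate estimate. To keep performance clipping along, we recompute this value only once per epoch. Here's the function to do that: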

def epoch_update_gamma(y_true, y_pred, epoch=-1, delta=2):
    """
    Calculate gamma from last epoch's targets and predictions.
    Gamma is updated at the end of each epoch.
    y_true: `Tensor`. Targets (labels).  Float either 0.0 or 1.0 .
    y_pred: `Tensor` . Predictions.
    epoch: `Int`. Current epoch; the default gamma is returned when -1.
    delta: `Float`. Proportion of too-close pairs to wrong-ordered pairs.
    """
    DELTA = delta
    SUB_SAMPLE_SIZE = 2000.0
    pos = y_pred[y_true==1]
    neg = y_pred[y_true==0]
    # subsample the training set for performance
    cap_pos = pos.shape[0]
    cap_neg = neg.shape[0]
    pos = pos[torch.rand_like(pos) < SUB_SAMPLE_SIZE/cap_pos]
    neg = neg[torch.rand_like(neg) < SUB_SAMPLE_SIZE/cap_neg]
    ln_pos = pos.shape[0]
    ln_neg = neg.shape[0]
    # all (pos, neg) pairs, flattened
    pos_expand = pos.view(-1,1).expand(-1,ln_neg).reshape(-1)
    neg_expand = neg.repeat(ln_pos)
    diff = neg_expand - pos_expand
    Lp = diff[diff>0]  # because we're taking positive diffs, we got pos and neg flipped.
    ln_Lp = Lp.shape[0]-1
    # gaps of the right-ordered pairs, sorted ascending
    diff_neg = -1.0 * diff[diff<0]
    diff_neg = diff_neg.sort()[0]
    ln_neg = diff_neg.shape[0]-1
    ln_neg = max([ln_neg, 0])
    # index of the gap that puts DELTA * (wrong-ordered count) pairs below it
    left_wing = int(ln_Lp*DELTA)
    left_wing = max([0,left_wing])
    left_wing = min([ln_neg,left_wing])
    default_gamma = torch.tensor(0.2, dtype=torch.float, device=y_pred.device)
    if diff_neg.shape[0] > 0:
        gamma = diff_neg[left_wing]
    else:
        gamma = default_gamma
    if epoch > -1:
        return gamma
    else:
        return default_gamma

Here's the helicopter view showing how to use the two functions as we loop over epochs, then over batches:

import numpy

train_ds = CatDogDataset(train_files, transform)
train_dl = DataLoader(train_ds, batch_size=BATCH_SIZE)

# initialize last epoch with random values
last_epoch_y_pred = torch.tensor( 1.0-numpy.random.rand(len(train_ds))/2.0, dtype=torch.float).cuda()
# train_tt: the training-set labels (assumed to be defined elsewhere)
last_epoch_y_t    = torch.tensor([o for o in train_tt],dtype=torch.float).cuda()
epoch_gamma = 0.20
for epoch in range(epoches):
    epoch_y_pred=[]
    epoch_y_t=[]
    for X, y in train_dl:
        preds = model(X)
        # .
        # .
        loss = roc_star_loss(y,preds,epoch_gamma, last_epoch_y_t, last_epoch_y_pred)
        # .
        # .
        # keep this epoch's predictions (detached from the graph) and labels;
        # assumes preds and y are 1-D tensors
        epoch_y_pred.append(preds.detach())
        epoch_y_t.append(y)
    last_epoch_y_pred = torch.cat(epoch_y_pred).cuda()
    last_epoch_y_t = torch.cat(epoch_y_t).cuda()
    epoch_gamma = epoch_update_gamma(last_epoch_y_t, last_epoch_y_pred, epoch)
    #...

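A complete working example can be found here, example.

Below we chart the performance of ROC-Star against the same model using BCE. Experience shows that ROC-Star can often simply be swapped into any model that uses BCE, with a good chance of a performance boost.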
