hopscotch map下載 - hopscotch map原始碼下載

使用跳房子哈希法的快速哈希映射和哈希集的 C++ 實現

hopscotch-map 庫是快速哈希映射和哈希集的 C++ 實現，使用開放尋址和跳房子哈希來解決衝突。它是一種快取友好的資料結構，在大多數情況下提供比std::unordered_map更好的性能，並且與google::dense_hash_map非常相似，同時使用更少的記憶體並提供更多功能。

本函式庫提供了以下主要類別： tsl::hopscotch_map 、 tsl::hopscotch_set 、 tsl::hopscotch_pg_map和tsl::hopscotch_pg_set 。前兩個速度更快，並且使用 2 的冪增長策略，後兩者則使用素數增長策略，並且能夠更好地應對較差的雜湊函數。如果雜湊的低位元有可能重複模式（例如，您使用身分雜湊函數儲存指標），請使用主要版本。有關詳細信息，請參閱增長政策。

除了這些類別之外，該程式庫還提供tsl::bhopscotch_map 、 tsl::bhopscotch_set 、 tsl::bhopscotch_pg_map和tsl::bhopscotch_pg_set 。這些類別對鍵有一個額外的要求，它必須是LessThanComparable ，但它們提供了更好的漸近上限，請參閱範例中的詳細資訊。儘管如此，如果您沒有特定要求（哈希 DoS 攻擊的風險）， tsl::hopscotch_map和tsl::hopscotch_set在大多數情況下應該足夠了，並且應該是您的預設選擇，因為它們通常表現得更好。

跳房子哈希的概述和一些實現細節可以在這裡找到。

可以在此處找到tsl::hopscotch_map與其他雜湊映射的基準測試。此頁面還提供了一些關於您應該為您的用例嘗試哪種哈希表結構的建議（如果您對tsl命名空間中的多個哈希表實現有點迷失，那麼這很有用）。

主要特點

僅標頭庫，只需將包含目錄新增至包含路徑即可開始使用。如果您使用 CMake，也可以使用 CMakeLists.txt 中匯出的tsl::hopscotch_map目標。
快速哈希表，請參閱一些數字的基準測試。
支援僅移動和非預設可構造鍵/值。
支援異質查找，允許使用與Key不同的類型的find （例如，如果您有一個使用std::unique_ptr<foo>作為鍵的映射，則可以使用foo*或std::uintptr_t作為鍵參數find而不構造std::unique_ptr<foo> ，請參閱範例）。
無需從鍵中保留任何哨兵值。
如果雜湊或鍵相等函數的計算成本很高，則可以在插入時儲存雜湊值，以便更快地重新雜湊和尋找（請參閱 StoreHash 模板參數）。
如果在查找之前知道雜湊值，則可以將其作為參數傳遞以加快查找速度（請參閱 API 中的precalculated_hash參數）。
tsl::bhopscotch_map和tsl::bhopscotch_set在尋找和刪除方面提供 O(log n) 的最壞情況，使這些類別能夠抵抗雜湊表拒絕服務 (DoS) 攻擊（請參閱範例中的詳細資訊）。
此函式庫可以在停用異常的情況下使用（透過 Clang 和 GCC 上的-fno-exceptions選項，在 MSVC 上沒有/EH選項或簡單地透過定義TSL_NO_EXCEPTIONS ）。 std::terminate用於在停用異常時取代throw指令。
API 與std::unordered_map和std::unordered_set非常相似。

與`std::unordered_map`相比的差異

tsl::hopscotch_map嘗試擁有與std::unordered_map類似的接口，但存在一些差異。

插入時迭代器失效的行為方式不同。一般來說，任何修改哈希表的操作（除了erase ）都會使所有迭代器失效（詳細資訊請參閱API）。
對映射中鍵或值的引用和指標的失效方式與插入時對這些鍵值的迭代器相同。
對於迭代器， operator*()和operator->()傳回一個引用和一個指向const std::pair<Key, T>而不是std::pair<const Key, T>的指針，使得值T不可修改。要修改值，您必須呼叫迭代器的value()方法來取得可變參考。例子：

tsl::hopscotch_map< int , int > map = {{ 1 , 1 }, { 2 , 1 }, { 3 , 1 }};
for ( auto it = map.begin(); it != map.end(); ++it) {
    // it->second = 2; // Illegal
    it. value () = 2 ; // Ok
}

僅移動類型必須具有無拋出移動建構函數（使用開放定址，如果移動建構函數可能拋出，則不可能在重新雜湊時保持強異常保證）。
不支援某些與儲存桶相關的方法（例如bucket_size ， bucket ，...）。

這些差異也適用於std::unordered_set和tsl::hopscotch_set之間。

線程安全和異常保證與std::unordered_map/set相同（即可以有多個讀取器而沒有寫入器）。

成長政策

該程式庫透過GrowthPolicy模板參數支援多種增長策略。圖書館提供了三項政策，但如果需要，您可以輕鬆實施自己的政策。

tsl::hh::兩個成長政策的力量。 tsl::(b)hopscotch_map/set所使用的預設策略。此策略將哈希表的儲存桶數組的大小保持為 2 的冪。此約束允許策略避免使用慢模運算將雜湊對應到儲存桶，而不是hash % 2 ⁿ ，而是使用hash & (2 ⁿ - 1) （請參閱快速模運算）。速度很快，但這可能會導致與較差的雜湊函數發生大量衝突，因為 2 的冪的模最終僅掩蓋了最高有效位。
tsl::hh::prime_growth_policy。 tsl::(b)hopscotch_pg_map/set使用的預設策略。此策略將哈希表的桶數組的大小保持為素數。將雜湊映射到儲存桶時，使用素數作為模將導致雜湊值在儲存桶中更好的分佈，即使雜湊函數很差。為了允許編譯器最佳化模運算，該策略使用具有常數素數模的查找表（有關詳細信息，請參閱 API）。比tsl::hh::power_of_two_growth_policy慢，但更安全。
tsl::hh::mod_growth_policy。此策略透過參數中傳遞的可自訂成長因子來成長地圖。然後它只使用模運算子將哈希映射到儲存桶。速度較慢但更靈活。

如果您遇到效能不佳的情況，請檢查overflow_size() ，如果它不為零，則可能會出現很多哈希衝突。要么更改雜湊函數以獲得更統一的值，要么嘗試其他增長策略（主要是tsl::hh::prime_growth_policy ）。不幸的是，有時很難防範衝突（例如，對雜湊映射的 DoS 攻擊）。如果需要，也請檢查tsl::bhopscotch_map/set ，它在查找時提供 O(log n) 而不是 O(n) 的最壞情況，請參閱範例中的詳細資訊。

要實現您自己的策略，您必須實作以下介面。

 struct custom_policy {
    // Called on hash table construction and rehash, min_bucket_count_in_out is the minimum buckets
    // that the hash table needs. The policy can change it to a higher number of buckets if needed 
    // and the hash table will use this value as bucket count. If 0 bucket is asked, then the value
    // must stay at 0.
    explicit custom_policy (std:: size_t & min_bucket_count_in_out);
    
    // Return the bucket [0, bucket_count()) to which the hash belongs. 
    // If bucket_count() is 0, it must always return 0.
    std:: size_t bucket_for_hash (std:: size_t hash) const noexcept ;
    
    // Return the number of buckets that should be used on next growth
    std:: size_t next_bucket_count () const ;
    
    // Maximum number of buckets supported by the policy
    std:: size_t max_bucket_count () const ;
    
    // Reset the growth policy as if the policy was created with a bucket count of 0.
    // After a clear, the policy must always return 0 when bucket_for_hash() is called.
    void clear () noexcept ;
}

安裝

要使用 hopscotch-map，只需將包含目錄新增至包含路徑中即可。它是一個僅包含頭檔的庫。

如果您使用 CMake，也可以將 CMakeLists.txt 中的tsl::hopscotch_map匯出目標與target_link_libraries結合。

 # Example where the hopscotch-map project is stored in a third-party directory
add_subdirectory (third-party/hopscotch-map)
target_link_libraries (your_target PRIVATE tsl::hopscotch_map)

如果專案已經透過make install安裝，您也可以使用find_package(tsl-hopscotch-map REQUIRED)而不是add_subdirectory 。

該程式碼應適用於任何符合 C++17 標準的編譯器。

要執行測試，您將需要 Boost Test 程式庫和 CMake。

git clone https://github.com/Tessil/hopscotch-map.git
cd hopscotch-map/tests
mkdir build
cd build
cmake ..
cmake --build .
./tsl_hopscotch_map_tests

用法

該 API 可以在這裡找到。

所有方法尚未記錄，但它們複製std::unordered_map和std::unordered_set中的行為，除非另有指定。

例子

# include < cstdint >
# include < iostream >
# include < string >
# include < tsl/hopscotch_map.h >
# include < tsl/hopscotch_set.h >

int main () {
    tsl::hopscotch_map<std::string, int > map = {{ " a " , 1 }, { " b " , 2 }};
    map[ " c " ] = 3 ;
    map[ " d " ] = 4 ;
    
    map. insert ({ " e " , 5 });
    map. erase ( " b " );
    
    for ( auto it = map. begin (); it != map. end (); ++it) {
        // it->second += 2; // Not valid.
        it. value () += 2 ;
    }
    
    // {d, 6} {a, 3} {e, 7} {c, 5}
    for ( const auto & key_value : map) {
        std::cout << " { " << key_value. first << " , " << key_value. second << " } " << std::endl;
    }
    
        
    if (map. find ( " a " ) != map. end ()) {
        std::cout << " Found " a " . " << std::endl;
    }
    
    const std:: size_t precalculated_hash = std::hash<std::string>()( " a " );
    // If we already know the hash beforehand, we can pass it in parameter to speed-up lookups.
    if (map. find ( " a " , precalculated_hash) != map. end ()) {
        std::cout << " Found " a " with hash " << precalculated_hash << " . " << std::endl;
    }
    
    
    /*
     * Calculating the hash and comparing two std::string may be slow. 
     * We can store the hash of each std::string in the hash map to make 
     * the inserts and lookups faster by setting StoreHash to true.
     */ 
    tsl::hopscotch_map<std::string, int , std::hash<std::string>, 
                       std::equal_to<std::string>,
                       std::allocator<std::pair<std::string, int >>,
                       30 , true > map2;
                       
    map2[ " a " ] = 1 ;
    map2[ " b " ] = 2 ;
    
    // {a, 1} {b, 2}
    for ( const auto & key_value : map2) {
        std::cout << " { " << key_value. first << " , " << key_value. second << " } " << std::endl;
    }
    
    
    
    
    tsl::hopscotch_set< int > set;
    set. insert ({ 1 , 9 , 0 });
    set. insert ({ 2 , - 1 , 9 });
    
    // {0} {1} {2} {9} {-1}
    for ( const auto & key : set) {
        std::cout << " { " << key << " } " << std::endl;
    }
}

異構查找

異質重載允許使用Key以外的其他類型進行查找和擦除操作，只要所使用的類型是可散列的並且與Key具有可比性。

要啟動tsl::hopscotch_map/set中的異構重載，qualified-id KeyEqual::is_transparent必須有效。它的工作方式與std::map::find相同。您可以使用std::equal_to<>或定義自己的函數物件。

KeyEqual和Hash都需要能夠處理不同的型別。

# include < functional >
# include < iostream >
# include < string >
# include < tsl/hopscotch_map.h >


struct employee {
    employee ( int id, std::string name) : m_id(id), m_name(std::move(name)) {
    }
    
    // Either we include the comparators in the class and we use `std::equal_to<>`...
    friend bool operator ==( const employee& empl, int empl_id) {
        return empl. m_id == empl_id;
    }
    
    friend bool operator ==( int empl_id, const employee& empl) {
        return empl_id == empl. m_id ;
    }
    
    friend bool operator ==( const employee& empl1, const employee& empl2) {
        return empl1. m_id == empl2. m_id ;
    }
    
    
    int m_id;
    std::string m_name;
};

// ... or we implement a separate class to compare employees.
struct equal_employee {
    using is_transparent = void ;
    
    bool operator ()( const employee& empl, int empl_id) const {
        return empl. m_id == empl_id;
    }
    
    bool operator ()( int empl_id, const employee& empl) const {
        return empl_id == empl. m_id ;
    }
    
    bool operator ()( const employee& empl1, const employee& empl2) const {
        return empl1. m_id == empl2. m_id ;
    }
};

struct hash_employee {
    std:: size_t operator ()( const employee& empl) const {
        return std::hash< int >()(empl. m_id );
    }
    
    std:: size_t operator ()( int id) const {
        return std::hash< int >()(id);
    }
};


int main () {
    // Use std::equal_to<> which will automatically deduce and forward the parameters
    tsl::hopscotch_map<employee, int , hash_employee, std::equal_to<>> map; 
    map. insert ({ employee ( 1 , " John Doe " ), 2001 });
    map. insert ({ employee ( 2 , " Jane Doe " ), 2002 });
    map. insert ({ employee ( 3 , " John Smith " ), 2003 });

    // John Smith 2003
    auto it = map. find ( 3 );
    if (it != map. end ()) {
        std::cout << it-> first . m_name << " " << it-> second << std::endl;
    }

    map. erase ( 1 );



    // Use a custom KeyEqual which has an is_transparent member type
    tsl::hopscotch_map<employee, int , hash_employee, equal_employee> map2;
    map2. insert ({ employee ( 4 , " Johnny Doe " ), 2004 });

    // 2004
    std::cout << map2. at ( 4 ) << std::endl;
}

拒絕服務 (DoS) 攻擊

除了tsl::hopscotch_map和tsl::hopscotch_set之外，該庫還提供了兩個更「安全」的選項： tsl::bhopscotch_map和tsl::bhopscotch_set （都帶有它們的pg對應項）。

這兩個新增的查找和刪除的最壞情況漸近複雜度為O(log n)，插入的攤銷最壞情況為O(log n)（由於可能會出現O(n) 的重新散列而攤銷）。即使雜湊函數將所有元素映射到同一個桶，O(log n) 仍然成立。

這提供了針對哈希表拒絕服務 (DoS) 攻擊的安全性。

為了實現這一點，安全版本使用二元搜尋樹來尋找溢位元素（請參閱實作細節），因此需要元素為LessThanComparable 。需要一個額外的Compare模板參數。

# include < chrono >
# include < cstdint >
# include < iostream >
# include < tsl/hopscotch_map.h >
# include < tsl/bhopscotch_map.h >

/*
 * Poor hash function which always returns 1 to simulate
 * a Deny of Service attack.
 */
struct dos_attack_simulation_hash {
    std:: size_t operator ()( int id) const {
        return 1 ;
    }
};

int main () {
    /*
     * Slow due to the hash function, insertions are done in O(n).
     */
    tsl::hopscotch_map< int , int , dos_attack_simulation_hash> map;
    
    auto start = std::chrono::high_resolution_clock::now ();
    for ( int i= 0 ; i < 10000 ; i++) {
        map. insert ({i, 0 });
    }
    auto end = std::chrono::high_resolution_clock::now ();
    
    // 110 ms
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end-start);
    std::cout << duration. count () << " ms " << std::endl;
    
    
    
    
    /*
     * Faster. Even with the poor hash function, insertions end-up to
     * be O(log n) in average (and O(n) when a rehash occurs).
     */
    tsl::bhopscotch_map< int , int , dos_attack_simulation_hash> map_secure;
    
    start = std::chrono::high_resolution_clock::now ();
    for ( int i= 0 ; i < 10000 ; i++) {
        map_secure. insert ({i, 0 });
    }
    end = std::chrono::high_resolution_clock::now ();
    
    // 2 ms
    duration = std::chrono::duration_cast<std::chrono::milliseconds>(end-start);
    std::cout << duration. count () << " ms " << std::endl;
}