hopscotch map下载 - hopscotch map源码下载

使用跳房子哈希法的快速哈希映射和哈希集的 C++ 实现

hopscotch-map 库是快速哈希映射和哈希集的 C++ 实现，使用开放寻址和跳房子哈希来解决冲突。它是一种缓存友好的数据结构，在大多数情况下提供比std::unordered_map更好的性能，并且与google::dense_hash_map非常相似，同时使用更少的内存并提供更多功能。

该库提供了以下主要类： tsl::hopscotch_map 、 tsl::hopscotch_set 、 tsl::hopscotch_pg_map和tsl::hopscotch_pg_set 。前两个速度更快，并且使用 2 的幂增长策略，后两个则使用素数增长策略，并且能够更好地应对较差的哈希函数。如果散列的低位有可能重复模式（例如，您使用身份散列函数存储指针），请使用主要版本。有关详细信息，请参阅增长政策。

除了这些类之外，该库还提供tsl::bhopscotch_map 、 tsl::bhopscotch_set 、 tsl::bhopscotch_pg_map和tsl::bhopscotch_pg_set 。这些类对键有一个额外的要求，它必须是LessThanComparable ，但它们提供了更好的渐近上限，请参阅示例中的详细信息。尽管如此，如果您没有特定要求（散列 DoS 攻击的风险）， tsl::hopscotch_map和tsl::hopscotch_set在大多数情况下应该足够了，并且应该是您的默认选择，因为它们通常表现更好。

跳房子哈希的概述和一些实现细节可以在这里找到。

可以在此处找到tsl::hopscotch_map与其他哈希映射的基准测试。此页面还提供了一些关于您应该为您的用例尝试哪种哈希表结构的建议（如果您对tsl命名空间中的多个哈希表实现有点迷失，那么这很有用）。

主要特点

仅标头库，只需将包含目录添加到包含路径中即可开始。如果您使用 CMake，还可以使用 CMakeLists.txt 中导出的tsl::hopscotch_map目标。
快速哈希表，请参阅一些数字的基准测试。
支持仅移动和非默认可构造键/值。
支持异构查找，允许使用与Key不同的类型的find （例如，如果您有一个使用std::unique_ptr<foo>作为键的映射，则可以使用foo*或std::uintptr_t作为键参数find而不构造std::unique_ptr<foo> ，请参阅示例）。
无需从键中保留任何哨兵值。
如果哈希或键相等函数的计算成本很高，则可以在插入时存储哈希值，以便更快地重新哈希和查找（请参阅 StoreHash 模板参数）。
如果在查找之前知道哈希值，则可以将其作为参数传递以加快查找速度（请参阅 API 中的precalculated_hash参数）。
tsl::bhopscotch_map和tsl::bhopscotch_set在查找和删除方面提供 O(log n) 的最坏情况，使这些类能够抵抗哈希表拒绝服务 (DoS) 攻击（请参阅示例中的详细信息）。
该库可以在禁用异常的情况下使用（通过 Clang 和 GCC 上的-fno-exceptions选项，在 MSVC 上没有/EH选项或简单地通过定义TSL_NO_EXCEPTIONS ）。 std::terminate用于在禁用异常时替换throw指令。
API 与std::unordered_map和std::unordered_set非常相似。

与`std::unordered_map`相比的差异

tsl::hopscotch_map尝试拥有类似于std::unordered_map接口，但存在一些差异。

插入时迭代器失效的行为方式不同。一般来说，修改哈希表的任何操作（ erase除外）都会使所有迭代器失效（详细信息请参阅 API）。
对映射中键或值的引用和指针的失效方式与插入时对这些键值的迭代器相同。
对于迭代器， operator*()和operator->()返回一个引用和一个指向const std::pair<Key, T>而不是std::pair<const Key, T>的指针，使得值T不可修改。要修改值，您必须调用迭代器的value()方法来获取可变引用。例子：

tsl::hopscotch_map< int , int > map = {{ 1 , 1 }, { 2 , 1 }, { 3 , 1 }};
for ( auto it = map.begin(); it != map.end(); ++it) {
    // it->second = 2; // Illegal
    it. value () = 2 ; // Ok
}

仅移动类型必须具有无抛出移动构造函数（使用开放寻址，如果移动构造函数可能抛出，则不可能在重新哈希时保持强异常保证）。
不支持某些与存储桶相关的方法（例如bucket_size ， bucket ，...）。

这些差异也适用于std::unordered_set和tsl::hopscotch_set之间。

线程安全和异常保证与std::unordered_map/set相同（即可以有多个读取器而没有写入器）。

增长政策

该库通过GrowthPolicy模板参数支持多种增长策略。图书馆提供了三项政策，但如果需要，您可以轻松实施自己的政策。

tsl::hh::两个增长政策的力量。 tsl::(b)hopscotch_map/set使用的默认策略。该策略将哈希表的存储桶数组的大小保持为 2 的幂。此约束允许策略避免使用慢模运算将哈希映射到存储桶，而不是hash % 2 ⁿ ，而是使用hash & (2 ⁿ - 1) （请参阅快速模运算）。速度很快，但这可能会导致与较差的哈希函数发生大量冲突，因为 2 的幂的模最终仅掩盖了最高有效位。
tsl::hh::prime_growth_policy。 tsl::(b)hopscotch_pg_map/set使用的默认策略。该策略将哈希表的桶数组的大小保持为素数。将哈希映射到存储桶时，使用素数作为模将导致哈希值在存储桶中更好的分布，即使哈希函数很差。为了允许编译器优化模运算，该策略使用具有常数素数模的查找表（有关详细信息，请参阅 API）。比tsl::hh::power_of_two_growth_policy慢，但更安全。
tsl::hh::mod_growth_policy。该策略通过参数中传递的可定制增长因子来增长地图。然后它只使用模运算符将哈希映射到存储桶。速度较慢但更灵活。

如果您遇到性能不佳的情况，请检查overflow_size() ，如果它不为零，则可能会出现很多哈希冲突。要么更改哈希函数以获得更统一的值，要么尝试其他增长策略（主要是tsl::hh::prime_growth_policy ）。不幸的是，有时很难保护自己免受冲突（例如，对哈希图的 DoS 攻击）。如果需要，还请检查tsl::bhopscotch_map/set ，它在查找时提供 O(log n) 而不是 O(n) 的最坏情况，请参阅示例中的详细信息。

要实现您自己的策略，您必须实现以下接口。

 struct custom_policy {
    // Called on hash table construction and rehash, min_bucket_count_in_out is the minimum buckets
    // that the hash table needs. The policy can change it to a higher number of buckets if needed 
    // and the hash table will use this value as bucket count. If 0 bucket is asked, then the value
    // must stay at 0.
    explicit custom_policy (std:: size_t & min_bucket_count_in_out);
    
    // Return the bucket [0, bucket_count()) to which the hash belongs. 
    // If bucket_count() is 0, it must always return 0.
    std:: size_t bucket_for_hash (std:: size_t hash) const noexcept ;
    
    // Return the number of buckets that should be used on next growth
    std:: size_t next_bucket_count () const ;
    
    // Maximum number of buckets supported by the policy
    std:: size_t max_bucket_count () const ;
    
    // Reset the growth policy as if the policy was created with a bucket count of 0.
    // After a clear, the policy must always return 0 when bucket_for_hash() is called.
    void clear () noexcept ;
}

安装

要使用 hopscotch-map，只需将包含目录添加到包含路径中即可。它是一个仅包含头文件的库。

如果您使用 CMake，还可以将 CMakeLists.txt 中的tsl::hopscotch_map导出目标与target_link_libraries结合使用。

 # Example where the hopscotch-map project is stored in a third-party directory
add_subdirectory (third-party/hopscotch-map)
target_link_libraries (your_target PRIVATE tsl::hopscotch_map)

如果项目已经通过make install安装，您还可以使用find_package(tsl-hopscotch-map REQUIRED)而不是add_subdirectory 。

该代码应适用于任何符合 C++17 标准的编译器。

要运行测试，您将需要 Boost Test 库和 CMake。

git clone https://github.com/Tessil/hopscotch-map.git
cd hopscotch-map/tests
mkdir build
cd build
cmake ..
cmake --build .
./tsl_hopscotch_map_tests

用法

该 API 可以在这里找到。

所有方法尚未记录，但它们复制std::unordered_map和std::unordered_set中的行为，除非另有指定。

例子

# include < cstdint >
# include < iostream >
# include < string >
# include < tsl/hopscotch_map.h >
# include < tsl/hopscotch_set.h >

int main () {
    tsl::hopscotch_map<std::string, int > map = {{ " a " , 1 }, { " b " , 2 }};
    map[ " c " ] = 3 ;
    map[ " d " ] = 4 ;
    
    map. insert ({ " e " , 5 });
    map. erase ( " b " );
    
    for ( auto it = map. begin (); it != map. end (); ++it) {
        // it->second += 2; // Not valid.
        it. value () += 2 ;
    }
    
    // {d, 6} {a, 3} {e, 7} {c, 5}
    for ( const auto & key_value : map) {
        std::cout << " { " << key_value. first << " , " << key_value. second << " } " << std::endl;
    }
    
        
    if (map. find ( " a " ) != map. end ()) {
        std::cout << " Found " a " . " << std::endl;
    }
    
    const std:: size_t precalculated_hash = std::hash<std::string>()( " a " );
    // If we already know the hash beforehand, we can pass it in parameter to speed-up lookups.
    if (map. find ( " a " , precalculated_hash) != map. end ()) {
        std::cout << " Found " a " with hash " << precalculated_hash << " . " << std::endl;
    }
    
    
    /*
     * Calculating the hash and comparing two std::string may be slow. 
     * We can store the hash of each std::string in the hash map to make 
     * the inserts and lookups faster by setting StoreHash to true.
     */ 
    tsl::hopscotch_map<std::string, int , std::hash<std::string>, 
                       std::equal_to<std::string>,
                       std::allocator<std::pair<std::string, int >>,
                       30 , true > map2;
                       
    map2[ " a " ] = 1 ;
    map2[ " b " ] = 2 ;
    
    // {a, 1} {b, 2}
    for ( const auto & key_value : map2) {
        std::cout << " { " << key_value. first << " , " << key_value. second << " } " << std::endl;
    }
    
    
    
    
    tsl::hopscotch_set< int > set;
    set. insert ({ 1 , 9 , 0 });
    set. insert ({ 2 , - 1 , 9 });
    
    // {0} {1} {2} {9} {-1}
    for ( const auto & key : set) {
        std::cout << " { " << key << " } " << std::endl;
    }
}

异构查找

异构重载允许使用Key以外的其他类型进行查找和擦除操作，只要所使用的类型是可散列的并且与Key具有可比性。

要激活tsl::hopscotch_map/set中的异构重载，qualified-id KeyEqual::is_transparent必须有效。它的工作方式与std::map::find相同。您可以使用std::equal_to<>或定义自己的函数对象。

KeyEqual和Hash都需要能够处理不同的类型。

# include < functional >
# include < iostream >
# include < string >
# include < tsl/hopscotch_map.h >


struct employee {
    employee ( int id, std::string name) : m_id(id), m_name(std::move(name)) {
    }
    
    // Either we include the comparators in the class and we use `std::equal_to<>`...
    friend bool operator ==( const employee& empl, int empl_id) {
        return empl. m_id == empl_id;
    }
    
    friend bool operator ==( int empl_id, const employee& empl) {
        return empl_id == empl. m_id ;
    }
    
    friend bool operator ==( const employee& empl1, const employee& empl2) {
        return empl1. m_id == empl2. m_id ;
    }
    
    
    int m_id;
    std::string m_name;
};

// ... or we implement a separate class to compare employees.
struct equal_employee {
    using is_transparent = void ;
    
    bool operator ()( const employee& empl, int empl_id) const {
        return empl. m_id == empl_id;
    }
    
    bool operator ()( int empl_id, const employee& empl) const {
        return empl_id == empl. m_id ;
    }
    
    bool operator ()( const employee& empl1, const employee& empl2) const {
        return empl1. m_id == empl2. m_id ;
    }
};

struct hash_employee {
    std:: size_t operator ()( const employee& empl) const {
        return std::hash< int >()(empl. m_id );
    }
    
    std:: size_t operator ()( int id) const {
        return std::hash< int >()(id);
    }
};


int main () {
    // Use std::equal_to<> which will automatically deduce and forward the parameters
    tsl::hopscotch_map<employee, int , hash_employee, std::equal_to<>> map; 
    map. insert ({ employee ( 1 , " John Doe " ), 2001 });
    map. insert ({ employee ( 2 , " Jane Doe " ), 2002 });
    map. insert ({ employee ( 3 , " John Smith " ), 2003 });

    // John Smith 2003
    auto it = map. find ( 3 );
    if (it != map. end ()) {
        std::cout << it-> first . m_name << " " << it-> second << std::endl;
    }

    map. erase ( 1 );



    // Use a custom KeyEqual which has an is_transparent member type
    tsl::hopscotch_map<employee, int , hash_employee, equal_employee> map2;
    map2. insert ({ employee ( 4 , " Johnny Doe " ), 2004 });

    // 2004
    std::cout << map2. at ( 4 ) << std::endl;
}

拒绝服务 (DoS) 攻击

除了tsl::hopscotch_map和tsl::hopscotch_set之外，该库还提供了两个更“安全”的选项： tsl::bhopscotch_map和tsl::bhopscotch_set （都带有它们的pg对应项）。

这两个添加的查找和删除的最坏情况渐近复杂度为 O(log n)，插入的摊销最坏情况为 O(log n)（由于可能会在 O(n) 中重新散列而摊销）。即使哈希函数将所有元素映射到同一个桶，O(log n) 仍然成立。

这提供了针对哈希表拒绝服务 (DoS) 攻击的安全性。

为了实现这一点，安全版本使用二叉搜索树来查找溢出元素（请参阅实现细节），因此需要元素为LessThanComparable 。需要一个额外的Compare模板参数。

# include < chrono >
# include < cstdint >
# include < iostream >
# include < tsl/hopscotch_map.h >
# include < tsl/bhopscotch_map.h >

/*
 * Poor hash function which always returns 1 to simulate
 * a Deny of Service attack.
 */
struct dos_attack_simulation_hash {
    std:: size_t operator ()( int id) const {
        return 1 ;
    }
};

int main () {
    /*
     * Slow due to the hash function, insertions are done in O(n).
     */
    tsl::hopscotch_map< int , int , dos_attack_simulation_hash> map;
    
    auto start = std::chrono::high_resolution_clock::now ();
    for ( int i= 0 ; i < 10000 ; i++) {
        map. insert ({i, 0 });
    }
    auto end = std::chrono::high_resolution_clock::now ();
    
    // 110 ms
    auto duration = std::chrono::duration_cast<std::chrono::milliseconds>(end-start);
    std::cout << duration. count () << " ms " << std::endl;
    
    
    
    
    /*
     * Faster. Even with the poor hash function, insertions end-up to
     * be O(log n) in average (and O(n) when a rehash occurs).
     */
    tsl::bhopscotch_map< int , int , dos_attack_simulation_hash> map_secure;
    
    start = std::chrono::high_resolution_clock::now ();
    for ( int i= 0 ; i < 10000 ; i++) {
        map_secure. insert ({i, 0 });
    }
    end = std::chrono::high_resolution_clock::now ();
    
    // 2 ms
    duration = std::chrono::duration_cast<std::chrono::milliseconds>(end-start);
    std::cout << duration. count () << " ms " << std::endl;
}