C++ pb_ds 食用教程

Drifty

2024-10-30 21:39:46

Tech. & Eng.

Preface

某次模拟赛，一道题用 map 过不了（正解也不是 map 之类的题）然而有人开了一个 gp_hash_table 过了，让 Drifty 大为震撼，而网上已有的资料都很杂乱，于是就有了这篇文章。

pb_ds 库全称 Policy-Based Data Structures。是 Mingw-G++ 中编译器封装的库，并不是原生 c++14 标准的内容。

在 NOI 系列活动中使用 pb_ds 库的合规性是有文件上的依据的，赛场上可以放心用（~~虽然在 windows 下用 TDM 9.2.0 过不了编译~~）。

其中封装了四种容器：平衡树，字典树，哈希表，和堆。常数很小，在此做下整理。本篇着重介绍哈希（感觉剩下的都很鸡肋），trie 和 priority_queue 不做介绍（因为没什么用）。

特别地下文所有代码环境均为 Windows 10 Mingw64 13.0.0，使用 g++.exe 编译，编译参数为 -std=c++14 -static-libgcc，均可通过编译。

Introduction

pb_ds 的使用和其他 STL 相同，也需要引入头文件。以下是所需的头文件。

#include <ext/pb_ds/assoc_container.hpp> //关联容器，必加
#include <ext/pb_ds/hash_policy.hpp>     //哈希表
#include <ext/pb_ds/tree_policy.hpp>     //平衡树
#include <ext/pb_ds/trie_policy.hpp>     //字典树
#include <ext/pb_ds/priority_queue.hpp>  //优先队列

什么？背不下来？巧了，pb_ds 也有万能头：

#include <bits/extc++.h>

你可能会注意到 pb_ds 中的 priority_queue 跟 C++ STL 中的 priority_queue 重名了，实际上 pb_ds 中的容器和算法都定义在以下两个命名空间里：

using namespace __gnu_cxx;
using namespace __gnu_pbds;

于是就避免了重名的问题。

hash_table

在 pb_ds 中，有两种哈希表：

// 以下 T1,T2 代表两种类型。
__gnu_pbds::gp_hash_table <T1, T2> H1;
// gp 探查法解决哈希冲突，更快，一般我们都用这个，下文的哈希表也都指这个。
__gnu_pbds::cc_hash_table <T1, T2> H2;
// cc 拉链法解决哈希冲突，较慢，比 unordered_map 还慢。

两种哈希的原型是一样的，下面给出 gp_hash_table 的原型：

template <typename Key,
   typename Mapped,
   typename Hash_Fn = typename detail::default_hash_fn<Key>::type,
   typename Eq_Fn = typename detail::default_eq_fn<Key>::type,
   typename Comb_Probe_Fn = detail::default_comb_hash_fn::type,
   typename Probe_Fn = typename detail::default_probe_fn<Comb_Probe_Fn>::type,
   typename Resize_Policy = typename detail::default_resize_policy<Comb_Probe_Fn>::type,
   bool Store_Hash = detail::default_store_hash,
   typename _Alloc = std::allocator<char> >
  class gp_hash_table;

看不懂没关系，你只需要知道在算法竞赛中直接这么构造就好：

__gnu_pbds::gp_hash_table <T1, T2> H1;
__gnu_pbds::cc_hash_table <T1, T2> H1;

其实直接当没有 emplace(),cbegin(),cend() 的 unordered_map 用就好了。

个人认为这是 pb_ds 中最好用的容器。常数很小，且速度吊打 map 和 unordered_map。这时候就有人会问了，同样是哈希，它和 unordered_map 不同在哪里？以下是它与 unordered_map 的对比：

空间复杂度略大。
使用了更严格的哈希函数和更好的哈希冲突解决方案，更难被 hack。

Attention

事实上，虽然gp_hash_table很快且很严格，但是在由于其固定的 hash 码，在面对 CodeForces 更严格的数据仍可能会被卡（但目前笔者并没有见到这样的题目）。

以下给出一种防止被卡的方法：

#include <ctime>
#include <ext/pb_ds/assoc_container.hpp>
const int RANDOM = time(NULL);
struct MyHash {int operator() (int x) const {return x ^ RANDOM;}};
__gnu_pbds::gp_hash_table <int, int, MyHash> Table;

其原理就是通过以时间生成 hash 码来解决固定 hash 码会被卡的问题。实际上不加这个基本上已经很难卡了。为了保险可以加一下。

Tricks

pb_ds 的哈希也使用了 std::tr1::hash 然而它并没有支持诸如 pair 之类的类型的 hash 因此导致 gp_hash_table 不能以 pair 作为 key 值，这时候，我们就需要利用显化的 template 重载下 hash：

#include <vector>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
template <> struct std::tr1::hash <std::pair <int, int> > {
    size_t operator() (std::pair <int, int> x) const {
        return x.first ^ x.second; // 你自定义的 hash 函数。
    }
};
__gnu_pbds::gp_hash_table <std::pair <int, int>, int> Table;
// 此时可以通过编译。

理论上有严格的哈希冲突解决方案，因此你的 hash 函数怎么写都不会错，然而为了效率还是要写尽量写难冲突的 hash 函数。

或者你也可以写的更通用一些：

#include <vector>
#include <string>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
template <class T1, class T2> 
struct std::tr1::hash <std::pair <T1, T2> > {
    size_t operator() (std::pair <T1, T2> x) const {
        std::tr1::hash <T1> H1; std::tr1::hash <T2> H2;
        return H1(x.first) ^ H2(x.second); // 你自定义的 hash 函数。
    }
};

此时，像这样的代码就可以通过编译了：

using namespace std;
using namespace __gnu_pbds;
gp_hash_table <pair <string, pair <int, int> >, int> Table;

于是它就跟 map 一样好用而且还快的要命了。

Examples

使用 gp_hash_table 完成【P2580 于是他错误的点名开始了】（185\text{ ms}, \text{record}）Code：

#include <bits/stdc++.h>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/hash_policy.hpp>
__gnu_pbds::gp_hash_table <std::string, short> mp;
std::string s; int n, m;
using namespace std;
const int MENTIONED = 1, REPEATED = 2;
int main() {
    ios::sync_with_stdio(false);
    cin.tie(NULL), cout.tie(NULL);
    cin >> n; while (n --) cin >> s, mp[s] = MENTIONED; cin >> m;
    while (m --) {
        cin >> s;
        if (mp[s] == MENTIONED) cout << "OK\n", mp[s] = REPEATED;
        else if (mp[s] == REPEATED) cout << "REPEAT\n";
        else cout << "WRONG\n";
    }
    return 0;
}

tree

它的使用方式和定义在这里不详细介绍，在【平衡树 - OI Wiki】中已经给出了详解。下面主要给出它的 Examples & Tricks。

Tricks

注意到在构造的时候，我们在最后一个模版参数中传入了一个类型 tree_order_statistics_node_update 这就是一个维护节点的类型，我们可以对其进行自定义：

template <class Node_CItr, class Node_Itr, class Cmp_Fn, class _Alloc>
struct Updater {
    using int metadata_type;
    void operator () (Node_Itr it, Node_CItr end_it) {
        // code... 更新函数
    }
    // virtual <type> <func-name>() const {} // 自定义功能，可以体现在 tree 中
};

最后在构造的时候把 Updater 传进去就好了。

个人感觉这个功能很鸡肋，一来常数较大，二来不如直接写一个 Treap，实际没什么用。tree 这个东西，本来也就是用在减少码量的，灵活性一般，使用也不好背。

Examples

以下给出【P3369 【模板】普通平衡树】的 Code（\text{record}）：

#include <iostream>
#include <ext/pb_ds/assoc_container.hpp>
#include <ext/pb_ds/tree_policy.hpp>
using namespace std;
using namespace __gnu_pbds;
using i64 = long long;
using les = less <i64>;
using ntp = null_type;
tree <i64, ntp, les, rb_tree_tag, 
    tree_order_statistics_node_update> RbTr;
int n, op; i64 k, ans;
int main() {
    ios::sync_with_stdio(false);
    cin.tie(NULL), cout.tie(NULL);
    cin >> n;
    auto PrintAns = []() {cout << (ans >> 20) << '\n';};
    for (int i = 1; i <= n; i ++) {
        cin >> op >> k;
        if (op == 1) RbTr.insert((k << 20) + i);
        if (op == 2) RbTr.erase(RbTr.lower_bound(k << 20));
        if (op == 3) cout << RbTr.order_of_key(k << 20) + 1 << '\n';
        if (op == 4) ans = * RbTr.find_by_order(k - 1), PrintAns();
        if (op == 5) ans = * -- RbTr.lower_bound(k << 20), PrintAns();
        if (op == 6) ans = * RbTr.upper_bound((k << 20) + n), PrintAns();
    }
    return 0;
}