Pubby (pubby8@gmail.com) Aug 15 2016
Flat Containers (P0038R0) by Sean Middleditch gave a good overview of what flat containers are and how might they be designed. This paper aims to answer the questions posed in Sean's paper, focusing-in on a specific design for the standard.
A coded implementation can be found at https://github.com/pubby/flat.
<flat_map>
<flat_set>
std::flat_map
std::flat_multimap
std::flat_set
std::flat_multiset
std::vector_map
std::vector_multimap
std::vector_set
std::vector_multiset
binary_find
last
.<algorithm>
.has
map.has(key)
returns a pointer to key's mapped value if it exists,
otherwise returns null.insert
(delayed sort)
map.insert(first, last, std::delayed_sort)
performs insertion with a
delayed sort optimization.container
member.container
iterator::underlying
Any vector-like sequence container can be used by flat containers:
std::flat_set<std::vector<int>> vecset;
std::flat_map<std::vector<std::pair<int, double>>> vecmap;
std::flat_set<std::deque<char>> deqset;
// These next containers use the stack to allocate - Neat!
std::flat_set<boost::container::static_vector<double, 64>> staticset;
std::flat_set<boost::container::small_vector<std::string, 32>> smallset;
// Custom containers work great too.
std::flat_set<GameCompany::GameVector<GameInt>> gameSet;
For convenience, the typedefs vector_set
and vector_map
are provided as
aliases for std::vector
flat containers.
std::vector_set<int> vecset;
std::vector_map<int, double> vecmap;
Use these typedefs all the time.
map
and set
There's nothing to learn because you already know how to use flat containers.
std::vector_map<std::string, int> map;
map.insert(std::make_pair("Hello World", 42));
map.insert(std::make_pair("insert is great", 10));
map.erase("insert is great");
map.emplace("emplace is better", 11);
for(auto const& pair : map)
std::cout << pair.first << ": " << pair.second << std::endl;
map.clear();
Use container
to gain access to the underlying container.
This works because flat containers are just light wrappers around their
contained containers.
std::vector_set<int> set;
// vector_set does not have a reserve member, but that's not a problem with 'container'.
set.container.reserve(1000);
for(int i = 0; i != 1000; ++i)
set.insert(i + i/2);
// operator[] on 'container' is convenient to have.
for(int i = 0; i != set.container.size(); ++i)
set.container[i] *= 2;
There are rules regarding container
, but we'll get into those later.
flat_map
exists.There are three cases where flat containers beat out everything else, bar none:
Note that "searching" is not include on the list. In general, the O(1) average-case performance of hash tables will always beat flat containers when it comes to searching, insertion, and removal. Flat containers do have their advantages over hash tables when it comes to worst-case performance, but rarely are they optimal.
The graph below shows the performance of associate containers at sizes less than 256. Note that hash tables perform the best and linear search (displayed as 'unsorted') performs the worst. The two flat containers are displayed in black and cyan.
(Side note: Linear search is really, really terrible and should stop being recommended to newbies as a faster alternative to associative containers. It's not a worthwhile optimization to make; it's just stupid.)
"After extensive testing and tuning on a wide variety of modern hardware, we arrive at the conclusion that, for small values of n, sorted order, combined with a good implementation of binary search is best. For larger values of n, we arrive at the surprising conclusion that the Eytzinger [breadth-first] layout is usually the fastest."
That quotation is from the abstract of an absolutely fantastic paper by Pat Morin and Paul-Virak Khuong called Array Layouts for Comparison-Based Searching, which details the different representations of flat sorted arrays and their influence upon performance. The paper is based on a series of binary search benchmarks written by Pat and ran by others on various types of hardware. Graphs made from these benchmarks are availible on Pat Morin's website.
Here is one such graph which shows the performance of 32-bit data on an Intel i7-4790K:
As you can see, branch-free binary search performs best for sizes that fit in the cache, but Eytzinger (breadth-first) is the clear winner past that.
It's all due to bandwidth, prefetching, and speculative loads. It is not due to "having fewer cache misses".
Two key properties to take note of:
The Eytzinger search algorithm is fast because it loads memory it might need several iterations in advance. Property #1 allows this prefetching to be efficient; all possible grandchildren of a given node exist in just one or two cachelines. Property #2 provides the speedup; a small number of prefetches (four) can be outgoing at once without increases in latency.
Unfortunately, the Eytzinger layout is not useful as a general-purpose container. Despite being fast at searches, Eytzinger's other map-like operations slow and complicated.
Full implementations of the Eytzinger layout do not exist, but performance numbers can be estimated from a benchmark done by Joaquín M López Muñoz. This benchmark compares the in-order traversal time of Eytzinger layouts (referred to as level-order) to sorted vectors. The Eytzinger layout took five times as long when doing this traversal.
Assumptions can be made about the speed of insertion, removal, and in-order construction based on these results. Since Eytzinger layouts are perfectly balanced binary trees, such operations must do additional work to re-balance the tree. This requires at least O(N) complexity for traversing the tree, and that implies a constant factor which is at least five times that of sorted vectors.
Sorted vectors are strongly preferred from a standard library perspective. They offer good - but not optimal - performance accross all of their operations, and can be implemented using pre-existing library functions. Sorted vectors are the best choice for this proposal.
There are two ways of storing sorted key/value pairs:
The graphs below show the performance of searching maps with these two representations.
(The y-axis scale of these graphs is linear and starts at time=0)
When the size of the key is equal to the size of the mapped value, both representations will always have the same performance. However, as the size of the mapped value grows, performance shifts to favoring separate vectors instead of single interleaved ones.
In reality, large sized mapped values are a rare sight. It is often more desirable for the map to hold pointers that point to large objects than it is for the map to hold large objects directly. This enables the container to have fast insertion and removal without problems of reference invalidation.
The interleaved reprentation uses std::pair
value types and is thus
is strongly preferred from a design perspective. Interleaved vectors of
pairs mesh well with the existing standard library and containers.
In addition, the container adaptor design of flat containers strongly
favors single containers.
Thus, the proposed flat container design uses single, interleaved vectors of pairs.
Take a look at the code below. Will it compile? What will it print when it runs?
boost::container::flat_map<int, int> map;
map.insert(std::make_pair(0, 0));
map.insert(std::make_pair(1, 0));
map.begin()->first = 2;
std::cout << map.count(0) << ',' << map.count(1) << ',' << map.count(2);
This turns out to be undefined behavior. The most likely result is:
0,0,0
Boost's flat_map
implementation uses a single sorted vector of
pairs. Unlike std::map
, the keys of flat_map
's pairs are not const
;
they can be modified. Modifying the keys causes the vector to become unsorted,
and an unsorted vector causes flat_map
to break.
This problem is not unique to Boost. Loki, EASTL, and every other popular implementation have this problem too.
There are possibilities to design flat container classes which are impossible to make unsorted. All such designs are based around keeping key values immutable.
vector<bool>
.std::pair
iterators.
All iterators of flat containers can be made const iterators by default. This means that references to const pairs are returned when dereferencing flat container iterators.
fc::vector_map<int, int> map;
*map.first() = std::make_pair(1, 2); // doesn't compile
map.first()->first = 3; // doesn't compile
map.first()->second = 4; // doesn't compile
There are two ways of modifying mapped values.
First, one can use the has
function. has
works just
like at
and find
, but returns a pointer value instead of an iterator.
if(int* mapped_value = map.has(4))
*mapped_value = "has is handy";
Whenever possible, use has
.
Second, one can use an iterator public data member called underlying
.
underlying
is the iterator equivalent of container
and grants access
to iterators belonging to the adapted container. These 'underlying' container
iterators have their usual const semantics and can be used to modify
both keys and values.
map.first().underlying->second = "underlying is useful";
Like container
and const_cast
, use of underlying
is an explicit
declaration to others that you know what you're doing and you can verify
that it is safe. This explicitness is what makes underlying
the safer
alternative to Boost-style iterators.
Yes. Going all-out to prevent unsorted states leads to verbose, hard to use container designs that are closed off from user optimizations.
It more practical to be "safe by default" than "safe all the time".
There are only three things that can lead to unsorted states:
container
underlying
Avoid all three and flat containers are guaranteed to remain sorted.
Always remember that std::sort
followed by std::unique
can be used to
turn any unsorted state into a sorted state.
Permitting unsorted states in flat containers requires the standard to address and define this behavior.
lower_bound
, upper_bound
, and equal_range
.
e
of [begin, end)
shall be
partitioned with respect to the expression
key_comp(e, value)
.The rules regarding duplicate keys are less restrictive than the rules regarding sortedness.
The intention of flat_map
and flat_set
is to
hold unique keys, but this is not a hard requirement of the standard.
The behavior of member functions on duplicate-key containers is well defined
but implementation specific. For example, using find
on a container
with duplicate keys will return an iterator to any one of the matching
values. This matches the preexisting behavior of std::multimap
.
The design of priority_queue
has influenced the design of flat containers
and the similarities between the two are apparent.
std::priority_queue
deque
.make_heap
, push_heap
,
and pop_heap
.std::flat_map
vector
.lower_bound
, upper_bound
, and equal_range
.The ability to use static_vector
and small_vector
with flat containers
is a major selling point of this proposal. The container adaptor interface
does not add complexity or performance penalties to the library's
implementation.
Users may want to use their own reference proxy containers with flat containers. For example, consider a container which attempts to implement the "two separate vectors" idea discussed in the Internal algorithms section:
template<typename K, typename T>
struct two_vectors
{
class iterator { /* ... Reference proxy implementation ... */ };
/* ... Sequence container member functions ... */
std::vector<K>; keys;
std::vector<T>; mapped;
};
std::flat_map<int, double, two_vectors<int, double>> map;
two_vectors
does not store pair
value types
but it can be faked to do so with reference proxies. Note that
reference proxies can apply const
to their references
to prevent users from modifying keys.
If possible, reference proxy containers should be permitted by the standard for use in flat containers.
Users may want to use custom pair types. For example, consider this
packed_pair
:
template<typename A, typename B>
struct packed_pair
{
using first_type = A;
using second_type = B;
A first;
B second;
} __attribute__((packed));
This type can be used as follows:
std::flat_map<char, int, std::vector<packed_pair<char, int>>> map;
If possible, custom value types should be permitted by the standard for use in flat containers.
container
data memberThe standard library's current container adaptors are not well-loved,
and this is largely due to them having less functionality than the
containers that they adapt. The fact is, there are very few cases where
stack
and queue
are the ideal container choice. Far more often it makes
sense to use vector
and deque
directly and gain access to all the extra
functionality they provide. The current container adaptors are practically
useless.
With a desire to make flat containers useful, the natural solution
is to make their adapted container member publicly visible. This comes
at no loss to safety; unsortedness would exist in the design regardless of
whether or not container
was public. There are no synchronization issues
to worry about as container
can be the one and only data member of the
flat container classes. And by making container
a public data member,
several advantages are to be had.
Associative containers often follow this access pattern:
An efficient way of storing multiple values into a flat container is to insert the values at the end and sort the container afterwards. For cases involving a single range, the flat container's insert function can be implemented to have this behavior:
std::flat_multiset<int> set;
set.insert(some_range.begin(), some_range.end(), std::delay_sort);
// Note: The constructor has overloads too.
// The code above could be rewritten like this:
std::flat_multiset<int> set2(some_range.begin(), some_range.end(), std::delay_sort);
Unfortunately, not all real-world situations have such clean-cut insertion
ranges, and is it not always desirable for the container to sort itself after
every insertion. For cases such as these, the public
container
data member offers a solution:
std::flat_multiset<int> set;
while(cpu_temperature() < 100)
set.container.push_back(cpu_temperature());
std::sort(set.begin(), set.end());
This technique is easily adapted to whatever access patterns users have.
Functions such as reserve
and max_size
are not part of flat
container interfaces. This is not a problem with a public container
data member.
map.container.reserve(1000);
Flat containers are container adaptors and so any type of vector-like
container can be used. Access to container
allows users of
custom vector implementations to use whatever member functions they need.
std::flat_map<int, int, my_vector<std::pair<int, int>>> map;
map.container.set_node_size(10);
map.container.debug_diagnostics = true;
my_library::hook_container(&map.container);
Occasionally it may be desirable to iterate a container using
indexes rather than iterators. While it is possible to do this using the
operator[]
of iterators, using container
for this is far easier to grasp.
for(int i = 0; i < map.size(); ++i) std::cout << i << ':' << map.container[i].second << std::endl;
The container
member can be used to copy and move directly.
std::flat_set<int> set;
std::vector<int> vec = std::move(set.container);
set.clear();
set.container = std::move(vec);
vec.clear();
This is far cleaner than adding overloads to operator=
.
container
?Nope!
Unsorted states exist in the design for iterators, not container
. Removing
container
would not remove unsorted states. The container
member is the
one and only member of flat container classes. There are no variable
out-of-sync issues to worry about.