Additional classes, macros, and functions that help to work more easily with the main vector types.
Classes | |
class | CpuId |
This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities. | |
struct | ImplementationT< Features > |
This class identifies the specific implementation Vc uses in the current translation unit in terms of a type. | |
class | Allocator< T > |
An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9]. | |
struct | AlignedBase< Alignment > |
Helper class to ensure a given alignment. | |
Macros | |
#define | Vc_DECLARE_ALLOCATOR(Type) |
Convenience macro to set the default allocator for a given Type to Vc::Allocator. | |
Typedefs | |
using | CurrentImplementation = ImplementationT< > |
Identifies the Vc implementation used in the current translation unit. | |
using | VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector< float >), alignof(Vector< double >), alignof(Vector< ullong >), alignof(Vector< llong >), alignof(Vector< ulong >), alignof(Vector< long >), alignof(Vector< uint >), alignof(Vector< int >), alignof(Vector< ushort >), alignof(Vector< short >), alignof(Vector< uchar >), alignof(Vector< schar >))> |
Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi). | |
template<typename V > | |
using | VectorAlignedBaseT = AlignedBase< alignof(V)> |
Variant of the above type ensuring suitable alignment only for the specified vector type V. | |
using | MemoryAlignedBase = AlignedBase< Detail::max(Vector< float >::MemoryAlignment, Vector< double >::MemoryAlignment, Vector< ullong >::MemoryAlignment, Vector< llong >::MemoryAlignment, Vector< ulong >::MemoryAlignment, Vector< long >::MemoryAlignment, Vector< uint >::MemoryAlignment, Vector< int >::MemoryAlignment, Vector< ushort >::MemoryAlignment, Vector< short >::MemoryAlignment, Vector< uchar >::MemoryAlignment, Vector< schar >::MemoryAlignment)> |
Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi). | |
template<typename V > | |
using | MemoryAlignedBaseT = AlignedBase< V::MemoryAlignment > |
Variant of the above type ensuring suitable alignment only for the specified vector type V. | |
using | llong = long long |
long long shorthand | |
using | ullong = unsigned long long |
unsigned long long shorthand | |
using | ulong = unsigned long |
unsigned long shorthand | |
using | uint = unsigned int |
unsigned int shorthand | |
using | ushort = unsigned short |
unsigned short shorthand | |
using | uchar = unsigned char |
unsigned char shorthand | |
using | schar = signed char |
signed char shorthand | |
Enumerations | |
enum | MallocAlignment { AlignOnVector, AlignOnCacheline, AlignOnPage } |
Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc. | |
enum | Implementation : std::uint_least32_t { ScalarImpl, SSE2Impl, SSE3Impl, SSSE3Impl, SSE41Impl, SSE42Impl, AVXImpl, AVX2Impl, MICImpl } |
Enum to identify a certain SIMD instruction set. | |
enum | ExtraInstructions : std::uint_least32_t { Float16cInstructions = 0x01000, Fma4Instructions = 0x02000, XopInstructions = 0x04000, PopcntInstructions = 0x08000, Sse4aInstructions = 0x10000, FmaInstructions = 0x20000, VexInstructions = 0x40000, Bmi2Instructions = 0x80000 } |
The list of available instructions is not easily described by a linear list of instruction sets. | |
Functions | |
const char * | versionString () |
constexpr unsigned int | versionNumber () |
template<typename V , typename Parent , typename Dimension , typename RM > | |
std::ostream & | operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m) |
Prints the contents of a Memory object into a stream object. | |
template<class InputIt , class UnaryFunction > | |
UnaryFunction | simd_for_each (InputIt first, InputIt last, UnaryFunction f) |
Vc variant of the std::for_each algorithm. | |
template<typename Mask , typename T > | |
enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T > | iif (const Mask &condition, const T &trueValue, const T &falseValue) |
Function to mimic the ternary operator '?:' (inline-if). | |
template<typename T > | |
constexpr T | iif (bool condition, const T &trueValue, const T &falseValue) |
Overload of the above for boolean conditions. | |
template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>> | |
std::pair< V, V > | interleave (const V &a, const V &b) |
Interleaves the entries from a and b into two vectors of the same type. | |
template<typename T , Vc::MallocAlignment A> | |
T * | malloc (size_t n) |
Allocates memory on the heap with alignment and padding suitable for vectorized access. | |
template<typename T > | |
void | free (T *p) |
Frees memory that was allocated with Vc::malloc. | |
void | prefetchForOneRead (const void *addr) |
Prefetch the cacheline containing addr for a single read access. | |
void | prefetchForModify (const void *addr) |
Prefetch the cacheline containing addr for modification. | |
void | prefetchClose (const void *addr) |
Prefetch the cacheline containing addr to L1 cache. | |
void | prefetchMid (const void *addr) |
Prefetch the cacheline containing addr to L2 cache. | |
void | prefetchFar (const void *addr) |
Prefetch the cacheline containing addr to L3 cache. | |
template<typename V , typename T , typename Abi > | |
enable_if<(V::size()==Vector< T, Abi >::size() &&sizeof(typename V::VectorEntryType)==sizeof(typename Vector< T, Abi >::VectorEntryType) &&sizeof(V)==sizeof(Vector< T, Abi >) &&alignof(V)<=alignof(Vector< T, Abi >)), V > | reinterpret_components_cast (const Vector< T, Abi > &x) |
Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V. | |
template<typename M > | |
constexpr WhereImpl::WhereMask< M > | where (const M &mask) |
Conditional assignment. | |
Variables | |
constexpr AlignedTag | Aligned |
Use this object for a flags parameter to request aligned loads and stores. | |
constexpr UnalignedTag | Unaligned |
Use this object for a flags parameter to request unaligned loads and stores. | |
constexpr StreamingTag | Streaming |
Use this object for a flags parameter to request streaming loads and stores. | |
constexpr LoadStoreFlags::LoadStoreFlags< PrefetchFlag<> > | PrefetchDefault |
Use this object for a flags parameter to request default software prefetches to be emitted. | |
constexpr VectorSpecialInitializerZero | Zero = {} |
The special object Vc::Zero can be used to construct Vector and Mask objects initialized to zero/false . | |
constexpr VectorSpecialInitializerOne | One = {} |
The special object Vc::One can be used to construct Vector and Mask objects initialized to one/true . | |
constexpr VectorSpecialInitializerIndexesFromZero | IndexesFromZero = {} |
The special object Vc::IndexesFromZero can be used to construct Vector objects initialized to values 0, 1, 2, 3, 4, ... | |
SIMD Support Feature Macros | |
#define | Vc_IMPL_XOP |
This macro is defined if the current translation unit is compiled with XOP instruction support. | |
#define | Vc_IMPL_FMA4 |
This macro is defined if the current translation unit is compiled with FMA4 instruction support. | |
#define | Vc_IMPL_F16C |
This macro is defined if the current translation unit is compiled with F16C instruction support. | |
#define | Vc_IMPL_POPCNT |
This macro is defined if the current translation unit is compiled with POPCNT instruction support. | |
#define | Vc_IMPL_SSE4a |
This macro is defined if the current translation unit is compiled with SSE4a instruction support. | |
#define | Vc_IMPL_Scalar |
This macro is defined if the current translation unit is compiled without any SIMD support. | |
#define | Vc_IMPL_SSE |
This macro is defined if the current translation unit is compiled with any version of SSE (but not AVX). | |
#define | Vc_IMPL_SSE2 |
This macro is defined if the current translation unit is compiled with SSE2 instruction support (excluding SSE3 and up). | |
#define | Vc_IMPL_SSE3 |
This macro is defined if the current translation unit is compiled with SSE3 instruction support (excluding SSSE3 and up). | |
#define | Vc_IMPL_SSSE3 |
This macro is defined if the current translation unit is compiled with SSSE3 instruction support (excluding SSE4.1 and up). | |
#define | Vc_IMPL_SSE4_1 |
This macro is defined if the current translation unit is compiled with SSE4.1 instruction support (excluding SSE4.2 and up). | |
#define | Vc_IMPL_SSE4_2 |
This macro is defined if the current translation unit is compiled with SSE4.2 instruction support (excluding AVX and up). | |
#define | Vc_IMPL_AVX |
This macro is defined if the current translation unit is compiled with AVX instruction support (excluding AVX2 and up). | |
#define | Vc_IMPL_AVX2 |
This macro is defined if the current translation unit is compiled with AVX2 instruction support. | |
SIMD Vector Size Macros | |
#define | Vc_DOUBLE_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a double_v. | |
#define | Vc_FLOAT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a float_v. | |
#define | Vc_INT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in an int_v. | |
#define | Vc_UINT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a uint_v. | |
#define | Vc_SHORT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a short_v. | |
#define | Vc_USHORT_V_SIZE |
An integer (for use with the preprocessor) that gives the number of entries in a ushort_v. | |
Compiler Identification Macros | |
#define | Vc_ICC __INTEL_COMPILER_BUILD_DATE |
This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler. | |
#define | Vc_CLANG (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__) |
This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler. | |
#define | Vc_APPLECLANG (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__) |
This macro is defined to a number identifying the Apple Clang version if the current translation unit is compiled with the Apple Clang compiler. | |
#define | Vc_GCC (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__) |
This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler. | |
#define | Vc_MSVC _MSC_FULL_VER |
This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler. | |
Micro-Architecture Feature Tests | |
unsigned int | extraInstructionsSupported () |
Determines the extra instructions supported by the current CPU. | |
bool | isImplementationSupported (Vc::Implementation impl) |
Tests whether the given implementation is supported by the system the code is executing on. | |
Vc::Implementation | bestImplementationSupported () |
Determines the best supported implementation for the current system. | |
bool | currentImplementationSupported () |
Tests that the CPU and operating system support the vector unit that the code was compiled for. | |
Version Macros | |
#define | Vc_VERSION_STRING "1.4.1" |
Contains the version string of the Vc headers. | |
#define | Vc_VERSION_NUMBER 0x010402 |
Contains the encoded version number of the Vc headers. | |
#define | Vc_VERSION_CHECK(major, minor, patch) ((major << 16) | (minor << 8) | (patch << 1)) |
Helper macro to compare against an encoded version number. | |
Boolean Reductions | |
template<typename Mask > | |
constexpr bool | all_of (const Mask &m) |
Returns whether all entries in the mask m are true . | |
constexpr bool | all_of (bool b) |
Returns b . | |
template<typename Mask > | |
constexpr bool | any_of (const Mask &m) |
Returns whether at least one entry in the mask m is true . | |
constexpr bool | any_of (bool b) |
Returns b . | |
template<typename Mask > | |
constexpr bool | none_of (const Mask &m) |
Returns whether all entries in the mask m are false . | |
constexpr bool | none_of (bool b) |
Returns !b . | |
template<typename Mask > | |
constexpr bool | some_of (const Mask &m) |
Returns whether at least one entry in m is true and at least one entry in m is false . | |
constexpr bool | some_of (bool) |
Returns false . | |
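The four reductions differ only in how many mask entries must be true. The following is a minimal sketch of those semantics in plain C++, not Vc's implementation: LaneMask and the *_lanes function names are hypothetical stand-ins for a Vc mask type and the functions listed above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a SIMD mask: N boolean lanes (not a Vc type).
template <std::size_t N>
using LaneMask = std::array<bool, N>;

// Counts the lanes that are true.
template <std::size_t N>
std::size_t count_true(const LaneMask<N>& m) {
    std::size_t n = 0;
    for (bool b : m) n += b ? 1 : 0;
    return n;
}

// All entries are true.
template <std::size_t N>
bool all_of_lanes(const LaneMask<N>& m) { return count_true(m) == N; }

// At least one entry is true.
template <std::size_t N>
bool any_of_lanes(const LaneMask<N>& m) { return count_true(m) > 0; }

// All entries are false.
template <std::size_t N>
bool none_of_lanes(const LaneMask<N>& m) { return count_true(m) == 0; }

// At least one entry is true and at least one entry is false.
template <std::size_t N>
bool some_of_lanes(const LaneMask<N>& m) {
    const std::size_t n = count_true(m);
    return n > 0 && n < N;
}
```

Note that a single bool can never be partly true, which matches the bool overload of some_of always returning false.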
#define Vc_VERSION_STRING "1.4.1" |
Contains the version string of the Vc headers.
Same as Vc::versionString().
Definition at line 40 of file version.h.
Referenced by Vc::versionNumber(), and Vc::versionString().
#define Vc_VERSION_NUMBER 0x010402 |
Contains the encoded version number of the Vc headers.
Same as Vc::versionNumber().
Definition at line 46 of file version.h.
Referenced by Vc::versionNumber().
#define Vc_VERSION_CHECK(major, minor, patch) ((major << 16) | (minor << 8) | (patch << 1)) |
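The encoding places the patch level one bit to the left, which is why version string "1.4.1" corresponds to the encoded number 0x010402 rather than 0x010401. The sketch below mirrors the macro under a hypothetical EXAMPLE_ name so that it compiles without the Vc headers; at_least is likewise a hypothetical helper, not part of Vc.

```cpp
// Local mirror of the encoding shown above. The EXAMPLE_ prefix is
// hypothetical and only avoids clashing with the real Vc macro.
#define EXAMPLE_VERSION_CHECK(major, minor, patch) \
    (((major) << 16) | ((minor) << 8) | ((patch) << 1))

// 1.4.1 encodes to the value listed for Vc_VERSION_NUMBER.
static_assert(EXAMPLE_VERSION_CHECK(1, 4, 1) == 0x010402,
              "1.4.1 encodes to 0x010402");

// Hypothetical helper: true if the encoded version is at least
// major.minor.patch.
constexpr bool at_least(unsigned int encoded, unsigned int major,
                        unsigned int minor, unsigned int patch) {
    return encoded >= EXAMPLE_VERSION_CHECK(major, minor, patch);
}
```

With the Vc headers included, the same comparison would typically be written in the preprocessor, for example #if Vc_VERSION_NUMBER >= Vc_VERSION_CHECK(1, 4, 0).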
#define Vc_DECLARE_ALLOCATOR(Type) |
Convenience macro to set the default allocator for a given Type to Vc::Allocator.
Parameters
Type | Your type that you want to use with STL containers. |
using CurrentImplementation = ImplementationT< > |
Identifies the Vc implementation used in the current translation unit.
using VectorAlignedBase = AlignedBase< Detail::max(alignof(Vector<float>), alignof(Vector<double>), alignof(Vector<ullong>), alignof(Vector<llong>), alignof(Vector<ulong>), alignof(Vector<long>), alignof(Vector<uint>), alignof(Vector<int>), alignof(Vector<ushort>), alignof(Vector<short>), alignof(Vector<uchar>), alignof(Vector<schar>))> |
Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).
This class reimplements the new and delete operators to align objects allocated on the heap suitably for objects of Vc::Vector<T> type. This is necessary since the standard new operator does not adhere to the alignment requirements of the type.
Definition at line 90 of file alignedbase.h.
using VectorAlignedBaseT = AlignedBase<alignof(V)> |
Variant of the above type ensuring suitable alignment only for the specified vector type V.
Definition at line 100 of file alignedbase.h.
using MemoryAlignedBase = AlignedBase< Detail::max(Vector<float>::MemoryAlignment, Vector<double>::MemoryAlignment, Vector<ullong>::MemoryAlignment, Vector<llong>::MemoryAlignment, Vector<ulong>::MemoryAlignment, Vector<long>::MemoryAlignment, Vector<uint>::MemoryAlignment, Vector<int>::MemoryAlignment, Vector<ushort>::MemoryAlignment, Vector<short>::MemoryAlignment, Vector<uchar>::MemoryAlignment, Vector<schar>::MemoryAlignment)> |
Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).
This class reimplements the new and delete operators to align objects allocated on the heap suitably for arrays of type Vc::Vector<T>::EntryType. Subsequent load and store operations can safely use the aligned variant.
Definition at line 122 of file alignedbase.h.
using MemoryAlignedBaseT = AlignedBase<V::MemoryAlignment> |
Variant of the above type ensuring suitable alignment only for the specified vector type V.
Definition at line 132 of file alignedbase.h.
enum MallocAlignment |
Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
enum Implementation : std::uint_least32_t |
Enum to identify a certain SIMD instruction set.
You can use CurrentImplementation for the currently active implementation.
enum ExtraInstructions : std::uint_least32_t |
The list of available instructions is not easily described by a linear list of instruction sets.
On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2.
But there are additional instructions that are not necessarily required by this list. These are covered in this enum.
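Because each enumerator occupies its own bit, a set of extra instructions can be held in one unsigned integer and tested with a bitwise AND. The sketch below copies the flag values listed for ExtraInstructions into a local enum; ExampleExtra and has_all are hypothetical names used only for illustration.

```cpp
// Local mirror of the flag values listed for Vc::ExtraInstructions.
// The ExampleExtra name is hypothetical; only the values come from the list.
enum ExampleExtra : unsigned int {
    Float16c = 0x01000,
    Fma4     = 0x02000,
    Xop      = 0x04000,
    Popcnt   = 0x08000,
    Sse4a    = 0x10000,
    Fma      = 0x20000,
    Vex      = 0x40000,
    Bmi2     = 0x80000
};

// True if every bit set in `required` is also set in `supported`.
inline bool has_all(unsigned int supported, unsigned int required) {
    return (supported & required) == required;
}
```

In a program using Vc, the supported mask would presumably be the unsigned int returned by Vc::extraInstructionsSupported().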
unsigned int Vc::extraInstructionsSupported () |
Determines the extra instructions supported by the current CPU.
bool Vc::isImplementationSupported (Vc::Implementation impl) |
Tests whether the given implementation is supported by the system the code is executing on.
Parameters
impl | The SIMD target to test for. |
Returns
true if the OS and hardware support execution of instructions defined by impl. false otherwise.
Vc::Implementation Vc::bestImplementationSupported () |
Determines the best supported implementation for the current system.
bool Vc::currentImplementationSupported () | inline |
Tests that the CPU and operating system support the vector unit that the code was compiled for.
This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false, the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.
If the program continues and makes use of any vector features not supported by the hardware or software, the program will crash.
Returns
true if the OS and hardware support execution of the currently selected SIMD instructions. false otherwise.
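The start-up check described above could be structured roughly as follows. This is a sketch that compiles without Vc: check_before_start is a hypothetical helper, and the callable passed as is_supported stands in for Vc::currentImplementationSupported.

```cpp
#include <cstdlib>
#include <iostream>

// Hypothetical helper: runs the support check once and reports failure
// before any vectorized code gets a chance to execute.
template <typename SupportCheck>
int check_before_start(SupportCheck is_supported, std::ostream& err) {
    if (!is_supported()) {
        err << "CPU or OS requirements not met for the compiled-in vector unit\n";
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

A program would call this first thing in main and return immediately if the result is not EXIT_SUCCESS.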
|
inline |
Definition at line 81 of file version.h.
Referenced by Vc::versionNumber().
constexpr unsigned int Vc::versionNumber () |
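template<typename V , typename Parent , typename Dimension , typename RM > std::ostream & operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m) | inline |
Prints the contents of a Memory object into a stream object.
Example output (with SSE):
{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
Parameters
s | Any standard C++ ostream object. For example std::cout or a std::stringstream object. |
m | Any Vc::Memory object. |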
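The grouping in the sample output (one bracketed group per vector, four entries per group with SSE) can be reproduced in plain C++. The sketch below is not Vc's stream operator: format_chunks is a hypothetical function, and filling the last group with zeros is an assumption taken from the sample output.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Formats values in bracketed groups of `width` entries (width must be > 0).
// The final group is filled with zeros, mirroring the sample output.
inline std::string format_chunks(const std::vector<int>& values, std::size_t width) {
    std::ostringstream out;
    out << '{';
    const std::size_t groups = (values.size() + width - 1) / width;
    for (std::size_t g = 0; g < groups; ++g) {
        out << (g == 0 ? "[" : " [");
        for (std::size_t i = 0; i < width; ++i) {
            const std::size_t idx = g * width + i;
            if (i != 0) out << ", ";
            out << (idx < values.size() ? values[idx] : 0);
        }
        out << ']';
    }
    out << '}';
    return out.str();
}
```

Called with the ten values 0 to 9 and a width of 4, the function yields the same text as the sample output.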
template<class InputIt , class UnaryFunction > UnaryFunction Vc::simd_for_each (InputIt first, InputIt last, UnaryFunction f) |
Vc variant of the std::for_each algorithm.
This algorithm calls f with one argument of type Vc::Vector<iterator value type, unspecified> as often as is needed to iterate over the complete range from first to last. It will try to use the best vector size (VectorAbi) to work on the largest chunks possible. To support aligned loads (and stores) and to support arbitrary range distances, the algorithm may require the use of Vc::VectorAbi types that work on fewer elements in parallel.
The following example requires C++14 for generic lambdas. If you don't have generic lambdas available you can use a "classic" functor type with a templated call operator instead.
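The calling pattern can be illustrated without Vc by emulating the chunking in plain C++14. Everything in this sketch is an assumption for illustration: Lanes is a hypothetical stand-in for a vector type, chunked_for_each is a hypothetical stand-in for the algorithm, and the fixed width of 4 replaces the ABI selection that the real algorithm performs. Alignment handling is not modeled.

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for a SIMD vector with W lanes (not a Vc type).
template <typename T, std::size_t W>
struct Lanes {
    std::array<T, W> data;
    Lanes& operator*=(T factor) {
        for (T& x : data) x *= factor;
        return *this;
    }
};

// Calls f on W-wide chunks, then on 1-wide chunks for the remainder.
// Each chunk is copied back so that f may modify the range in place.
// Requires random-access iterators.
template <std::size_t W, typename It, typename F>
F chunked_for_each(It first, It last, F f) {
    using T = typename std::iterator_traits<It>::value_type;
    while (static_cast<std::size_t>(std::distance(first, last)) >= W) {
        Lanes<T, W> chunk;
        for (std::size_t i = 0; i < W; ++i) chunk.data[i] = first[i];
        f(chunk);
        for (std::size_t i = 0; i < W; ++i) first[i] = chunk.data[i];
        first += W;
    }
    for (; first != last; ++first) {
        Lanes<T, 1> single{{*first}};
        f(single);
        *first = single.data[0];
    }
    return f;
}

// Scales every element. The generic lambda accepts both chunk widths,
// which is why C++14 is needed.
inline void scale(std::vector<double>& values, double factor) {
    chunked_for_each<4>(values.begin(), values.end(),
                        [factor](auto& v) { v *= factor; });
}
```

The functor is instantiated for more than one argument type here, just as the text says f may be called with vector types of different widths.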
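template<typename Mask , typename T > enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T > Vc::iif (const Mask &condition, const T &trueValue, const T &falseValue) | inline |
Function to mimic the ternary operator '?:' (inline-if).
Parameters
condition | Determines which values are returned. This is analogous to the first argument to the ternary operator. |
trueValue | The values to return where condition is true. |
falseValue | The values to return where condition is false. |
Returns
A combination of entries from trueValue and falseValue, according to condition.
So instead of the scalar variant, which uses the ternary operator, you'd write the corresponding call to iif.
Assuming a has the values [0, 3, 5, 1], b is [1, 1, 1, 1], and c is [1, 2, 3, 4], then x will be [2, 2, 3, 5].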
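A per-entry selection of this kind can be emulated in plain C++ as sketched below. Vec4, Mask4, and the three helper functions are hypothetical stand-ins, not Vc types or functions. The condition a > 1 with operands c and b + c is an assumption, chosen because it reproduces the listed result for the listed inputs.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;    // hypothetical 4-entry vector
using Mask4 = std::array<bool, 4>;  // hypothetical 4-entry mask

// Entry-wise comparison against a scalar, producing a mask.
inline Mask4 greater_than(const Vec4& v, int s) {
    Mask4 m{};
    for (std::size_t i = 0; i < v.size(); ++i) m[i] = v[i] > s;
    return m;
}

// Entry-wise addition.
inline Vec4 add(const Vec4& x, const Vec4& y) {
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = x[i] + y[i];
    return r;
}

// Entry-wise inline-if: takes t[i] where the mask is true, f[i] elsewhere.
inline Vec4 select(const Mask4& m, const Vec4& t, const Vec4& f) {
    Vec4 r{};
    for (std::size_t i = 0; i < r.size(); ++i) r[i] = m[i] ? t[i] : f[i];
    return r;
}
```

With the inputs from the text, select(greater_than(a, 1), c, add(b, c)) picks b + c in the first and last entries and c in the two middle entries.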
template<typename T > constexpr T Vc::iif (bool condition, const T &trueValue, const T &falseValue) |
Overload of the above for boolean conditions.
This typically results in direct use of the ternary operator. This function makes it easier to switch from a Vc type to a builtin type.
Parameters
condition | Determines which value is returned. This is analogous to the first argument to the ternary operator. |
trueValue | The value to return if condition is true. |
falseValue | The value to return if condition is false. |
Returns
Either trueValue or falseValue, depending on condition.
Definition at line 90 of file iif.h.
Referenced by Vc::iif().
template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>> std::pair<V, V> Vc::interleave (const V &a, const V &b) |
Interleaves the entries from a and b into two vectors of the same type.
The returned vectors contain the elements in the order a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3], ...
Parameters
a | input vector whose data will appear at even indexes in the output |
b | input vector whose data will appear at odd indexes in the output |
Returns
The entries of a and b interleaved.
Definition at line 55 of file interleave.h.
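The ordering described above can be spelled out with fixed-size arrays. This is a sketch of the semantics only: interleave_lanes is a hypothetical function operating on std::array, not the Vc function, and it makes no attempt to use SIMD shuffles.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Interleaves a and b as a[0], b[0], a[1], b[1], ... and splits the
// resulting 2*N entries into two arrays of N entries each.
template <typename T, std::size_t N>
std::pair<std::array<T, N>, std::array<T, N>>
interleave_lanes(const std::array<T, N>& a, const std::array<T, N>& b) {
    std::array<T, 2 * N> merged{};
    for (std::size_t i = 0; i < N; ++i) {
        merged[2 * i] = a[i];      // even indexes come from a
        merged[2 * i + 1] = b[i];  // odd indexes come from b
    }
    std::pair<std::array<T, N>, std::array<T, N>> result{};
    for (std::size_t i = 0; i < N; ++i) {
        result.first[i] = merged[i];
        result.second[i] = merged[N + i];
    }
    return result;
}
```

For a = [0, 1, 2, 3] and b = [10, 11, 12, 13] the first returned array holds the interleaved lower halves and the second holds the interleaved upper halves.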
template<typename T , Vc::MallocAlignment A> T * Vc::malloc (size_t n) | inline |
Allocates memory on the heap with alignment and padding suitable for vectorized access.
Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.
Parameters
n | Specifies the number of objects the allocated memory must be able to store. |
Template Parameters
T | The type of the allocated memory. Note that the constructor is not called. |
A | Determines the alignment of the memory. See Vc::MallocAlignment. |
Returns
Memory that is padded to a multiple of the alignment selected by A. Thus, if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array without generating an out-of-bounds access. For a cacheline size of 64 bytes and an int size of 4 bytes you would thus get an array of 128 bytes to work with.
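The 128-byte figure follows from rounding the requested size up to the next multiple of the alignment: 21 objects of 4 bytes need 84 bytes, and the next multiple of 64 is 128. A small sketch of that arithmetic follows; padded_bytes is a hypothetical helper, not part of Vc.

```cpp
#include <cstddef>

// Rounds count * element_size up to the next multiple of alignment.
// alignment must be greater than zero.
constexpr std::size_t padded_bytes(std::size_t count, std::size_t element_size,
                                   std::size_t alignment) {
    return (count * element_size + alignment - 1) / alignment * alignment;
}

// 21 four-byte objects occupy 84 bytes, which pads to two 64-byte cachelines.
static_assert(padded_bytes(21, 4, 64) == 128, "84 bytes pad to 128 bytes");
```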
template<typename T > void Vc::free (T *p) | inline |
Frees memory that was allocated with Vc::malloc.
Parameters
p | The pointer to the memory to be freed. |
Template Parameters
T | The type of the allocated memory. |
Definition at line 163 of file malloc.h.
Referenced by Memory< V, 0u, 0u, true >::~Memory().
void Vc::prefetchForOneRead (const void *addr) | inline |
Prefetch the cacheline containing addr for a single read access.
This prefetch completely bypasses the cache, not evicting any other data.
Parameters
addr | The cacheline containing addr will be prefetched. |
Definition at line 539 of file memory.h.
Referenced by Vc::Common::prefetchFar().
void Vc::prefetchForModify (const void *addr) | inline |
Prefetch the cacheline containing addr for modification.
This prefetch evicts data from the cache, so use it only for data you really will use. When the target system supports it, the cacheline will be marked as modified while prefetching, saving work later on.
Parameters
addr | The cacheline containing addr will be prefetched. |
Definition at line 556 of file memory.h.
Referenced by Vc::Common::prefetchFar().
void Vc::prefetchClose (const void *addr) | inline |
Prefetch the cacheline containing addr to L1 cache.
This prefetch evicts data from the cache, so use it only for data you really will use.
Parameters
addr | The cacheline containing addr will be prefetched. |
Definition at line 571 of file memory.h.
Referenced by Vc::Common::prefetchFar().
void Vc::prefetchMid (const void *addr) | inline |
Prefetch the cacheline containing addr to L2 cache.
This prefetch evicts data from the cache, so use it only for data you really will use.
Parameters
addr | The cacheline containing addr will be prefetched. |
Definition at line 586 of file memory.h.
Referenced by Vc::Common::prefetchFar().
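void Vc::prefetchFar (const void *addr) | inline |
Prefetch the cacheline containing addr to L3 cache.
template<typename V , typename T , typename Abi > enable_if<(V::size()==Vector< T, Abi >::size() &&sizeof(typename V::VectorEntryType)==sizeof(typename Vector< T, Abi >::VectorEntryType) &&sizeof(V)==sizeof(Vector< T, Abi >) &&alignof(V)<=alignof(Vector< T, Abi >)), V > Vc::reinterpret_components_cast (const Vector< T, Abi > &x) | inline |
Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V.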
This function is only applicable if the sizeof of the input and output types is equal and the VectorEntryTypes of input and output have equal sizeof.
Template Parameters
V | The requested type to change x into. |
Parameters
x | The Vector to reinterpret as an object of type V. |
Returns
A new object of type V.
... Abi, though.
constexpr WhereImpl::WhereMask<M> Vc::where (const M &mask) |
Conditional assignment.
Since comparisons between SIMD vectors do not return a single boolean, but rather a vector of booleans (a mask), one often cannot use if / else statements. Instead, one needs to state that only a subset of entries of a given SIMD vector should be modified. The where function can be prepended to any assignment operation to execute a masked assignment.
Parameters
mask | The mask that selects the entries in the target vector that will be modified. |
A masked assignment is written as where(mask) | x = y or where(mask)(x) = y.
The block following the if statement in f1 will be executed if x < 2 evaluates to true. If T is a scalar type you normally get what you expect. But if T is a SIMD vector type, the comparison will use the implicit conversion from a mask to bool, meaning all_of(x < 2).
Most of the time the required operation is a masked assignment as stated in f2.
Definition at line 265 of file where.h.
Referenced by Vc::iif().
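The difference between a branch on all_of(x < 2) and a masked assignment can be made concrete with a plain C++ emulation. The sketch below is an assumption-laden illustration, not Vc code: Vec4, Mask4, less_than, and masked_update are hypothetical, and the mask is computed once up front rather than re-evaluated per statement.

```cpp
#include <array>
#include <cstddef>

using Vec4 = std::array<int, 4>;    // hypothetical 4-entry vector
using Mask4 = std::array<bool, 4>;  // hypothetical 4-entry mask

// Entry-wise comparison against a scalar, producing a mask.
inline Mask4 less_than(const Vec4& v, int s) {
    Mask4 m{};
    for (std::size_t i = 0; i < v.size(); ++i) m[i] = v[i] < s;
    return m;
}

// Masked update: x *= y and y += 2, applied only in the entries
// where x < 2 held on entry. Other entries are left untouched.
inline void masked_update(Vec4& x, Vec4& y) {
    const Mask4 m = less_than(x, 2);
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (m[i]) {
            x[i] *= y[i];
            y[i] += 2;
        }
    }
}
```

An if statement over the whole vector would instead update either every entry or none, depending on whether all entries satisfy the condition.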
constexpr AlignedTag Aligned |
Use this object for a flags parameter to request aligned loads and stores.
It specifies that a load/store can expect a memory address that is aligned on the correct boundary (i.e. MemoryAlignment).
Definition at line 178 of file loadstoreflags.h.
Referenced by Vc::deinterleave(), and SimdArray< T, N >::rotated().
constexpr UnalignedTag Unaligned |
Use this object for a flags parameter to request unaligned loads and stores.
It specifies that a load/store cannot expect a memory address that is aligned on the correct boundary (i.e. alignment is less than MemoryAlignment).
Definition at line 191 of file loadstoreflags.h.
Referenced by MemoryBase< V, Memory< V, Size, 0u, InitPadding >, 1, void >::lastVector(), SimdArray< T, N >::rotated(), SimdArray< T, N >::SimdArray(), and MemoryBase< V, Memory< V, Size, 0u, InitPadding >, 1, void >::vector().
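Whether an address qualifies for an aligned load or store comes down to a divisibility test on the address. A minimal sketch follows; is_aligned is a hypothetical helper, and the alignment value to pass would be the MemoryAlignment of the vector type in question.

```cpp
#include <cstddef>
#include <cstdint>

// True if the address p is a multiple of alignment (alignment > 0).
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

An address that fails this test for the required alignment calls for the Unaligned flag instead of Aligned.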
constexpr StreamingTag Streaming |
Use this object for a flags parameter to request streaming loads and stores.
It specifies that the cache should be bypassed for the given load/store. Whether this will actually be done depends on the target system's capabilities.
Streaming stores can be interesting when the code calculates values that, after being written to memory, will not be used for a long time or used by a different thread.
Definition at line 206 of file loadstoreflags.h.