Vc 1.4.5
SIMD Vector Classes for C++
 
Utilities

Detailed Description

Additional classes, macros, and functions that help to work more easily with the main vector types.

Classes

class  CpuId
 This class is available for x86 / AMD64 systems to read and interpret information about the CPU's capabilities.
 
struct  ImplementationT< Features >
 This class identifies the specific implementation Vc uses in the current translation unit in terms of a type.
 
class  Allocator< T >
 An allocator that uses global new and supports over-aligned types, as per [C++11 20.6.9].
 
struct  AlignedBase< Alignment >
 Helper class to ensure a given alignment.
 

Macros

#define Vc_DECLARE_ALLOCATOR(Type)
 Convenience macro to set the default allocator for a given Type to Vc::Allocator.
 

Typedefs

using CurrentImplementation
 Identifies the Vc implementation used in the current translation unit.
 
using VectorAlignedBase
 Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).
 
template<typename V >
using VectorAlignedBaseT = AlignedBase<alignof(V)>
 Variant of the above type ensuring suitable alignment only for the specified vector type V.
 
using MemoryAlignedBase
 Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).
 
template<typename V >
using MemoryAlignedBaseT = AlignedBase<V::MemoryAlignment>
 Variant of the above type ensuring suitable alignment only for the specified vector type V.
 
using llong = long long
 long long shorthand
 
using ullong = unsigned long long
 unsigned long long shorthand
 
using ulong = unsigned long
 unsigned long shorthand
 
using uint = unsigned int
 unsigned int shorthand
 
using ushort = unsigned short
 unsigned short shorthand
 
using uchar = unsigned char
 unsigned char shorthand
 
using schar = signed char
 signed char shorthand
 

Enumerations

enum  MallocAlignment { AlignOnVector , AlignOnCacheline , AlignOnPage }
 Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.
 
enum  Implementation : std::uint_least32_t {
  ScalarImpl , SSE2Impl , SSE3Impl , SSSE3Impl ,
  SSE41Impl , SSE42Impl , AVXImpl , AVX2Impl ,
  MICImpl , ImplementationMask = 0xfff
}
 Enum to identify a certain SIMD instruction set.
 
enum  ExtraInstructions : std::uint_least32_t {
  Float16cInstructions = 0x01000 , Fma4Instructions = 0x02000 , XopInstructions = 0x04000 , PopcntInstructions = 0x08000 ,
  Sse4aInstructions = 0x10000 , FmaInstructions = 0x20000 , VexInstructions = 0x40000 , Bmi2Instructions = 0x80000 ,
  ExtraInstructionsMask = 0xfffff000u
}
 The list of available instructions is not easily described by a linear list of instruction sets.
 

Functions

const char * versionString ()
 
constexpr unsigned int versionNumber ()
 
template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream & operator<< (std::ostream &s, const Vc::MemoryBase< V, Parent, Dimension, RM > &m)
 Prints the contents of a Memory object into a stream object.
 
template<class InputIt , class UnaryFunction >
UnaryFunction simd_for_each (InputIt first, InputIt last, UnaryFunction f)
 Vc variant of the std::for_each algorithm.
 
template<typename Mask , typename T >
enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T > iif (const Mask &condition, const T &trueValue, const T &falseValue)
 Function to mimic the ternary operator '?:' (inline-if).
 
template<typename T >
constexpr T iif (bool condition, const T &trueValue, const T &falseValue)
 Overload of the above for boolean conditions.
 
template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>>
std::pair< V, V > interleave (const V &a, const V &b)
 Interleaves the entries from a and b into two vectors of the same type.
 
template<typename T , Vc::MallocAlignment A>
T * malloc (size_t n)
 Allocates memory on the Heap with alignment and padding suitable for vectorized access.
 
template<typename T >
void free (T *p)
 Frees memory that was allocated with Vc::malloc.
 
void prefetchForOneRead (const void *addr)
 Prefetch the cacheline containing addr for a single read access.
 
void prefetchForModify (const void *addr)
 Prefetch the cacheline containing addr for modification.
 
void prefetchClose (const void *addr)
 Prefetch the cacheline containing addr to L1 cache.
 
void prefetchMid (const void *addr)
 Prefetch the cacheline containing addr to L2 cache.
 
void prefetchFar (const void *addr)
 Prefetch the cacheline containing addr to L3 cache.
 
template<typename V , typename T , typename Abi >
enable_if<(V::size()==Vector< T, Abi >::size() &&sizeof(typename V::VectorEntryType)==sizeof(typename Vector< T, Abi >::VectorEntryType) &&sizeof(V)==sizeof(Vector< T, Abi >) &&alignof(V)<=alignof(Vector< T, Abi >)), V > reinterpret_components_cast (const Vector< T, Abi > &x)
 Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V.
 
template<typename M >
constexpr WhereImpl::WhereMask< M > where (const M &mask)
 Conditional assignment.
 

Variables

constexpr AlignedTag Aligned
 Use this object for a flags parameter to request aligned loads and stores.
 
constexpr UnalignedTag Unaligned
 Use this object for a flags parameter to request unaligned loads and stores.
 
constexpr StreamingTag Streaming
 Use this object for a flags parameter to request streaming loads and stores.
 
constexpr LoadStoreFlags::LoadStoreFlags< PrefetchFlag<> > PrefetchDefault
 Use this object for a flags parameter to request default software prefetches to be emitted.
 
constexpr VectorSpecialInitializerZero Zero = {}
 The special object Vc::Zero can be used to construct Vector and Mask objects initialized to zero/false.
 
constexpr VectorSpecialInitializerOne One = {}
 The special object Vc::One can be used to construct Vector and Mask objects initialized to one/true.
 
constexpr VectorSpecialInitializerIndexesFromZero IndexesFromZero = {}
 The special object Vc::IndexesFromZero can be used to construct Vector objects initialized to values 0, 1, 2, 3, 4, ...
 

Micro-Architecture Feature Tests

unsigned int extraInstructionsSupported ()
 Determines the extra instructions supported by the current CPU.
 
bool isImplementationSupported (Vc::Implementation impl)
 Tests whether the given implementation is supported by the system the code is executing on.
 
Vc::Implementation bestImplementationSupported ()
 Determines the best supported implementation for the current system.
 
bool currentImplementationSupported ()
 Tests that the CPU and Operating System support the vector unit which was compiled for.
 

Boolean Reductions

template<typename Mask >
constexpr bool all_of (const Mask &m)
 Returns whether all entries in the mask m are true.
 
constexpr bool all_of (bool b)
 Returns b.
 
template<typename Mask >
constexpr bool any_of (const Mask &m)
 Returns whether at least one entry in the mask m is true.
 
constexpr bool any_of (bool b)
 Returns b.
 
template<typename Mask >
constexpr bool none_of (const Mask &m)
 Returns whether all entries in the mask m are false.
 
constexpr bool none_of (bool b)
 Returns !b.
 
template<typename Mask >
constexpr bool some_of (const Mask &m)
 Returns whether at least one entry in m is true and at least one entry in m is false.
 
constexpr bool some_of (bool)
 Returns false.
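
The four reductions can be modelled on a plain array of bools. The following is a minimal sketch, not Vc code: the *_model function names and the use of std::array<bool, N> as a stand-in for a mask are assumptions made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a mask with N entries (not a Vc type):
// counts how many entries are true.
template <std::size_t N>
std::size_t count_true(const std::array<bool, N> &m)
{
    std::size_t n = 0;
    for (bool b : m) {
        if (b) {
            ++n;
        }
    }
    return n;
}

// True if every entry is true.
template <std::size_t N>
bool all_of_model(const std::array<bool, N> &m)
{
    return count_true(m) == N;
}

// True if at least one entry is true.
template <std::size_t N>
bool any_of_model(const std::array<bool, N> &m)
{
    return count_true(m) > 0;
}

// True if every entry is false.
template <std::size_t N>
bool none_of_model(const std::array<bool, N> &m)
{
    return count_true(m) == 0;
}

// True only for a mixed mask: at least one true and at least one false entry.
template <std::size_t N>
bool some_of_model(const std::array<bool, N> &m)
{
    const std::size_t n = count_true(m);
    return n > 0 && n < N;
}
```

In this model some_of is the only reduction that is false for both a fully set and a fully cleared mask, which matches the bool overload above always returning false.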
 

SIMD Vector Size Macros

#define Vc_DOUBLE_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a double_v.
 
#define Vc_FLOAT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a float_v.
 
#define Vc_INT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in an int_v.
 
#define Vc_UINT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a uint_v.
 
#define Vc_SHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a short_v.
 
#define Vc_USHORT_V_SIZE
 An integer (for use with the preprocessor) that gives the number of entries in a ushort_v.
 

Compiler Identification Macros

#define Vc_ICC   __INTEL_COMPILER_BUILD_DATE
 This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler.
 
#define Vc_CLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)
 This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler.
 
#define Vc_APPLECLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)
 This macro is defined to a number identifying the Apple Clang version if the current translation unit is compiled with the Apple Clang compiler.
 
#define Vc_GCC   (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)
 This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler.
 
#define Vc_MSVC   _MSC_FULL_VER
 This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler.
 

Version Macros

#define Vc_VERSION_STRING   "1.4.5"
 Contains the version string of the Vc headers.
 
#define Vc_VERSION_NUMBER   0x01040a
 Contains the encoded version number of the Vc headers.
 
#define Vc_VERSION_CHECK(major, minor, patch)   ((major << 16) | (minor << 8) | (patch << 1))
 Helper macro to compare against an encoded version number.
 

Macro Definition Documentation

◆ Vc_ICC

#define Vc_ICC   __INTEL_COMPILER_BUILD_DATE

This macro is defined to a number identifying the ICC version if the current translation unit is compiled with the Intel compiler.

For any other compiler this macro is not defined.

Definition at line 48 of file global.h.

◆ Vc_CLANG

#define Vc_CLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)

This macro is defined to a number identifying the Clang version if the current translation unit is compiled with the Clang compiler.

For any other compiler this macro is not defined.

Definition at line 57 of file global.h.

◆ Vc_APPLECLANG

#define Vc_APPLECLANG   (__clang_major__ * 0x10000 + __clang_minor__ * 0x100 + __clang_patchlevel__)

This macro is defined to a number identifying the Apple Clang version if the current translation unit is compiled with the Apple Clang compiler.

For any other compiler this macro is not defined.

Definition at line 66 of file global.h.

◆ Vc_GCC

#define Vc_GCC   (__GNUC__ * 0x10000 + __GNUC_MINOR__ * 0x100 + __GNUC_PATCHLEVEL__)

This macro is defined to a number identifying the GCC version if the current translation unit is compiled with the GCC compiler.

For any other compiler this macro is not defined.

Definition at line 75 of file global.h.
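
The Clang, Apple Clang and GCC macros above share one encoding: one byte each for minor and patch level, with the major version above them. The sketch below mirrors that arithmetic in a constexpr function so the resulting numbers can be inspected. The name encode_compiler_version is hypothetical and not part of Vc, and the version numbers are arbitrary examples.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Vc) mirroring the expression
// major * 0x10000 + minor * 0x100 + patch used by Vc_GCC and Vc_CLANG.
constexpr std::uint32_t encode_compiler_version(std::uint32_t major,
                                                std::uint32_t minor,
                                                std::uint32_t patch)
{
    return major * 0x10000 + minor * 0x100 + patch;
}

// A compiler version 9.3.0 would be encoded as 0x090300.
static_assert(encode_compiler_version(9, 3, 0) == 0x090300, "9.3.0");
// Encoded values compare in release order as long as minor and patch
// each stay below 256.
static_assert(encode_compiler_version(10, 0, 0) > encode_compiler_version(9, 3, 0),
              "ordering");
```

Because the macros are only defined for the matching compiler, a preprocessor test should check that the macro is defined before comparing its value.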

◆ Vc_MSVC

#define Vc_MSVC   _MSC_FULL_VER

This macro is defined to a number identifying the Microsoft Visual C++ version if the current translation unit is compiled with the Visual C++ (MSVC) compiler.

For any other compiler this macro is not defined.

Definition at line 83 of file global.h.

◆ Vc_VERSION_STRING

#define Vc_VERSION_STRING   "1.4.5"

Contains the version string of the Vc headers.

Same as Vc::versionString().

Definition at line 40 of file version.h.

Referenced by Vc::versionString().

◆ Vc_VERSION_NUMBER

#define Vc_VERSION_NUMBER   0x01040a

Contains the encoded version number of the Vc headers.

Same as Vc::versionNumber().

Definition at line 46 of file version.h.

Referenced by Vc::versionNumber().

◆ Vc_VERSION_CHECK

#define Vc_VERSION_CHECK ( major,
minor,
patch )   ((major << 16) | (minor << 8) | (patch << 1))

Helper macro to compare against an encoded version number.

Example:

#if Vc_VERSION_NUMBER >= Vc_VERSION_CHECK(1, 0, 0)

Definition at line 57 of file version.h.
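
To make the encoding concrete, the following sketch repeats the macro arithmetic as a constexpr function. The name encode_version is hypothetical and not part of Vc; only the shift pattern is taken from the macro definition above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Vc) mirroring Vc_VERSION_CHECK:
// major starts at bit 16, minor at bit 8, and patch is shifted left by one bit.
constexpr std::uint32_t encode_version(std::uint32_t major, std::uint32_t minor,
                                       std::uint32_t patch)
{
    return (major << 16) | (minor << 8) | (patch << 1);
}

// 1.4.5 encodes to the value documented for Vc_VERSION_NUMBER.
static_assert(encode_version(1, 4, 5) == 0x01040a, "1.4.5 encodes to 0x01040a");
// Encoded numbers compare in release order.
static_assert(encode_version(1, 4, 5) >= encode_version(1, 0, 0), "ordering");
```

The shift of the patch level by one bit explains why the last byte of Vc_VERSION_NUMBER is 0x0a rather than 0x05 for version 1.4.5.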

◆ Vc_DECLARE_ALLOCATOR

#define Vc_DECLARE_ALLOCATOR ( Type)
Value:
namespace std \
{ \
template <> class allocator<Type> : public ::Vc::Allocator<Type> \
{ \
public: \
template <typename U> struct rebind { \
typedef ::std::allocator<U> other; \
}; \
}; \
}

Convenience macro to set the default allocator for a given Type to Vc::Allocator.

Parameters
Type  The type that you want to use with STL containers.
Note
You have to use this macro in the global namespace.

Definition at line 65 of file Allocator.

Typedef Documentation

◆ CurrentImplementation

Identifies the Vc implementation used in the current translation unit.

See also
ImplementationT

Definition at line 581 of file global.h.

◆ VectorAlignedBase

Initial value:
AlignedBase<
Detail::max(alignof(Vector<float>), alignof(Vector<double>), alignof(Vector<ullong>),
alignof(Vector<llong>), alignof(Vector<ulong>), alignof(Vector<long>),
alignof(Vector<uint>), alignof(Vector<int>), alignof(Vector<ushort>),
alignof(Vector<short>), alignof(Vector<uchar>), alignof(Vector<schar>))>

Helper type to ensure suitable alignment for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for objects of Vc::Vector<T> type. This is necessary since the standard new operator does not adhere to the alignment requirements of the type.

See also
Vc::VectorAlignedBaseT
Vc::MemoryAlignedBase
Vc::AlignedBase

Definition at line 86 of file alignedbase.h.

◆ VectorAlignedBaseT

template<typename V >
using VectorAlignedBaseT = AlignedBase<alignof(V)>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also
Vc::VectorAlignedBase
Vc::MemoryAlignedBaseT

Definition at line 100 of file alignedbase.h.

◆ MemoryAlignedBase

Initial value:
AlignedBase<
Detail::max(Vector<float>::MemoryAlignment, Vector<double>::MemoryAlignment,
Vector<ullong>::MemoryAlignment, Vector<llong>::MemoryAlignment,
Vector<ulong>::MemoryAlignment, Vector<long>::MemoryAlignment,
Vector<uint>::MemoryAlignment, Vector<int>::MemoryAlignment,
Vector<ushort>::MemoryAlignment, Vector<short>::MemoryAlignment,
Vector<uchar>::MemoryAlignment, Vector<schar>::MemoryAlignment)>

Helper class to ensure suitable alignment for arrays of scalar objects for any Vc::Vector<T> type (using the default VectorAbi).

This class reimplements the new and delete operators to align objects allocated on the heap suitably for arrays of type Vc::Vector<T>::EntryType. Subsequent load and store operations are safe to use the aligned variant.

See also
Vc::MemoryAlignedBaseT
Vc::VectorAlignedBase
Vc::AlignedBase

Definition at line 116 of file alignedbase.h.

◆ MemoryAlignedBaseT

template<typename V >
using MemoryAlignedBaseT = AlignedBase<V::MemoryAlignment>

Variant of the above type ensuring suitable alignment only for the specified vector type V.

See also
Vc::MemoryAlignedBase
Vc::VectorAlignedBaseT

Definition at line 132 of file alignedbase.h.

Enumeration Type Documentation

◆ MallocAlignment

Enum that specifies the alignment and padding restrictions to use for memory allocation with Vc::malloc.

Enumerator
AlignOnVector 

Align on boundary of vector sizes (e.g. 16 Bytes on SSE platforms) and pad to allow vector access to the end. Thus the allocated memory contains a multiple of VectorAlignment bytes.

AlignOnCacheline 

Align on boundary of cache line sizes (e.g. 64 Bytes on x86) and pad to allow full cache line access to the end. Thus the allocated memory contains a multiple of 64 bytes.

AlignOnPage 

Align on boundary of page sizes (e.g. 4096 Bytes on x86) and pad to allow full page access to the end. Thus the allocated memory contains a multiple of 4096 bytes.

Definition at line 452 of file global.h.

◆ Implementation

enum Implementation : std::uint_least32_t

Enum to identify a certain SIMD instruction set.

You can use CurrentImplementation for the currently active implementation.

See also
ExtraInstructions
Enumerator
ScalarImpl 

uses only fundamental types

SSE2Impl 

x86 SSE + SSE2

SSE3Impl 

x86 SSE + SSE2 + SSE3

SSSE3Impl 

x86 SSE + SSE2 + SSE3 + SSSE3

SSE41Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1

SSE42Impl 

x86 SSE + SSE2 + SSE3 + SSSE3 + SSE4.1 + SSE4.2

AVXImpl 

x86 AVX

AVX2Impl 

x86 AVX + AVX2

MICImpl 

Intel Xeon Phi.

Definition at line 482 of file global.h.

◆ ExtraInstructions

enum ExtraInstructions : std::uint_least32_t

The list of available instructions is not easily described by a linear list of instruction sets.

On x86 the following instruction sets always include their predecessors: SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, AVX2

But there are additional instructions that are not necessarily required by this list. These are covered in this enum.

Enumerator
Float16cInstructions 

Support for float16 conversions in hardware.

Fma4Instructions 

Support for FMA4 instructions.

XopInstructions 

Support for XOP instructions.

PopcntInstructions 

Support for the population count instruction.

Sse4aInstructions 

Support for SSE4a instructions.

FmaInstructions 

Support for FMA instructions (3 operand variant)

VexInstructions 

Support for ternary instruction coding (VEX)

Bmi2Instructions 

Support for BMI2 instructions.

Definition at line 514 of file global.h.
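
ImplementationMask (0xfff) and ExtraInstructionsMask (0xfffff000u) cover disjoint bit ranges, which suggests that one 32-bit value can carry an implementation identifier together with extra-instruction flags. The sketch below illustrates that bit layout only. The helper names are hypothetical, the constants are copied from the enumerations above, and the implementation identifier 5 is an arbitrary placeholder, since the enumerator values of Vc::Implementation are not listed here.

```cpp
#include <cassert>
#include <cstdint>

// Mask and flag values copied from the enumerations documented above.
constexpr std::uint32_t kImplementationMask = 0xfff;
constexpr std::uint32_t kExtraInstructionsMask = 0xfffff000u;
constexpr std::uint32_t kFloat16cInstructions = 0x01000;
constexpr std::uint32_t kPopcntInstructions = 0x08000;
constexpr std::uint32_t kFmaInstructions = 0x20000;

// Hypothetical helper: the low 12 bits identify the implementation.
constexpr std::uint32_t implementation_part(std::uint32_t features)
{
    return features & kImplementationMask;
}

// Hypothetical helper: the remaining bits hold the extra-instruction flags.
constexpr std::uint32_t extra_part(std::uint32_t features)
{
    return features & kExtraInstructionsMask;
}

// Hypothetical helper: tests a single extra-instruction flag.
constexpr bool has_extra(std::uint32_t features, std::uint32_t flag)
{
    return (features & flag) != 0;
}

// The two masks do not overlap and together cover all 32 bits.
static_assert((kImplementationMask & kExtraInstructionsMask) == 0, "disjoint");
static_assert((kImplementationMask | kExtraInstructionsMask) == 0xffffffffu, "complete");
```

The same flag test applies to the value returned by extraInstructionsSupported(), which is documented as a combination of Vc::ExtraInstructions flags.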

Function Documentation

◆ extraInstructionsSupported()

unsigned int extraInstructionsSupported ( )

Determines the extra instructions supported by the current CPU.

Returns
A combination of flags from Vc::ExtraInstructions that the current CPU supports.

Referenced by Vc::extraInstructionsSupported(), and Vc::isImplementationSupported().

◆ isImplementationSupported()

bool isImplementationSupported ( Vc::Implementation impl)

Tests whether the given implementation is supported by the system the code is executing on.

Returns
true if the OS and hardware support execution of instructions defined by impl.
false otherwise
Parameters
impl  The SIMD target to test for.

Referenced by Vc::isImplementationSupported().

◆ bestImplementationSupported()

Vc::Implementation bestImplementationSupported ( )

Determines the best supported implementation for the current system.

Returns
The enum value for the best implementation.

Referenced by Vc::bestImplementationSupported().

◆ currentImplementationSupported()

bool currentImplementationSupported ( )
inline

Tests that the CPU and Operating System support the vector unit which was compiled for.

This function should be called before any other Vc functionality is used. It checks whether the program will work. If this function returns false then the program should exit with a useful error message before the OS has to kill it because of an invalid instruction exception.

If the program continues and makes use of any vector features not supported by hard- or software then the program will crash.

Example:

int main()
{
    // Bail out early instead of crashing later on an unsupported instruction.
    if (!Vc::currentImplementationSupported()) {
        std::cerr << "CPU or OS requirements not met for the compiled in vector unit!\n";
        return 1;
    }
    ...
}
Returns
true if the OS and hardware support execution of the currently selected SIMD instructions.
false otherwise

Definition at line 148 of file support.h.

Referenced by Vc::currentImplementationSupported().

◆ versionString()

const char * versionString ( )
inline
Returns
the version string of the Vc headers.
Note
There exists a built-in check that ensures on application startup that the Vc version of the library (link time) and the headers (compile time) are equal. A mismatch between headers and library could lead to errors that are very hard to debug.
If you need to disable the check (it costs a very small amount of application startup time) you can define Vc_NO_VERSION_CHECK at compile time.

Definition at line 81 of file version.h.

Referenced by Vc::versionString().

◆ versionNumber()

unsigned int versionNumber ( )
constexpr
Returns
the version of the Vc headers encoded in an integer.

Definition at line 89 of file version.h.

Referenced by Vc::versionNumber().

◆ operator<<()

template<typename V , typename Parent , typename Dimension , typename RM >
std::ostream & operator<< ( std::ostream & s,
const Vc::MemoryBase< V, Parent, Dimension, RM > & m )
inline

Prints the contents of a Memory object into a stream object.

Vc::Memory<int_v, 10> m;
for (int i = 0; i < m.entriesCount(); ++i) {
    m[i] = i;
}
std::cout << m << std::endl;

will output (with SSE):

{[0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 0, 0]}
Parameters
s  Any standard C++ ostream object. For example std::cout or a std::stringstream object.
m  Any Vc::Memory object.
Returns
The ostream object: to chain multiple stream operations.
Note
With the GNU standard library this function will check whether the output stream is a tty in which case it colorizes the output.
Warning
Please do not forget that printing a large memory object can take a long time.

◆ simd_for_each()

template<class InputIt , class UnaryFunction >
UnaryFunction simd_for_each ( InputIt first,
InputIt last,
UnaryFunction f )

Vc variant of the std::for_each algorithm.

This algorithm calls f with one argument of type Vc::Vector< iterator value type , unspecified > as often as is needed to iterate over the complete range from first to last. It will try to use the best vector size (VectorAbi) to work on the largest chunks possible. To support aligned loads (and stores) and to support arbitrary range distances, the algorithm may require the use of Vc::VectorAbi types that work on fewer elements in parallel.

The following example requires C++14 for generic lambdas. If you don't have generic lambdas available you can use a "classic" functor type with a templated call operator instead.

void scale(std::vector<double> &data, double factor) {
Vc::simd_for_each(data.begin(), data.end(), [&](auto v) {
v *= factor;
});
}

Referenced by Vc::simd_for_each().

◆ iif() [1/2]

template<typename Mask , typename T >
enable_if< is_simd_mask< Mask >::value &&is_simd_vector< T >::value, T > iif ( const Mask & condition,
const T & trueValue,
const T & falseValue )
inline delete

Function to mimic the ternary operator '?:' (inline-if).

Parameters
condition  Determines which values are returned. This is analogous to the first argument to the ternary operator.
trueValue  The values to return where condition is true.
falseValue  The values to return where condition is false.
Returns
A combination of entries from trueValue and falseValue, according to condition.

So instead of the scalar variant

float x = a > 1.f ? b : b + c;

you'd write

float_v x = Vc::iif (a > 1.f, b, b + c);

Assuming a has the values [0, 3, 5, 1], b is [1, 1, 1, 1], and c is [1, 2, 3, 4], then x will be [2, 1, 1, 5]: the entries where a > 1.f take the value from b, all other entries take b + c.

Definition at line 60 of file iif.h.
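
The entry-wise selection can be modelled with plain arrays. The sketch below is a scalar stand-in, not Vc code: Lanes, LaneMask and select_lanes are hypothetical names, and the lane count of four is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar stand-ins for a four-lane float vector and its mask (not Vc types).
using Lanes = std::array<float, 4>;
using LaneMask = std::array<bool, 4>;

// Hypothetical model of a lane-wise inline-if: each output lane takes the
// entry from trueValue where the mask is set and from falseValue otherwise.
inline Lanes select_lanes(const LaneMask &condition, const Lanes &trueValue,
                          const Lanes &falseValue)
{
    Lanes result{};
    for (std::size_t i = 0; i < result.size(); ++i) {
        result[i] = condition[i] ? trueValue[i] : falseValue[i];
    }
    return result;
}
```

Both value arguments are fully evaluated before the selection takes place, which is one difference from the scalar ternary operator, where only the chosen operand is evaluated.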

Referenced by Vc::iif(), and Vc::iif().

◆ iif() [2/2]

template<typename T >
T iif ( bool condition,
const T & trueValue,
const T & falseValue )
constexpr

Overload of the above for boolean conditions.

This typically results in direct use of the ternary operator. This function makes it easier to switch from a Vc type to a builtin type.

Parameters
condition  Determines which value is returned. This is analogous to the first argument to the ternary operator.
trueValue  The value to return if condition is true.
falseValue  The value to return if condition is false.
Returns
Either trueValue or falseValue, depending on condition.

Definition at line 90 of file iif.h.

◆ interleave()

template<typename V , typename = enable_if<Traits::is_simd_vector<V>::value>>
std::pair< V, V > interleave ( const V & a,
const V & b )

Interleaves the entries from a and b into two vectors of the same type.

The returned pair holds the elements in the order a[0], b[0], a[1], b[1], a[2], b[2], a[3], b[3], ...; the first vector receives the first half of this sequence and the second vector the remainder.

Example:

Vc::SimdArray<int, 4> a = { 1, 2, 3, 4 };
Vc::SimdArray<int, 4> b = { 9, 8, 7, 6 };
std::tie(a, b) = Vc::interleave(a, b);
std::cout << a << b;
// prints:
// <1 9 2 8><3 7 4 6>
Parameters
a  input vector whose data will appear at even indexes in the output
b  input vector whose data will appear at odd indexes in the output
Returns
two vectors with data from a and b interleaved

Definition at line 55 of file interleave.h.
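
The element order can be reproduced with a scalar model. The following sketch uses std::array<int, 4> in place of a vector type; interleave_model and Lanes4 are hypothetical names and the code does not call into Vc.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Scalar stand-in for a four-lane int vector (not a Vc type).
using Lanes4 = std::array<int, 4>;

// Hypothetical model: builds the sequence a[0], b[0], a[1], b[1], ... and
// splits it into a first and a second half.
inline std::pair<Lanes4, Lanes4> interleave_model(const Lanes4 &a, const Lanes4 &b)
{
    std::array<int, 8> mixed{};
    for (std::size_t i = 0; i < 4; ++i) {
        mixed[2 * i] = a[i];      // entries of a land on even indexes
        mixed[2 * i + 1] = b[i];  // entries of b land on odd indexes
    }
    Lanes4 first{};
    Lanes4 second{};
    for (std::size_t i = 0; i < 4; ++i) {
        first[i] = mixed[i];
        second[i] = mixed[i + 4];
    }
    return std::make_pair(first, second);
}
```

For the inputs of the example above, {1, 2, 3, 4} and {9, 8, 7, 6}, the model returns {1, 9, 2, 8} and {3, 7, 4, 6}, matching the printed output.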

Referenced by Vc::interleave().

◆ malloc()

template<typename T , Vc::MallocAlignment A>
T * malloc ( size_t n)
inline

Allocates memory on the Heap with alignment and padding suitable for vectorized access.

Memory that was allocated with this function must be released with Vc::free! Other methods might work but are not portable.

Parameters
n  Specifies the number of objects the allocated memory must be able to store.
Template Parameters
T  The type of the allocated memory. Note that the constructor is not called.
A  Determines the alignment of the memory. See Vc::MallocAlignment.
Returns
Pointer to memory of the requested type, or 0 on error. The allocated memory is padded at the end to be a multiple of the requested alignment A. Thus if you request memory for 21 int objects, aligned via Vc::AlignOnCacheline, you can safely read a full cacheline until the end of the array, without generating an out-of-bounds access. For a cacheline size of 64 Bytes and an int size of 4 Bytes you would thus get an array of 128 Bytes to work with.
Warning
  • The standard malloc function specifies the number of Bytes to allocate whereas this function specifies the number of values, thus differing in a factor of sizeof(T).
  • This function is mainly meant for use with builtin types. If you use a custom type with a sizeof that is not a multiple of 2 the results might not be what you expect.
  • The constructor of T is not called. You can make up for this:
    SomeType *array = new(Vc::malloc<SomeType, Vc::AlignOnCacheline>(N)) SomeType[N];
See also
Vc::free

Definition at line 136 of file malloc.h.
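
The padding rule from the Returns paragraph comes down to rounding the requested byte count up to the next multiple of the alignment. The sketch below models only that size computation and allocates nothing. The name padded_bytes is hypothetical, and the 4-byte int and 64-byte cacheline are the assumptions used in the text above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Vc): rounds the byte size of n objects of
// object_size bytes up to the next multiple of alignment.
inline std::size_t padded_bytes(std::size_t n, std::size_t object_size,
                                std::size_t alignment)
{
    assert(alignment > 0);
    const std::size_t requested = n * object_size;
    return (requested + alignment - 1) / alignment * alignment;
}
```

For 21 objects of 4 bytes and an alignment of 64, the request of 84 bytes is rounded up to 128 bytes, as stated above.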

◆ free()

template<typename T >
void free ( T * p)
inline

Frees memory that was allocated with Vc::malloc.

Parameters
p  The pointer to the memory to be freed.
Template Parameters
T  The type of the allocated memory.
Warning
The destructor of T is not called. If needed, you can call the destructor before calling free:
for (int i = 0; i < N; ++i) {
p[i].~T();
}
See also
Vc::malloc

Definition at line 163 of file malloc.h.

Referenced by Memory< V, 0u, 0u, true >::~Memory().

◆ prefetchForOneRead()

void prefetchForOneRead ( const void * addr)
inline

Prefetch the cacheline containing addr for a single read access.

This prefetch completely bypasses the cache, not evicting any other data.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 510 of file memory.h.

◆ prefetchForModify()

void prefetchForModify ( const void * addr)
inline

Prefetch the cacheline containing addr for modification.

This prefetch evicts data from the cache. So use it only for data you really will use. When the target system supports it the cacheline will be marked as modified while prefetching, saving work later on.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 527 of file memory.h.

◆ prefetchClose()

void prefetchClose ( const void * addr)
inline

Prefetch the cacheline containing addr to L1 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 542 of file memory.h.

◆ prefetchMid()

void prefetchMid ( const void * addr)
inline

Prefetch the cacheline containing addr to L2 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 557 of file memory.h.

◆ prefetchFar()

void prefetchFar ( const void * addr)
inline

Prefetch the cacheline containing addr to L3 cache.

This prefetch evicts data from the cache. So use it only for data you really will use.

Parameters
addr  The cacheline containing addr will be prefetched.

Definition at line 572 of file memory.h.

◆ reinterpret_components_cast()

template<typename V , typename T , typename Abi >
enable_if<(V::size()==Vector< T, Abi >::size() && sizeof(typename V::VectorEntryType)== sizeof(typename Vector< T, Abi >::VectorEntryType) && sizeof(V)==sizeof(Vector< T, Abi >) &&alignof(V)<=alignof(Vector< T, Abi >)), V > reinterpret_components_cast ( const Vector< T, Abi > & x)
inline

Constructs a new Vector object of type V from the Vector x, reinterpreting the bits of x for the new type V.

This function is only applicable if:

  • the sizeof of the input and output types is equal
  • the Vector::size() of the input and output types is equal
  • the VectorEntryTypes of input and output have equal sizeof
Template Parameters
V  The requested type to change x into.
Parameters
x  The Vector to reinterpret as an object of type V.
Returns
A new object (rvalue) of type V.
Warning
This cast is non-portable since the applicability (see above) may change depending on the default vector types of the target platform. The function is perfectly safe to use with fully specified Abi, though.

Definition at line 839 of file vector.h.
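
What reinterpreting the bits of each entry means can be shown with a scalar model based on std::memcpy. This is a sketch under stated assumptions, not the Vc implementation: reinterpret_lanes is a hypothetical name, std::array stands in for the vector types, and the bit patterns in the usage note assume 32-bit IEEE 754 floats.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model: copies the bytes of every entry unchanged into an array
// with a different entry type. The static_assert mirrors the requirement that
// the entry types have equal sizeof; the equal entry count is enforced by N.
template <typename To, typename From, std::size_t N>
std::array<To, N> reinterpret_lanes(const std::array<From, N> &x)
{
    static_assert(sizeof(To) == sizeof(From), "entry types must have equal sizeof");
    std::array<To, N> out{};
    std::memcpy(out.data(), x.data(), N * sizeof(To));
    return out;
}
```

Under the IEEE 754 assumption, reinterpreting the float entries 1.0f and -0.0f as std::uint32_t gives 0x3f800000 and 0x80000000. No numeric conversion takes place, and casting back restores the original values.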

◆ where()

template<typename M >
WhereImpl::WhereMask< M > where ( const M & mask)
constexpr

Conditional assignment.

Since compares between SIMD vectors do not return a single boolean, but rather a vector of booleans (mask), one often cannot use if / else statements. Instead, one needs to state that only a subset of entries of a given SIMD vector should be modified. The where function can be prepended to any assignment operation to execute a masked assignment.

Parameters
mask  The mask that selects the entries in the target vector that will be modified.
Returns
This function returns an opaque object that binds to the left operand of an assignment via the binary-or operator or the functor operator. (i.e. either where(mask) | x = y or where(mask)(x) = y)

Example:

template<typename T> void f1(T &x, T &y)
{
if (x < 2) {
x *= y;
y += 2;
}
}
template<typename T> void f2(T &x, T &y)
{
where(x < 2) | x *= y;
where(x < 2) | y += 2;
}

The block following the if statement in f1 will be executed if x < 2 evaluates to true. If T is a scalar type you normally get what you expect. But if T is a SIMD vector type, the comparison will use the implicit conversion from a mask to bool, meaning all_of(x < 2).

Most of the time the required operation is a masked assignment as stated in f2.

Definition at line 265 of file where.h.
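
A scalar model may help to see how f1 and f2 differ when T is a vector type. The sketch below uses std::array<int, 4> as a stand-in for a four-lane vector; Lanes, f1_model and f2_model are hypothetical names and no Vc types are involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar stand-in for a four-lane int vector (not a Vc type).
using Lanes = std::array<int, 4>;

// Models f1 for a vector type: the branch is taken only if x < 2 holds in
// every lane, and then all lanes are modified.
inline void f1_model(Lanes &x, Lanes &y)
{
    bool all = true;
    for (int v : x) {
        all = all && (v < 2);
    }
    if (all) {
        for (std::size_t i = 0; i < x.size(); ++i) {
            x[i] *= y[i];
            y[i] += 2;
        }
    }
}

// Models f2: each statement is a masked assignment that touches only the
// lanes where x < 2 holds at the time the statement runs.
inline void f2_model(Lanes &x, Lanes &y)
{
    // where(x < 2) | x *= y;
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < 2) {
            x[i] *= y[i];
        }
    }
    // where(x < 2) | y += 2;  (the mask is evaluated again, on the updated x)
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] < 2) {
            y[i] += 2;
        }
    }
}
```

With x = [0, 1, 2, 3] and y = [5, 5, 5, 5], f1_model changes nothing because not every lane satisfies x < 2, while f2_model yields x = [0, 5, 2, 3] and y = [7, 5, 5, 5]. The second statement sees the already updated x, so only lane 0 of y changes. Storing the mask in a variable before the first assignment would avoid that.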

Referenced by Vc::iif(), and Vc::where().

Variable Documentation

◆ Aligned

AlignedTag Aligned
constexpr

Use this object for a flags parameter to request aligned loads and stores.

It specifies that a load/store can expect a memory address that is aligned on the correct boundary. (i.e. MemoryAlignment)

Warning
If you specify Aligned, but the memory address is not aligned the program will most likely crash.

Definition at line 178 of file loadstoreflags.h.
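
Whether an address qualifies for an aligned load or store can be checked with integer arithmetic on the pointer value. The helper below is a hypothetical sketch and not part of Vc. The alignment of 32 bytes in the usage note is an assumed example value, since the actual MemoryAlignment depends on the vector type and target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Vc): returns true if addr is a multiple
// of alignment, i.e. if it lies on the requested boundary.
inline bool is_aligned(const void *addr, std::size_t alignment)
{
    assert(alignment != 0);
    return reinterpret_cast<std::uintptr_t>(addr) % alignment == 0;
}
```

For a float buffer declared with alignas(32), the address of element 0 and of element 8 pass the check for 32 bytes, while the address of element 1 does not.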

Referenced by SimdArray< T, N, V, Wt >::reversed(), and SimdArray< T, N, V, Wt >::rotated().

◆ Unaligned

UnalignedTag Unaligned
constexpr

Use this object for a flags parameter to request unaligned loads and stores.

It specifies that a load/store cannot expect a memory address that is aligned on the correct boundary. (i.e. alignment is less than MemoryAlignment)

Note
If you specify Unaligned, but the memory address is aligned the load/store will execute slightly slower than necessary.

Definition at line 191 of file loadstoreflags.h.

Referenced by SimdArray< T, N, V, Wt >::reversed(), SimdArray< T, N, V, Wt >::rotated(), MemoryBase< V, Parent, Dimension, RowMemory >::vector(), and MemoryBase< V, Parent, Dimension, RowMemory >::vector().

◆ Streaming

StreamingTag Streaming
constexpr

Use this object for a flags parameter to request streaming loads and stores.

It specifies that the cache should be bypassed for the given load/store. Whether this will actually be done depends on the target system's capabilities.

Streaming stores can be interesting when the code calculates values that, after being written to memory, will not be used for a long time or used by a different thread.

Note
Expect that most target systems do not support unaligned streaming loads or stores. Therefore, make sure that you also specify Aligned.

Definition at line 206 of file loadstoreflags.h.