SparseTensor
============

Sparse Tensors are stored as two dense tensors, a shape, and an optional
dimension order:

* `indices`: a `brain::Tensor` storing a matrix, typically `int64`
* `values`: a `brain::Tensor` storing a vector with values of type T.
* `shape`: a `TensorShape` storing the bounds of the underlying tensor
* `order`: (optional) a `gtl::InlinedVector<int64,8>` with the dimensions
  along which the indices are ordered.

Let

    ix = indices.matrix<int64>()
    vals = values.vec<T>()

The shape of `ix` is `N x NDIMS`, and each row corresponds to the
index of a single element of the sparse tensor.

The length of `vals` must be `N`, and `vals(i)` corresponds to the
value with index `ix(i,:)`.

Shape must be a `TensorShape` with `dims() == NDIMS`.
The shape is the full shape of the dense tensor these indices
represent.

To be specific, the representation (pseudocode) is:

    tensor[ix[i,:]] == vals[i] for i = 0, ..., N-1

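To make the layout concrete, here is a minimal standalone Python sketch of the same representation (plain Python with illustrative names, not the C++ API):

```python
# COO-style layout: N index rows, N values, and the dense shape.
# All names here are illustrative, not part of the C++ API.
indices = [[0, 0], [1, 2], [2, 1]]  # ix: N x NDIMS, one row per element
vals = [10, 20, 30]                 # length N
shape = (3, 3)                      # bounds of the underlying dense tensor

# Materialize the dense tensor so that tensor[ix[i, :]] == vals[i].
tensor = [[0] * shape[1] for _ in range(shape[0])]
for (r, c), v in zip(indices, vals):
    tensor[r][c] = v
```
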
Ordering
--------

Indices need not be provided in order. For example, the following
index matrix is ordered according to dimension order `{0, 1, 2}`.

    [0 0 1]
    [0 1 1]
    [2 0 2]

However, you can provide an unordered version:

    [2 0 2]
    [0 0 1]
    [0 1 1]

If the SparseTensor is constructed without a provided order, then the
default order is `{-1, ..., -1}`. Certain operations will fail or crash
when the order is not provided.

Resorting the SparseTensor in-place (which resorts the underlying index and
values tensors in-place) will update the order. The cost of reordering the
matrix is `O(N*log(N))`, and requires `O(N)` additional temporary space to store
a reordering index. If no order was provided at construction and reordering is
not performed, the following will happen:

* `group()` will **raise an assertion failure**
* `IndicesValid()` will **raise an assertion failure**

To update the internal index ordering after construction, call
`Reorder<T>()` via, e.g., `Reorder<T>({0,1,2})`.
After this step, all the above methods should work correctly.
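
Conceptually, `Reorder` is a lexicographic sort of the index rows along the requested dimension order. A standalone Python sketch of the idea (illustrative, not the C++ implementation):

```python
def reorder(indices, values, order):
    """Sort index rows (and their values) lexicographically along `order`.

    The sort costs O(N*log(N)); `perm` is the O(N) temporary
    reordering index mentioned above.
    """
    perm = sorted(range(len(indices)),
                  key=lambda i: tuple(indices[i][d] for d in order))
    return [indices[i] for i in perm], [values[i] for i in perm]

# The unordered index matrix from the text, restored to order {0, 1, 2}:
ix, vals = reorder([[2, 0, 2], [0, 0, 1], [0, 1, 1]],
                   ["c", "a", "b"], order=(0, 1, 2))
```
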
63
64 The method `IndicesValid()` checks to make sure:
65
66 * `0 <= ix(i, d) < shape.dim_size(d)`
67 * indices do not repeat
68 * indices are in order
69
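In spirit, the validation is a single pass over the rows; a standalone Python sketch (illustrative, not the C++ code):

```python
def indices_valid(indices, shape, order):
    # Checks bounds, and uses a strictly increasing sort key to rule out
    # both repeated and out-of-order indices in a single pass.
    prev_key = None
    for row in indices:
        if not all(0 <= row[d] < shape[d] for d in range(len(shape))):
            return False  # 0 <= ix(i, d) < shape.dim_size(d) violated
        key = tuple(row[d] for d in order)
        if prev_key is not None and key <= prev_key:
            return False  # repeated (==) or out-of-order (<) index
        prev_key = key
    return True
```
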
Iterating
---------

### group({grouping dims})

* provides an iterator that groups entries according to
  dimensions you care about
* may require a sort if your data isn't presorted in a way that's
  compatible with grouping_dims
* for each group, returns the group index (values of the group
  dims for this iteration), the subset of indices in this group,
  and the subset of values in this group. These are lazy outputs,
  so to read them individually, copy them as per the example
  below.

#### **NOTE**
`group({dim0, ..., dimk})` will **raise an assertion failure** if the
order of the SparseTensor does not match the dimensions you wish to group by.
You must either have your indices in the correct order and construct the
SparseTensor with

    order = {dim0, ..., dimk, ...}

or call

    Reorder<T>({dim0, ..., dimk, ...})

to sort the SparseTensor before grouping.

Example of grouping:

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    sp.Reorder<string>({1, 2, 0, 3, ...});  // Must provide NDIMS dims.
    // group according to dims 1 and 2
    for (const auto& g : sp.group({1, 2})) {
      cout << "vals of ix[:, 1,2] for this group: "
           << g.group()[0] << ", " << g.group()[1];
      cout << "full indices of group:\n" << g.indices();
      cout << "values of group:\n" << g.values();

      TTypes<int64>::UnalignedMatrix g_ix = g.indices();
      TTypes<string>::UnalignedVec g_v = g.values();
      ASSERT(g_ix.dimension(0) == g_v.size());  // number of elements match
    }

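For intuition, grouping presorted index rows by their leading dimensions is an ordinary consecutive-key group-by; a standalone Python sketch (illustrative names, not the C++ iterator):

```python
from itertools import groupby

def group(indices, values, group_dims):
    # Precondition (as above): rows are already sorted so that group_dims
    # form the leading part of the sort key.
    def key(pair):
        return tuple(pair[0][d] for d in group_dims)
    for group_key, members in groupby(zip(indices, values), key=key):
        members = list(members)
        yield group_key, [ix for ix, _ in members], [v for _, v in members]

# Group a small 2-D example by its first dimension.
groups = list(group([[0, 0], [0, 1], [1, 0]], ["a", "b", "c"], (0,)))
```
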

ToDense
--------

Converts a sparse tensor to its dense version. You must provide a pointer
to the preallocated dense tensor. `ToDense()` will optionally
preinitialize the tensor with zeros.

Shape checking is performed, as is boundary checking.

    Tensor indices(DT_INT64, TensorShape({N, NDIMS}));
    Tensor values(DT_STRING, TensorShape({N}));
    TensorShape shape({dim0,...});
    SparseTensor sp(indices, values, shape);
    ASSERT(sp.IndicesValid());  // checks ordering & index bounds.

    Tensor dense(DT_STRING, shape);
    // initialize other indices to zero.  copy.
    ASSERT(sp.ToDense<string>(&dense, true));

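The scatter behind `ToDense()` can be sketched in standalone Python as follows (flat row-major buffer; illustrative, not the C++ code):

```python
def to_dense(indices, values, shape, out, initialize=True):
    # `out` is a preallocated flat row-major buffer of len == prod(shape).
    # Returns False on an out-of-bounds index, mirroring a bool result.
    if initialize:
        for i in range(len(out)):
            out[i] = 0
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    for row, v in zip(indices, values):
        if not all(0 <= row[d] < shape[d] for d in range(len(shape))):
            return False
        out[sum(r * s for r, s in zip(row, strides))] = v
    return True

buf = [None] * 9
ok = to_dense([[0, 0], [1, 2], [2, 1]], [10, 20, 30], (3, 3), buf)
```
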
Concat
--------

Concatenates multiple SparseTensors and returns a new SparseTensor.
This concatenation is with respect to the "dense" versions of these
SparseTensors. Concatenation is performed along dimension `order[0]`
of all tensors. As a result, `shape[order[0]]` may differ across
the inputs, but `shape[d]` for `d != order[0]` must match across all inputs.

We call `order[0]` the **primary dimension**.

**Prerequisites**

* The inputs' ranks must all match.
* The inputs' `order[0]` must all match.
* The inputs' shapes must all match except for dimension `order[0]`.
* The inputs' values must all be of the same type.

If any of these are false, concat will die with an assertion failure.

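The heart of the operation is shifting each input's indices along the primary dimension by a running offset; a standalone Python sketch (illustrative, not the C++ `Concat`):

```python
def concat(tensors, primary_dim):
    # tensors: list of (indices, values, shape) triples whose shapes match
    # except along primary_dim (the role played by order[0] above).
    out_ix, out_vals, offset = [], [], 0
    out_shape = list(tensors[0][2])
    for indices, values, shape in tensors:
        for row in indices:
            shifted = list(row)
            shifted[primary_dim] += offset  # shift into the output's range
            out_ix.append(shifted)
        out_vals.extend(values)
        offset += shape[primary_dim]
    out_shape[primary_dim] = offset  # primary dim grows; others unchanged
    return out_ix, out_vals, tuple(out_shape)

# Concatenate a 2x3 and a 2x4 sparse matrix along columns (dim 1).
m1 = ([[0, 2], [1, 0]], [1, 2], (2, 3))
m2 = ([[0, 1]], [3], (2, 4))
ix, vals, shape = concat([m1, m2], primary_dim=1)
```
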
Example:
Concatenate two sparse matrices along columns.

Matrix 1:

    [0 0 1]
    [2 0 0]
    [3 0 4]

Matrix 2:

    [0 0 0 0 0]
    [0 1 0 0 0]
    [2 0 0 1 0]

Concatenated Matrix:

    [0 0 1 0 0 0 0 0]
    [2 0 0 0 1 0 0 0]
    [3 0 4 2 0 0 1 0]

Expected input shapes, orders, and `nnz()`:

    shape_1 = TensorShape({3, 3})
    shape_2 = TensorShape({3, 5})
    order_1 = {1, 0}  // primary order is 1, columns
    order_2 = {1, 0}  // primary order is 1, must match
    nnz_1 = 4
    nnz_2 = 3

Output shapes and orders:

    conc_shape = TensorShape({3, 8})  // primary dim increased, others same
    conc_order = {1, 0}  // orders match across all inputs
    conc_nnz = 7  // sum of nonzeros of inputs

Coding Example:

    Tensor ix1(DT_INT64, TensorShape({N1, 3}));
    Tensor vals1(DT_STRING, TensorShape({N1}));
    Tensor ix2(DT_INT64, TensorShape({N2, 3}));
    Tensor vals2(DT_STRING, TensorShape({N2}));
    Tensor ix3(DT_INT64, TensorShape({N3, 3}));
    Tensor vals3(DT_STRING, TensorShape({N3}));

    SparseTensor st1(ix1, vals1, TensorShape({10, 20, 5}), {1, 0, 2});
    SparseTensor st2(ix2, vals2, TensorShape({10, 10, 5}), {1, 0, 2});
    // For kicks, st3 indices are out of order, but order[0] matches, so we
    // can still concatenate along this dimension.
    SparseTensor st3(ix3, vals3, TensorShape({10, 30, 5}), {1, 2, 0});

    SparseTensor conc = SparseTensor::Concat<string>({st1, st2, st3});
    Tensor ix_conc = conc.indices();
    Tensor vals_conc = conc.values();
    EXPECT_EQ(conc.nnz(), st1.nnz() + st2.nnz() + st3.nnz());
    EXPECT_EQ(conc.Shape(), TensorShape({10, 60, 5}));
    EXPECT_EQ(conc.Order(), {-1, -1, -1});

    // Reorder st3 so all input tensors have the exact same orders.
    st3.Reorder<string>({1, 0, 2});
    SparseTensor conc2 = SparseTensor::Concat<string>({st1, st2, st3});
    EXPECT_EQ(conc2.Order(), {1, 0, 2});
    // All indices' orders matched, so output is in order.
    EXPECT_TRUE(conc2.IndicesValid());