
Conversation


@yanbing-j yanbing-j commented Apr 17, 2023

This PR adds support for sum.dim_IntList on sparse tensors, which was exposed in #98796.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10
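For context, a minimal hedged sketch of the call this PR enables (hypothetical values; reductions over sparse dimensions require keepdim=True, as discussed in the review below):

```python
import torch

# Hypothetical example of the feature added here: sum over one dimension of a CSR tensor.
a = torch.tensor([[0., 1., 2.],
                  [3., 0., 4.]])
csr = a.to_sparse_csr()
s = csr.sum(dim=0, keepdim=True)   # dispatches to the kernel added by this PR
print(s)                           # expected to match a.sum(dim=0, keepdim=True) = [[3., 1., 6.]]
```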

@pytorch-bot pytorch-bot bot added the release notes: sparse release notes category label Apr 17, 2023

pytorch-bot bot commented Apr 17, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/99292

Note: Links to docs will display an error until the docs builds have been completed.

✅ 3 Unrelated Failures

As of commit e45db7a:

UNSTABLE - The following jobs failed, likely due to flakiness present on trunk, and have been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@yanbing-j yanbing-j requested a review from mingfeima April 17, 2023 07:27
@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch from 5cc3191 to e64dd94 Compare April 18, 2023 02:36
@mingfeima mingfeima requested a review from pearu April 18, 2023 05:24
@@ -1046,7 +1046,7 @@ Tensor reduce_sparse_csr_dim0_cpu_template(const Tensor& sparse, ReductionOp rop

AT_DISPATCH_INDEX_TYPES(col_indices.scalar_type(), "reduce_sparse_csr_dim0_cpu_indices",
[&]() {
-index_t* columns_map_ptr = columns_map.data_ptr<index_t>();
+int64_t* columns_map_ptr = columns_map.data_ptr<int64_t>();
Collaborator

Good catch. Could you eliminate using AT_DISPATCH_INDEX_TYPES as index_t is not used in this block anymore?

Collaborator Author

Thanks for your comments. I find that simply changing index_t to int64_t here is not quite appropriate. The root cause is that the index type should align with columns_map's dtype, not col_indices's dtype. I have updated it.

@mingfeima mingfeima (Collaborator) left a comment

Test failures seem to be real, please have them fixed.

@mikaylagawarecki mikaylagawarecki added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Apr 25, 2023
@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch 2 times, most recently from 277d51b to 9995d22 Compare May 9, 2023 05:23
@yanbing-j yanbing-j requested review from mruberry and ngimel as code owners May 9, 2023 09:20
@yanbing-j (Collaborator Author)

Hi @pearu, I see your PR #100391 will raise an error in 'aten::sum.IntList_out' with arguments from 'SparseCsr(CPU|CUDA)', which conflicts with the current PR. Therefore, I removed that part of the code; only the error for bsr and bsc remains.

But I'm confused about how to pass your UT. Whether I raise an error or add a warning in the bsr and bsc part, self.assertFalse(isinstance(out, type(NotImplemented))) will fail. Could you please give me some advice, and also have a look at this PR? Thank you!

@pearu pearu (Collaborator) left a comment

Thanks, @yanbing-j, for this! I did my initial review of the PR and found that there exist more efficient paths to compute the sum of CSC tensors.

I'll address your testing questions a bit later.

@@ -2165,5 +2166,21 @@ Tensor sum_sparse_coo(const Tensor& self, at::OptionalIntArrayRef dim, bool keep
return result;
}

Tensor sum_sparse_csr(const Tensor& self, at::OptionalIntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype) {
Collaborator

SparseCsrCPU/CUDA dispatch keys represent all sparse compressed layouts: CSR, BSR, CSC, BSC. The same holds for the sum_sparse_csr function so I suggest:

Suggested change
-Tensor sum_sparse_csr(const Tensor& self, at::OptionalIntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype) {
+Tensor sum_sparse_compressed(const Tensor& self, at::OptionalIntArrayRef dim, bool keepdim, c10::optional<ScalarType> dtype) {

// bit different in the second parameters `dim`, which causes the conversion of `dim`
// to call into `_sparse_csr_sum`. Align the signatures would be a better choice.
TORCH_CHECK(dim.has_value(),"dim has no value, cannot be used in sum.dim_IntList");
if (self.is_sparse_csr()) {
Collaborator

Use:

Suggested change
-if (self.is_sparse_csr()) {
+auto layout = self.layout();
+if (layout == kSparseCsr) {

Comment on lines 2177 to 2178
Tensor new_self = self.to_dense().to_sparse_csr();
return at::_sparse_csr_sum(new_self, *dim, true, dtype);
Collaborator

Conversion to a strided tensor is unnecessary; using self.to_sparse_csr() should be sufficient and is more efficient. That said, I think no conversions are necessary because we have the following invariants:

batch_dim = csc.dim() - csc.dense_dim() - csc.sparse_dim()
csc.layout == torch.sparse_csc
csc.transpose(batch_dim, batch_dim + 1).layout == torch.sparse_csr
torch.sum(csc, dim=dim, keepdim=True) == torch.sum(csc.transpose(batch_dim, batch_dim+1),
      dim=(*dim[:batch_dim], dim[batch_dim+1], dim[batch_dim], *dim[batch_dim+2:]),
      keepdim=True).transpose(batch_dim, batch_dim+1)

So, I suggest using the following method for computing reductions on a CSC tensor:

Suggested change
-Tensor new_self = self.to_dense().to_sparse_csr();
-return at::_sparse_csr_sum(new_self, *dim, true, dtype);
+auto batch_dim = csc.dim() - csc.dense_dim() - csc.sparse_dim();
+auto swapped_dim = ...; // a copy of `*dim` where `batch_dim` and `batch_dim+1`- th elements are swapped
+return at::_sparse_csr_sum(self.transpose(batch_dim, batch_dim+1), swapped_dim, true, dtype).transpose(batch_dim, batch_dim+1);
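To make the invariants above concrete, here is a hedged Python check for the plain 2-D case (batch_dim == 0), verified through dense tensors since the CSC sum path may not exist at this point:

```python
import torch

# Hedged sketch of the invariant above for a non-batched 2-D tensor (batch_dim == 0).
a = torch.tensor([[1., 0., 2.],
                  [0., 3., 0.]])
csc = a.to_sparse_csc()
# Transposing the two sparse dimensions of a CSC tensor yields a CSR view of the same data.
assert csc.transpose(0, 1).layout == torch.sparse_csr
# So summing the CSC tensor over dim d equals summing its CSR transpose over the other dim
# and transposing back (checked here via dense tensors).
for d in (0, 1):
    lhs = a.sum(dim=d, keepdim=True)
    rhs = csc.transpose(0, 1).to_dense().sum(dim=1 - d, keepdim=True).transpose(0, 1)
    assert torch.equal(lhs, rhs)
```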

Comment on lines 2180 to 2181
LOG(WARNING) << "Only SparseCsr and SparseCSC are supported for now";
return Tensor();
Collaborator

Suggested change
-LOG(WARNING) << "Only SparseCsr and SparseCSC are supported for now";
-return Tensor();
+TORCH_CHECK(false, "sum expected input with strided, sparse_csr, or sparse_csc layouts, got layout ", layout);
+return Tensor();

@@ -5389,6 +5389,7 @@
dispatch:
NestedTensorCPU: NestedTensor_sum_dim_CPU
SparseCPU, SparseCUDA: sum_sparse_coo
SparseCsrCPU, SparseCsrCUDA: sum_sparse_csr # TODO: Align the signature of sum.dim_IntList and _sparse_csr_sum.dim_dtype
Collaborator

Nit:

Suggested change
-SparseCsrCPU, SparseCsrCUDA: sum_sparse_csr # TODO: Align the signature of sum.dim_IntList and _sparse_csr_sum.dim_dtype
+SparseCsrCPU, SparseCsrCUDA: sum_sparse_compressed # TODO: Align the signature of sum.dim_IntList and _sparse_csr_sum.dim_dtype

Comment on lines 1068 to 1074
new_values_acc_ptr[col] = rop(new_values_acc_ptr[col], static_cast<opmath_t>(val));
}
for (int64_t i = 0; i < nnz; i++) {
if (need_acc) {
new_values_ptr[i] = static_cast<scalar_t>(new_values_acc_ptr[i]);
} else {
new_values_ptr[i] = new_values_acc_ptr[i];
}
}
});
Collaborator

Optimization nit continued:

Suggested change
-new_values_acc_ptr[col] = rop(new_values_acc_ptr[col], static_cast<opmath_t>(val));
-}
-for (int64_t i = 0; i < nnz; i++) {
-if (need_acc) {
-new_values_ptr[i] = static_cast<scalar_t>(new_values_acc_ptr[i]);
-} else {
-new_values_ptr[i] = new_values_acc_ptr[i];
-}
-}
-});
+new_values_acc_ptr[col] = rop(new_values_acc_ptr[col], static_cast<opmath_t>(val));
+}
+});
+if (need_acc) {
+new_values.copy_(new_values_acc);
+}

[&]() {
index_t* columns_map_ptr = columns_map.data_ptr<index_t>();
scalar_t* values_ptr = values.data_ptr<scalar_t>();
opmath_t* new_values_acc_ptr = new_values_acc.data_ptr<opmath_t>();
scalar_t* new_values_ptr = new_values.data_ptr<scalar_t>();
Collaborator

Suggested change
-scalar_t* new_values_ptr = new_values.data_ptr<scalar_t>();

test/test_ops.py Outdated
@parametrize("layout", (torch.sparse_csr, torch.sparse_csc, torch.sparse_bsr, torch.sparse_bsc, torch.sparse_coo))
@parametrize("layout", (torch.sparse_bsr, torch.sparse_bsc, torch.sparse_coo))
Collaborator

This change is not needed as it will disable existing tests for CSR and CSC samples.

dense = sparse.to_dense()
for dim in (0, 1):
dense_sum = dense.sum(dim=dim)
sparse_sum = sparse.sum(dim=dim)
Collaborator

reductions over sparse dimensions of sparse compressed tensors require keepdim=True

def run_test(shape, nnz, index_type):
sparse = self.genSparseCSRTensor(shape, nnz, dtype=dtype, device=device, index_dtype=index_dtype)
dense = sparse.to_dense()
for dim in (0, 1):
Collaborator

IIUC, dim is an integer that does not correspond to a new feature that this PR implements.

pearu commented May 9, 2023

@yanbing-j:

Hi @pearu, I see your PR #100391 will raise an error in 'aten::sum.IntList_out' with arguments from 'SparseCsr(CPU|CUDA)', which conflicts with the current PR. Therefore, I removed that part of the code; only the error for bsr and bsc remains.

Note that in the end, you should restore test_ops.py as it was.

But I'm confused about how to pass your UT. Whether I raise an error or add a warning in the bsr and bsc part, self.assertFalse(isinstance(out, type(NotImplemented))) will fail. Could you please give me some advice, and also have a look at this PR? Thank you!

The assertFalse fails because out is None due to
https://github.com/yanbing-j/pytorch/blob/5dd3bb8fa2500b030e0d3b9f58047614a1555d06/aten/src/ATen/native/ReduceOps.cpp#L2180-L2181
You should raise exception as discussed in #99292 (comment) and then update _validate_sample_input_sparse_reduction_sum in torch/testing/_internal/opinfo/definitions/sparse.py accordingly.

Comment on lines 2179 to 2188
if (self.dim() != 2 || keepdim) {
TORCH_CHECK(
false,
"sum expected input with strided, sparse_csr layouts, got layout ",
layout);
} else if (!keepdim) {
TORCH_CHECK(
false,
"torch.empty: Only batched sparse compressed (non-block) tensors are supported");
}
Collaborator

If we don't implement support for non-CSR layouts, there is no need to check other parameters. So:

Suggested change
-if (self.dim() != 2 || keepdim) {
-TORCH_CHECK(
-false,
-"sum expected input with strided, sparse_csr layouts, got layout ",
-layout);
-} else if (!keepdim) {
-TORCH_CHECK(
-false,
-"torch.empty: Only batched sparse compressed (non-block) tensors are supported");
-}
+TORCH_CHECK(
+false,
+"sum expected input with strided or sparse_csr layout, got layout ",
+layout);
+}

@yanbing-j yanbing-j (Collaborator Author) May 10, 2023

If we don't check the parameters, every RuntimeError would be `sum expected input with strided or sparse_csr layout, got layout ...`, which cannot pass the cases that expect the error `torch.empty: Only batched sparse compressed ...`. WDYT?

Collaborator

The expected error `torch.empty: Only batched sparse compressed ...` is just wrong and should be updated in sparse.py after the error message is updated.

Comment on lines 1178 to 1161
if (need_acc) {
new_values_ptr[row_map_ptr[h]] = static_cast<scalar_t>(res);
} else {
new_values_ptr[row_map_ptr[h]] = res;
}
Collaborator

Nit:

Suggested change
-if (need_acc) {
-new_values_ptr[row_map_ptr[h]] = static_cast<scalar_t>(res);
-} else {
-new_values_ptr[row_map_ptr[h]] = res;
-}
+new_values_ptr[row_map_ptr[h]] = static_cast<scalar_t>(res);

Comment on lines 1140 to 1125
bool need_acc = (values.scalar_type() == kHalf || values.scalar_type() == kBFloat16 || values.scalar_type() == kComplexHalf);

Collaborator

Nit:

Suggested change
-bool need_acc = (values.scalar_type() == kHalf || values.scalar_type() == kBFloat16 || values.scalar_type() == kComplexHalf);

@@ -1042,23 +1042,37 @@ Tensor reduce_sparse_csr_dim0_cpu_template(const Tensor& sparse, ReductionOp rop
new_crow_indices[1] = nnz;

Tensor new_values = at::empty({nnz}, values.options());
new_values.fill_(rop.identity());
bool need_acc = (values.scalar_type() == kHalf || values.scalar_type() == kBFloat16 || values.scalar_type() == kComplexHalf);
Collaborator

Just a note: in principle, the need_acc information is defined by scalar_t and ReductionOp. This would allow compile-time optimizations because the if-blocks on need_acc can use constexpr. For instance, if ReductionOp would store its template typename as a type member, we could have:

constexpr bool need_acc = !std::is_same<scalar_t, rop::type>::value;
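The motivation for the accumulation buffer is easy to see from plain Python, independent of the C++ internals (a hedged illustration):

```python
import torch

# Why reduced-precision values need a wider accumulation type: naive bfloat16
# accumulation stalls once the running sum can no longer represent "sum + 1".
x = torch.ones(1000, dtype=torch.bfloat16)
acc = torch.tensor(0., dtype=torch.bfloat16)
for v in x:
    acc = acc + v
print(acc)      # stalls around 256 (bfloat16 has an 8-bit mantissa)
print(x.sum())  # ~1000, because the built-in reduction accumulates in float32 and casts back
```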

Comment on lines 2808 to 2812
dense_sum = dense.sum(dim=dim)
sparse_sum = sparse.sum(dim=dim, keepdim=True)
is_integral = dtype in integral_types()
self.assertEqual(sparse_sum.to_dense().view(dense_sum.shape)
if not is_integral else sparse_sum.to_dense().to(torch.int64).view(dense_sum.shape), dense_sum)
Collaborator

We should have the following invariant holding:

sparse.sum(dim=dim, keepdim=True).to_dense() == sparse.to_dense().sum(dim=dim, keepdim=True)

So:

Suggested change
-dense_sum = dense.sum(dim=dim)
-sparse_sum = sparse.sum(dim=dim, keepdim=True)
-is_integral = dtype in integral_types()
-self.assertEqual(sparse_sum.to_dense().view(dense_sum.shape)
-if not is_integral else sparse_sum.to_dense().to(torch.int64).view(dense_sum.shape), dense_sum)
+dense_sum = dense.sum(dim=dim, keepdim=True)
+sparse_sum = sparse.sum(dim=dim, keepdim=True)
+self.assertEqual(sparse_sum, dense_sum)

(assertEqual will handle the conversion of sparse_sum to a strided tensor.)
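Outside the test harness, the invariant can be spot-checked directly; a hedged sketch assuming the CSR sum support from this PR is available:

```python
import torch

# Hedged spot-check of the invariant stated above.
a = torch.tensor([[0., 1., 2.],
                  [3., 0., 4.]])
csr = a.to_sparse_csr()
for d in (0, 1):
    assert torch.equal(csr.sum(dim=d, keepdim=True).to_dense(),
                       a.sum(dim=d, keepdim=True))
```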

Comment on lines 2813 to 2819
if dtype in floating_types():
sparse_sum.requires_grad_(True)
sparse_sum.sum().backward()
dense_sum.requires_grad_(True)
dense_sum.sum().backward()
self.assertEqual(sparse_sum.grad.view(dense_sum.shape), torch.ones(dense_sum.shape, dtype=dtype, device=device))
self.assertEqual(sparse_sum.grad.view(dense_sum.shape), dense_sum.grad)
Collaborator

These tests are already covered by test_sum. So:

Suggested change
-if dtype in floating_types():
-sparse_sum.requires_grad_(True)
-sparse_sum.sum().backward()
-dense_sum.requires_grad_(True)
-dense_sum.sum().backward()
-self.assertEqual(sparse_sum.grad.view(dense_sum.shape), torch.ones(dense_sum.shape, dtype=dtype, device=device))
-self.assertEqual(sparse_sum.grad.view(dense_sum.shape), dense_sum.grad)

Collaborator

In fact, considering the tests test_reductions and test_reductions_backward in test_sparse.py, there is no need for test_sum_dim_reduce.

Just verify that sample_inputs_sparse_reduction_sum in torch/testing/_internal/opinfo/definitions/sparse.py produces the equivalent samples (I believe it does) and remove test_sum_dim_reduce.

@pearu pearu (Collaborator) left a comment

I have a number of suggestions to simplify the code. Also, test_sum_dim_reduce appears to be unnecessary as the opinfo based test functions cover the samples corresponding to the feature added in this PR.

yanbing-j commented May 10, 2023

I have one more question: for a test like test_consistency_SparseCSR_sum_cpu_bfloat16, the input dimension would be larger than 2. Is this expected? However, _sparse_csr_sum only supports dim == 2. What can we do to pass the UTs?

Add the corresponding if-block to _validate_sample_input_sparse_reduction_sum in sparse.py.
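A hedged sketch of what such an if-block could look like (the real signature and error text in sparse.py may differ; this only illustrates mapping unsupported >2-D samples to ErrorInput):

```python
import torch
from torch.testing._internal.opinfo.core import ErrorInput

# Hypothetical branch: treat >2-D CSR samples as expected failures instead of
# feeding them to _sparse_csr_sum, which only handles 2-D inputs.
def _validate_sample_input_sparse_reduction_sum(sample, check_validate=False):
    t = sample.input
    if t.layout is torch.sparse_csr and t.dim() != 2:
        return ErrorInput(sample, error_regex="placeholder: expected 2-D input")
    return sample
```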

@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch 5 times, most recently from f4250fe to 49badf8 Compare May 14, 2023 08:12
@yanbing-j (Collaborator Author)

@pearu, thank you so much for the comments. Could you please take another look at this PR?

@yanbing-j yanbing-j requested review from pearu and mingfeima May 14, 2023 11:09
@pearu pearu (Collaborator) left a comment

This looks good, thanks @yanbing-j! But there are also changes that need to be revised/fixed, in particular:

  • sum on a CSR tensor appears to ignore the user-specified dtype argument. In general, sum on a tensor with any sparse layout must have exactly the same behavior as sum on a strided tensor.
  • Consistency checks are disabled for CSR and CSC samples. If the consistency checks fail on such samples, we should fix the issue rather than ignore it, or, worst of all, disable the consistency checks for all other operations as well, as this PR does.
  • The suggestion to rename sum_sparse_csr to sum_sparse_compressed is not applied. Although this PR addresses only CSR tensors, we will use the same function for other sparse compressed layouts in follow-ups. There are no intentions to introduce sum_sparse_csc etc. functions, so let's use the correct naming of the function immediately.

Comment on lines 1047 to 1054
auto values_acc_option = values.options();
if (need_acc) {
values_acc_option = values.scalar_type() == kComplexHalf
? values.options().dtype(ScalarType::ComplexFloat)
: values.options().dtype(ScalarType::Float);
}
Tensor new_values_acc =
(need_acc ? at::empty({nnz}, values_acc_option) : new_values);
@pearu pearu (Collaborator) May 15, 2023

Now that need_acc is constexpr, I suggest using:

Suggested change
-auto values_acc_option = values.options();
-if (need_acc) {
-values_acc_option = values.scalar_type() == kComplexHalf
-? values.options().dtype(ScalarType::ComplexFloat)
-: values.options().dtype(ScalarType::Float);
-}
-Tensor new_values_acc =
-(need_acc ? at::empty({nnz}, values_acc_option) : new_values);
+Tensor new_values_acc;
+if constexpr (need_acc) {
+auto acc_dtype = values.scalar_type() == kComplexHalf ? ScalarType::ComplexFloat : ScalarType::Float;
+new_values_acc = at::empty({nnz}, values.options().dtype(acc_dtype));
+} else {
+new_values_acc = new_values;
+}

Comment on lines 1303 to 1306
auto is_integral = at::isIntegralType(dtype_, /*includeBool=*/false);
if (is_integral) {
result = result.to(ScalarType::Long);
}
Collaborator

Hmm, this is wrong as it overrides the user-specified dtype value. For strided tensors, we have sum result dtype tests in test/test_reductions.py but it looks like we don't generate sparse samples for sparse reduction tests (test_reductions) in test/test_sparse.py that would have caught this issue.
I think we should have:

Suggested change
-auto is_integral = at::isIntegralType(dtype_, /*includeBool=*/false);
-if (is_integral) {
-result = result.to(ScalarType::Long);
-}
+auto is_integral = !dtype.has_value() && at::isIntegralType(dtype_, /*includeBool=*/false);
+if (is_integral) {
+result = result.to(ScalarType::Long);
+}

or similar.

Comment on lines 639 to 643
auto is_integral = at::isIntegralType(dtype_, /*includeBool=*/false);
if (is_integral) {
result = result.to(ScalarType::Long);
}
Collaborator

Same note here as above: when a user specifies dtype then the result must have the specified dtype.
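For reference, the dense behavior that the sparse path should reproduce (a hedged illustration using plain strided tensors):

```python
import torch

# Dense reference behavior for sum dtype handling:
x = torch.ones(3, dtype=torch.int32)
print(x.sum().dtype)                   # torch.int64: integral inputs promote to long by default
print(x.sum(dtype=torch.int32).dtype)  # torch.int32: an explicit dtype must be honored
# The sparse kernels should only promote to int64 when the user did not pass dtype.
```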

Comment on lines 498 to 501
if layout == torch.sparse_csc:
return x
if layout == torch.sparse_csr and (x.dtype == torch.bool or x.dtype == torch.complex32) and op.name == "sum":
return x
Collaborator

Why this change? This disables the consistency checks for CSC and CSR tensors and we should not do that.

Collaborator Author

These changes are because _validate_sample_input_sparse_reduction_sum limits CSC tensors and CSR tensors with bool or complex32. Therefore the sample will not be added into samples in test_sparse_csr.py, and the test raises the error `Expected at least one 2 or higher D tensor in samples`.

Collaborator Author

WDYT? @pearu

Collaborator

samples that don't pass _validate_sample_input_sparse_reduction_sum (read: such samples are mapped to ErrorInput instances) are not included in the set of samples that op.sample_inputs(device, dtype) generates. So, this change is unnecessary (it ought to be).

In addition, notice that test_consistency is used for many operations, not just sum. So, any changes to this test must not affect testing other operations. This PR just does this: it disables consistency tests for all non-sum operations with respect to CSC inputs.

So, please undo this change here and report what problems exist for the sum operation. If there are any, these problems ought to be tackled in sparse.py, not here.

@yanbing-j yanbing-j (Collaborator Author) May 17, 2023

Thanks for the clarification, @pearu . I have removed the unnecessary change in test_sparse_csr.py.

Now the problem is that, in sparse.py, CSC inputs and CSR inputs with bool and complex32 dtypes will generate ErrorInput, which is expected in this PR. So samples becomes empty and the test raises the error `Expected at least one 2 or higher D tensor in samples`. What can we do in sparse.py to fix this error? Thank you!

            if validate_sample_input_sparse(op, sparse_sample, check_validate=False) is not sparse_sample:
                # that is, the validation returns the sparse sample
                # wrapped within ErrorInput instance
                continue
            samples.append((sample, sparse_sample))

        # Fail early to prevent silent success with this test
        if len(samples) == 0:
            raise ValueError("Expected at least one 2 or higher D tensor in samples.")

dim.has_value(), "dim has no value, cannot be used in sum.dim_IntList");
auto layout = self.layout();
TORCH_CHECK(layout == kSparseCsr,
"sum expected input with strided, sparse_csr layouts, got layout ", layout)
Contributor

Can you update this to "Currently the only compressed sparse format supported for sum.dim_IntList is CSR, but got layout ", layout. It's also not true that only strided and CSR are supported. We also have a kernel registered under the Sparse dispatch key which means support for COO.
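For reference, a hedged illustration of the reviewer's point about COO (it relies only on the sum_sparse_coo registration quoted earlier in this thread):

```python
import torch

# COO reductions with dim= already dispatch to sum_sparse_coo (SparseCPU/SparseCUDA),
# independently of this PR; shown only to illustrate that COO is supported.
coo = torch.tensor([[0., 1.], [2., 0.]]).to_sparse()
print(coo.sum(dim=0))
```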

@cpuhrsch cpuhrsch (Contributor) left a comment

There's a lot of duplicated code around

  if constexpr (need_acc) {
    auto acc_dtype = CppTypeToScalarType<acc_t>::value;
    new_values_acc = at::empty({}, values.options().dtype(acc_dtype));
    new_values = is_integral ? new_values_acc : at::empty({}, values.options());
  } else {
    new_values_acc = new_values = at::empty({}, values.options());;
  }

Can we abstract that into a helper within a header or such and unify that logic into a single place? We don't want to have this diverge between devices or formats or input settings.

@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch 2 times, most recently from b3524bf to 06f1756 Compare July 19, 2023 09:26
@yanbing-j yanbing-j requested a review from cpuhrsch July 19, 2023 09:27
@@ -366,5 +367,36 @@ inline bool only_sparse_compressed_add_trivial_cases(
});
}

inline Tensor to_type(Tensor input, ScalarType dtype) {
Contributor

Did you check whether we already have support for this conversion? We have the torch.Tensor.to operator.

Collaborator Author

Please refer to the comment #99292 (comment).

// of float, while in CPU, double is the accumulate type of float.
using acc_t = at::acc_type<scalar_t, true>;
constexpr bool need_acc = !std::is_same<scalar_t, acc_t>::value;
bool is_integral = at::isIntegralType(values.scalar_type(), /*includeBool=*/true);
Contributor

This line is repeated 4 times. The intent of my higher level comment was not around a single block of particular code. It was to abstract away repeated code.

@cpuhrsch cpuhrsch (Contributor) left a comment

Thank you for addressing my comments. I think we can simplify this a bit more by abstracting away more code and checking for existing functionality.

@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch from 06f1756 to 879bd2c Compare July 20, 2023 13:00
@yanbing-j yanbing-j requested a review from cpuhrsch July 20, 2023 13:53
}

template <typename acc_t, typename scalar_t>
inline void create_acc_buffer(
Contributor

I think we can also include the construction and resize of Tensor new_values, new_values_acc; into this helper function.

@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch 2 times, most recently from 7eaae75 to 479b4ec Compare July 21, 2023 08:48
@@ -1124,9 +1142,12 @@ Tensor reduce_sparse_csr_dim1_cpu_template(const Tensor& sparse, ReductionOp rop
new_col_indices.resize_(nnz);
new_col_indices.fill_(index_t(0));
new_values.resize_(nnz);
if (!new_values_acc.is_same(new_values)) {
Contributor

Thank you for addressing the comments. Is there something you can do to abstract away this line as well? It's repeated 5 times.

Simplify the dispatch

Add UT

Fix the bug of index_type in reduce_sparse_csr_dim0_cpu_template

Use opmath_t as reduction type

Fix CI failures

The failures are from:
1. The SparseCsrCPU dispatch key covers csc, bsr, and bsc in addition to csr. Only csr is supported for now; csc can easily be converted to csr, while bsr and bsc cannot.
2. This PR conflicts with pytorch#100391.

Remove the change in test_sparse_csr.py

Remove AT_DISPATCH_INDEX_TYPES based on comments

Remove to(torch.int64) explicitly and change input data type to torch.int64

Support integral return value in reduce_sparse_csr_cpu_template

Refactor according to comments

Abstract to_type in sparse_csr

Update based on comments
@yanbing-j yanbing-j force-pushed the yanbing/sparse_sum_csr branch from 479b4ec to e45db7a Compare July 23, 2023 04:14
@yanbing-j yanbing-j requested a review from cpuhrsch July 23, 2023 05:16
@cpuhrsch (Contributor)

@pytorchbot merge

@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


pearu added a commit that referenced this pull request Sep 4, 2023
This PR addresses the sparse tensors part of #99655

The PR introduces the following utility functions:
- `at::sparse_csr::alias_with_values(a_sparse_compressed_tensor, new_values)`
- `at::sparse::alias_with_values(a_sparse_tensor, new_values)`

These functions return a wrapper of a sparse tensor with new specified values that allow introducing alias support for sparse tensors and more (e.g. the most efficient way to resolve #99292 (comment) is to use `at::sparse_csr::alias_with_values(self, self.values().to(dtype))` as a replacement of `self.to(dtype)` to avoid the unnecessary copy of indices).




[ghstack-poisoned]
Labels
ciflow/trunk (Trigger trunk jobs on your pull request) · intel (This tag is for PR from Intel) · Merged · open source · release notes: sparse (release notes category) · triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Projects
Status: Done

9 participants