What I really want to do is update the parameters of an LSTM.
I've realized that my vocabulary is relatively large (about 500K words) and that keeping a vector of NArray is very inefficient (I don't know why). When I try to sync (by calling WaitForAll) after the following code:
int N = 600000;
int D = 128;
vector<NArray> A = vector<NArray>(N, NArray::Zeros({1,D}));
for(int i = 0 ; i < N ; i ++){
A[i] = NArray::Randn({1,D},0,0.05);
}
// calling ms.WaitForAll() here takes a very long time
it takes a very long time (maybe because pushing NArrays into the vector one at a time makes a lot of malloc calls on the GPU, which is slow?).
So instead I am thinking about keeping one giant 2D matrix and doing the following:
NArray A = NArray::Randn({N,D},0,0.01);
NArray b = NArray::Randn({1,D},0,1);
A[10] = A[10] + b; // update a
It seems that this kind of row assignment is not supported?
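To make the intent concrete, here is a minimal sketch in plain C++ of the behaviour I am after. It runs on the CPU and does not use any Minerva calls; `EmbeddingTable`, `Row` and `AddToRow` are names made up for illustration, not part of the NArray API. The idea is one contiguous N x D buffer (a single allocation) plus an in-place update of one row.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Minerva: an N x D table kept in one
// contiguous row-major buffer, so there is a single allocation instead of N.
struct EmbeddingTable {
  std::size_t rows;
  std::size_t cols;
  std::vector<float> data;  // rows * cols floats, row-major, zero-initialized

  EmbeddingTable(std::size_t n, std::size_t d)
      : rows(n), cols(d), data(n * d, 0.0f) {}

  // Pointer to the first element of row i.
  float* Row(std::size_t i) {
    assert(i < rows);
    return data.data() + i * cols;
  }

  // In-place equivalent of A[i] = A[i] + b, touching only `cols` elements.
  void AddToRow(std::size_t i, const std::vector<float>& b) {
    assert(b.size() == cols);
    float* r = Row(i);
    for (std::size_t j = 0; j < cols; ++j) {
      r[j] += b[j];
    }
  }
};
```

With this layout, `EmbeddingTable A(N, D); A.AddToRow(10, b);` would play the role of `A[10] = A[10] + b`, and only D values are written per update. Whether something similar can be expressed with NArray is exactly what I am asking.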
Any suggestions or comments would be much appreciated.