
Cuda out of Memory

A topic by cgmodeler created Jan 17, 2020 Views: 14,401 Replies: 20

Sorry for double posting, but I think this topic is needed here so other users can solve it too.

I just tried it but keep getting the CUDA out of memory error. I tried reducing the video width from 1100px to 550px but still get the same error. I have a GTX 1070. Any hints on what I can test, or a log that I can check to see where the error occurs?

Developer

Hey there, can you post a screenshot of the console or copy the text to Pastebin? I'm almost sure that the resolution is still too big, though. Try a really small one, like 150x150, and see if it works.

I tried again with an even lower resolution. I went down little by little, 140px, then 120px, and it worked at 115px.

What could be the issue? I tried on two PCs:
one has 48 GB RAM, a 2.4 GHz Xeon, an NVIDIA GTX 1070, and tons of hard drive space too;

the other has 12 GB RAM, a 3.6 GHz Xeon, an NVIDIA GTX 1070, and an SSD.

Both only worked at 115px.


GTX1070

Drivers 436.30

CUDA 10.1.0

I got the same error at 150px, 550px, and 1100px width.

BTW, the window closed before I could capture the text, so I made a GIF.

Developer

The GTX 1070 should have CUDA compute capability 6.1, no?

Strange, my friend ran tests on a GTX 1060 and it worked just fine. I think this is the first time I've seen this error. It even looks like your CUDA device is already busy with something else. This one will be hard to debug.

What version of CUDA are you working with? Mine is 10.1; perhaps it will work if I downgrade?

It worked at 115px but nothing bigger, so it's strange.

Developer

If it worked at 115px, then it's working fine. Right now the app really eats up a lot of memory.

I managed to make it work at 500px. It seems that graphics memory is the issue, so I had to restart the machine, kill all other processes, leave only DainApp open, and then process the 500px file. I couldn't go any higher than 500px.


Is it possible to process image sequences with alpha (like PNG)? Maybe in the future?

Developer

Yes, alpha is planned for the future. Image sequences are already possible in 0.2, which will go public in one week.

Amazing!!! Thanks for the great work!

I get the same error 90% of the time

cudnn is used
OK
D:/Hentai/Pics/1566840918519.gif
Input FPS: 12.5
QWindowsNativeFileDialogBase::shellItem : Unhandled scheme:  "data"
C:/Users/Deej/Desktop
D:/Hentai/Pics/1566840918519.gif
C:/Users/Deej/Desktop/1566840918519//1566840918519.mp4
12.5
0
0
1
Interpolate 1 frames
The testing model weight is: ./model_weights/best.pth
Framerate Index: 0
D:\DAIN_APP Alpha\torch\nn\functional.py:2494: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
D:\DAIN_APP Alpha\torch\nn\functional.py:2693: UserWarning: Default grid_sample and affine_grid behavior will be changed to align_corners=False from 1.4.0. See the documentation of grid_sample for details.
  warnings.warn("Default grid_sample and affine_grid behavior will be changed "
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
..\torch\csrc\autograd\python_function.cpp:622: UserWarning: Legacy autograd function with non-static forward method is deprecated and will be removed in 1.3. Please use new-style autograd function with static forward method. (Example: https://pytorch.org/docs/stable/autograd.html#torch.autograd.Function)
Traceback (most recent call last):
  File "my_design.py", line 79, in render
  File "my_DAIN_class.py", line 469, in RenderVideo
  File "my_DAIN_class.py", line 202, in interpolate
  File "site-packages\torch\nn\modules\module.py", line 541, in __call__
  File "networks\DAIN_slowmotion.py", line 182, in forward
  File "site-packages\torch\nn\modules\module.py", line 541, in __call__
  File "Resblock\BasicBlock.py", line 81, in forward
  File "site-packages\torch\nn\modules\module.py", line 541, in __call__
  File "site-packages\torch\nn\modules\container.py", line 92, in forward
  File "site-packages\torch\nn\modules\module.py", line 541, in __call__
  File "site-packages\torch\nn\modules\conv.py", line 345, in forward
  File "site-packages\torch\nn\modules\conv.py", line 342, in conv2d_forward
RuntimeError: CUDA out of memory. Tried to allocate 280.00 MiB (GPU 0; 4.00 GiB total capacity; 2.92 GiB already allocated; 0 bytes free; 35.32 MiB cached)

Ryzen 5 2600

16GB DDR4 Ram 

GTX 1050 ti 4gb vram

Windows 10

Developer

Yep, it's a memory problem. Try closing any applications that aren't needed and maybe use a smaller resolution. Other than that, there is no other solution for now.


Hi, I want to ask about splitting sections. What do the two arguments, size and padding, mean? And what are the recommended settings for a 16 GB Tesla V100 GPU when processing a 3 GB, 56-minute animation? Thank you.


Hi nekodaze. The 'Section Size' and 'Section Padding' settings under the Split Frames section (they are actually beside it, but you get the point) are a roundabout way to render frames using less memory.

I would not recommend it AT ALL. It will create artifacts in fast-moving objects, smoke, and similar action scenes. It will create box artifacts across your entire video. In my case, when I rendered a 1920x1080 video (a 23-hour render) with 450 Section Size and 150 Section Padding under the split frames section, the finished video showed 8 equal sections (only visible in high-intensity scenes, as mentioned before).

If you still don't get it because of my first-class English, imagine 8 framed TVs put together to make 1 big screen. You can still see the borders even though they are see-through (like tinted table glass).


About the 16 GB Tesla: I am currently on a GTX 1070, and the maximum size it can handle is 500 pixels.
I think the amount of VRAM correlates roughly linearly with how big the rendered frames can be (without the split frames option selected).
I have tested a GTX 1050 with 2 GB of VRAM (120-140px), a 1050 Ti with 4 GB (240-260px), and a GTX 1070 with 8 GB (480-500px).
All were tested personally by me and not taken from other sources.
If we extrapolate to your GPU, it should handle frames about 2x bigger than mine, about 960-1000px.
I know this sounds like such a turn-off, but the app is in alpha, and as you can see from the error messages, it is not yet properly optimized and needs more time to mature.
All you can do is wait for it to get better. It is just ridiculous to think you have to spend on a 24 GB GPU to render a native 1080p file without the artifacts.
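
To make the extrapolation above explicit, here is a minimal Python sketch. It assumes the limit scales linearly with VRAM and uses the 8 GB GTX 1070 figures from this post as the reference point; the function name `estimate_max_width` is made up for illustration, and this is only a rule of thumb from three cards, not a model of how the app actually uses memory.

```python
def estimate_max_width(vram_gb, ref_vram_gb=8, ref_range_px=(480, 500)):
    """Scale a measured working width range linearly with VRAM.

    The defaults are the GTX 1070 (8 GB) measurements reported above.
    Linear scaling is an assumption, not a documented property of the app.
    """
    scale = vram_gb / ref_vram_gb
    return tuple(round(px * scale) for px in ref_range_px)


# 16 GB card, extrapolated from the 8 GB measurements.
tesla_estimate = estimate_max_width(16)
```

For a 16 GB card this gives the same 960-1000px range as doubling the GTX 1070 numbers by hand. Scaling down to 4 GB and 2 GB gives ranges that overlap the measured ones but sit slightly lower at the top end, so treat the result as a starting guess to test from rather than a hard limit.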


As an added bonus, I do not see any difference in rendering speed between CPUs of different strength in my case. The GTX 1050 2 GB is paired with an i5-3470, the 1050 Ti with an i7-3770, and the GTX 1070 with a Ryzen 2600. When I use the same baseline settings on all setups (only 120px, to match the limit of the 2 GB 1050, with the same size, frame rate, and a video duration of 30 seconds), they all render at almost the same speed and take about 1 hour to finish. I have yet to test an i3 or lower-tier processor since I do not have one.

I hope this helps answer your question. I am not an expert or anything, just reporting from my own collected data, settings, and experience.

Thank you very much, Noraiman. I thought it split the video by duration rather than by picture area. I never thought that a few 1080p frames would take over 16 GB of VRAM. Looks like DAIN has a long way to go.

I've tried many different size values, and I think 900px may be the limit for 16 GB, very close to your calculation. If I go higher, the rendering process becomes unstable, since some scenes may take more memory... Or maybe my video's bit rate is just too high: each frame after extraction is about 3 MB. I haven't tried another video, so I'm not sure if this size is normal.

I have given up on processing this video for now, since it would take about 860 hours to render 🌚

About different CPUs: I noticed CPU usage is low after frame extraction, so yes, the CPU probably has nothing to do with rendering speed.

No problem. Thank you for your information too.

I have the same problem, and I have a powerful 2080 Ti that I got from an Illumicorp member. So what the heck should I do to fix it? Can someone give me a solution? Please answer fast, I don't want to wait years for an answer.

Hi daciansolgen3,

Just a few questions first:

1. What resolution is your input media?

2. Are you using the Split Frames option? If so, what are your settings?

3. In the error message, how much VRAM does it say is reserved by other applications? How much VRAM is reserved by PyTorch and how much does it say you have left?
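
For question 3, the numbers can be read straight off the last line of the traceback. As a rough Python sketch (the function name `parse_cuda_oom` is made up, and the patterns assume the wording of the message posted earlier in this thread plus the "reserved" wording I believe newer PyTorch versions use, so they may need adjusting for your version):

```python
import re

# Units that appear in the message, converted to MiB.
_TO_MIB = {"bytes": 1 / (1024 * 1024), "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}
_SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"

# Field names are illustrative; the patterns follow the message text.
_FIELDS = {
    "requested": r"Tried to allocate " + _SIZE,
    "total": _SIZE + r" total capacity",
    "allocated": _SIZE + r" already allocated",
    "free": _SIZE + r" free",
    "cached": _SIZE + r" cached",
    "reserved": _SIZE + r" reserved",
}


def parse_cuda_oom(message):
    """Return every size found in an out-of-memory message, in MiB."""
    sizes = {}
    for name, pattern in _FIELDS.items():
        match = re.search(pattern, message)
        if match:
            sizes[name] = float(match.group(1)) * _TO_MIB[match.group(2)]
    return sizes


# The message posted earlier in this thread (GTX 1050 Ti, 4 GB).
msg = ("RuntimeError: CUDA out of memory. Tried to allocate 280.00 MiB "
       "(GPU 0; 4.00 GiB total capacity; 2.92 GiB already allocated; "
       "0 bytes free; 35.32 MiB cached)")
info = parse_cuda_oom(msg)
```

In that example the card has 4096 MiB in total, PyTorch holds about 2990 MiB plus about 35 MiB of cache, and nothing is free. The remaining gigabyte or so is not held by PyTorch's allocations, which would be consistent with other programs or the system holding it, though the message alone does not say who.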


I can't offer much help without answers to the above questions but I'll try and give advice.

If you're running out of VRAM with a 2080 Ti, then I can assume your input media is larger than 1080p. When you try to interpolate frames this large in DainApp and get an out of memory message, you need to turn on the "Split Frames" option under the "Fix OutOfMemory Options" tab.

Leave the x=2 y=2 defaults and 150px padding as they are for now and try feeding the frames to Dain. If it starts rendering with no error, I'd close DainApp, then restart it, re-select all my previous options and then reduce either X or Y splits to 1.

The idea is that you want as few splits as possible whilst avoiding the OutOfMemory error.

If you still get an error at X=2 Y=2 then add 1 to either axis and try again. Keep adding 1 to one, then the other until you don't get an OutOfMemory error.
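
The trial-and-error procedure above can be written down as a small Python sketch. Note that `try_render` is a hypothetical stand-in for starting a render by hand with X and Y splits and noting whether it errors; as far as this thread shows, DainApp does not expose such a function, and the cap of 8 splits per axis is my own arbitrary assumption.

```python
def find_splits(try_render, start=(2, 2), max_per_axis=8):
    """Search for a small split count that avoids OutOfMemory.

    try_render(x, y) must return True if rendering starts without an
    out-of-memory error. Returns the (x, y) that worked, or None.
    """
    x, y = start
    if try_render(x, y):
        # The default works: check whether dropping one axis by 1 also works.
        for candidate in ((x - 1, y), (x, y - 1)):
            if min(candidate) >= 1 and try_render(*candidate):
                return candidate
        return (x, y)

    # The default fails: add 1 to one axis, then the other, and retry.
    grow_x = True
    while x < max_per_axis or y < max_per_axis:
        if grow_x and x < max_per_axis:
            x += 1
        elif y < max_per_axis:
            y += 1
        else:
            x += 1
        grow_x = not grow_x
        if try_render(x, y):
            return (x, y)
    return None
```

In practice each call to `try_render` is a manual attempt, ideally after restarting DainApp as described above, so the sketch is mostly a checklist of which X and Y values to try next.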

Also, using the experimental Interpolation algorithm uses more VRAM than the Default algorithm so bear that in mind.


Don't run any other GPU based program at the same time as Dain as this will reduce your available VRAM as well as increase your interpolation time. Even if a GPU intensive program has been closed, VRAM may still be reserved for that program, effectively reducing the available VRAM for other programs including Dain.
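
To see how much VRAM is already taken before starting Dain, one option is to ask `nvidia-smi`, which ships with the NVIDIA driver. A minimal Python sketch, assuming `nvidia-smi` is on your PATH (on some Windows installs it may not be) and with function names that are made up for illustration:

```python
import subprocess

# Standard nvidia-smi query flags; values are reported in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Turn nvidia-smi CSV output into one dict of MiB values per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        total, used, free = (int(field) for field in line.split(","))
        gpus.append({"total": total, "used": used, "free": free})
    return gpus


def query_gpu_memory():
    """Run nvidia-smi and return the parsed memory figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)
```

Calling `query_gpu_memory()` with nothing else open gives a baseline. If "used" is already high before Dain starts, something else is still holding VRAM, and closing it or rebooting is worth trying first.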


"i have power ful 2080 ti that i got from illumicorp member"

I had to look up what Illumicorp is, but I would try to stick to reputable sources for GPU procurement. Over the last couple of years there have been many scams selling GPUs advertised as the latest models that were actually lower-cost GPUs fitted with cooler shrouds from expensive models and modified BIOSes that make them report themselves as the expensive models. Pretty much the only ways to tell these apart are, first, the idea that "if it sounds too good to be true, it most likely is", then seller ratings and reviews, and finally benchmarks and/or a tear-down of the physical hardware.

The first thing I would do is benchmark your GPU with something like Geekbench and compare the result to average CUDA benchmark scores.


Or this post might be an r/whoosh moment