
Thread: RC5 CUDA Beta3

  1. #1
    Join Date
    Apr 2005
    Location
    US
    Posts
    2,229

    RC5 CUDA Beta3

    The new 509beta3 CUDA client is out for Linux, Linux-64 and Windows. Be SURE to run -bench to check which core to use. The default core is not necessarily the fastest; at least it wasn't on my GTX260. I haven't tried the other platforms yet.
    Last edited by AMDave; 01-26-2009 at 09:32 AM. Reason: CUCA -> CUDA

  2. #2
    Join Date
    Jan 2007
    Location
    Vermont, USA
    Posts
    1,379
    Doh..!!

    Post your -bench or best core rate here...


    dnetc v2.9103-509-CTL-09010508-*dev* for Win32 (WindowsNT 5.2).

    8800GT

    RC5-72: using core #0 (CUDA 1-pipe 64-thd).
    RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd) 0.00:00:14.34 [303,924,061 keys/sec]
    RC5-72: using core #1 (CUDA 1-pipe 128-thd).
    RC5-72: Benchmark for core #1 (CUDA 1-pipe 128-thd) 0.00:00:14.28 [303,004,825 keys/sec]
    RC5-72: using core #2 (CUDA 1-pipe 256-thd).
    RC5-72: Benchmark for core #2 (CUDA 1-pipe 256-thd) 0.00:00:14.57 [296,497,675 keys/sec]
    RC5-72: using core #3 (CUDA 2-pipe 64-thd).
    RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd) 0.00:00:14.07 [306,849,590 keys/sec]
    RC5-72: using core #4 (CUDA 2-pipe 128-thd).
    RC5-72: Benchmark for core #4 (CUDA 2-pipe 128-thd) 0.00:00:16.76 [256,428,015 keys/sec]
    RC5-72: using core #6 (CUDA 4-pipe 64-thd).
    RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
    RC5-72: using core #7 (CUDA 4-pipe 128-thd).
    RC5-72: Benchmark for core #7 (CUDA 4-pipe 128-thd) 0.00:00:16.31 [263,862,250 keys/sec]
    RC5-72: using core #9 (CUDA 1-pipe 64-thd busy wait).
    RC5-72: Benchmark for core #9 (CUDA 1-pipe 64-thd bus ... 0.00:00:14.32 [301,958,331 keys/sec]
    RC5-72: using core #10 (CUDA 1-pipe 64-thd sleep 100us).
    RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sl ... 0.00:00:16.07 [268,257,385 keys/sec]
    RC5-72: using core #11 (CUDA 1-pipe 64-thd sleep dyna ...
    RC5-72: Benchmark for core #11 (CUDA 1-pipe 64-thd sl ... 0.00:00:16.04 [270,003,024 keys/sec]
    RC5-72 benchmark summary :
    Default core : #-1 (undefined)
    Fastest core : #6 (CUDA 4-pipe 64-thd)
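Picking the fastest core out of a long -bench log by eye is error-prone, so here is a minimal Python sketch that does it. It assumes only the "Benchmark for core #N ... [N keys/sec]" line format shown in the log above; the names `BENCH_LINE` and `fastest_core` are made up for this illustration and are not part of dnetc.

```python
import re

# Matches lines such as:
# RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
# Assumption: the key rate is always the bracketed field at the end of the line.
BENCH_LINE = re.compile(r"Benchmark for core #(\d+) .*\[([\d,]+) keys/sec\]")


def fastest_core(log_text):
    """Return (core_number, keys_per_sec) for the fastest core, or None if no rates found."""
    rates = {}
    for line in log_text.splitlines():
        match = BENCH_LINE.search(line)
        if match:
            core = int(match.group(1))
            rates[core] = int(match.group(2).replace(",", ""))
    if not rates:
        return None
    best = max(rates, key=rates.get)
    return best, rates[best]


# A few lines copied from the log above.
sample = """\
RC5-72: Benchmark for core #0 (CUDA 1-pipe 64-thd) 0.00:00:14.34 [303,924,061 keys/sec]
RC5-72: Benchmark for core #3 (CUDA 2-pipe 64-thd) 0.00:00:14.07 [306,849,590 keys/sec]
RC5-72: Benchmark for core #6 (CUDA 4-pipe 64-thd) 0.00:00:14.00 [308,648,797 keys/sec]
RC5-72: Benchmark for core #10 (CUDA 1-pipe 64-thd sl ... 0.00:00:16.07 [268,257,385 keys/sec]
"""

print(fastest_core(sample))
```

On these sample lines it reports core #6, which agrees with the "Fastest core" line in the client's own summary.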
    Last edited by Bender10; 01-26-2009 at 12:53 PM.
    Logic is the art of being wrong with confidence.


  3. #3
    Join Date
    Jul 2003
    Location
    Sydney, Australia
    Posts
    5,642
    I'm trying the Win32 CUDA beta client on a Win64 Q9550 and it seems to be extremely slow.

    117Mkeys/sec, 9800GTX, core 3

    Box is also running 2x Wieferich, 1x NPLB and 1x BOINC PrimeGrid 321 project.


  4. #4
    Join Date
    Jan 2007
    Location
    Vermont, USA
    Posts
    1,379
    Vaughan,

    Did you do a -bench and manually select the fastest core?

    Is the 117 Mkeys your 'time per completed unit' figure, or the average key rate?


  5. #5
    Join Date
    Apr 2005
    Location
    US
    Posts
    2,229
    I'm no mathematics/programming wundergeek, but offhand I would have to say that I'm not surprised that the win32 client is slower as it would seem to me that a 64 bit O/S could pass data twice as fast to the gpu as a 32 bit O/S. In actuality it may not be that restricting, but there has to be an impact on the ability to feed the gpu. ????

  6. #6
    Join Date
    Jun 2007
    Location
    Mid-Michigan
    Posts
    756
    Not so sure about that. I have two 9800 GTX+ GPUs, one running on a 5600+ 2.8 GHz with Vista 64-bit, the other on a 5400+ 2.8 GHz with XP 32-bit, and they run at nearly identical rates.


  7. #7
    NeoGen, AMD Users Alchemist, Moderator
    Site Admin
    Join Date
    Oct 2003
    Location
    North Little Rock, AR (USA)
    Posts
    8,451
    Quote: Originally posted by Brucifer
    I'm no mathematics/programming wundergeek, but offhand I would have to say that I'm not surprised that the win32 client is slower as it would seem to me that a 64 bit O/S could pass data twice as fast to the gpu as a 32 bit O/S. In actuality it may not be that restricting, but there has to be an impact on the ability to feed the gpu. ????
    The eternal 32/64 bit speed myth...
    The compact version --> there is little or no speed difference in data transfer rate between 32 and 64 bit OSes.

    Now the long version...
    The main 32/64 bit difference in OSes is the address range for memory. In a 32-bit OS the maximum memory one can address is 4 GB (without tricks such as PAE or other tweaks).
    This is rather easy to explain: a 32-bit number is a number in binary form with 32 digits, and such a number can take 4,294,967,296 different values (0 through 4,294,967,295 in decimal). You can verify this by computing 2^32 on a calculator. That is the maximum number of bytes you can address with a 32-bit number.
    If you take that big number and divide it three times by 1024 (to get kilobytes, megabytes, and gigabytes) the end result is "4" (gigabytes).
    Each cell in RAM has to have an address to be usable, and the address in this case is a 32-bit number, so the most memory a 32-bit OS can address is 4 GB. RAM beyond that would not be addressable, and thus not usable. (Nowadays RAM beyond 4 GB is usable, using certain tricks that were developed over the years.)
    This works exactly as if you had a long street with plenty of houses but no door numbers beyond 30 (the rest of the houses have no number). You can't deliver something addressed to number 40 of that street if the houses are not numbered that far. Remember that for machines, guessing is not an option.


    In 64 bits all this changes as now the maximum addressable memory is a huge number that you can see if you do 2^64.
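For anyone who would rather check the numbers than trust a calculator, here is a small Python sketch of the arithmetic described above. The helper names `addressable_bytes` and `to_gigabytes` are invented for this example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address of the given width can express."""
    return 2 ** address_bits


def to_gigabytes(num_bytes):
    """Divide by 1024 three times: bytes -> kilobytes -> megabytes -> gigabytes."""
    return num_bytes / 1024 / 1024 / 1024


print(addressable_bytes(32))                # 4294967296 addresses
print(addressable_bytes(32) - 1)            # 4294967295, the highest 32-bit value
print(to_gigabytes(addressable_bytes(32)))  # 4.0
print(addressable_bytes(64))                # 18446744073709551616
```

Note the off-by-one: 2^32 is the count of addresses (0 through 4,294,967,295), so the highest address itself is one less than the count.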

    The funny trick we all wish they would use to double the data rate would be to pass two 32-bit numbers together in one 64-bit value to the GPU, right?
    Unfortunately that could become very complicated software-wise, if not impossible due to OS restrictions. But even if it were possible, we would hit the 4 GB barrier again somewhere...
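To make the "two 32-bit numbers in one 64-bit value" idea concrete, here is a hedged Python sketch of the bit packing itself. It only illustrates the packing and unpacking; it says nothing about how dnetc or the CUDA driver actually move data, and the names `pack_pair` and `unpack_pair` are hypothetical.

```python
MASK32 = 0xFFFFFFFF  # the low 32 bits


def pack_pair(high, low):
    """Combine two 32-bit values into a single 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


def unpack_pair(value):
    """Split a 64-bit value back into its (high, low) 32-bit halves."""
    return (value >> 32) & MASK32, value & MASK32


packed = pack_pair(0xDEADBEEF, 0x12345678)
print(hex(packed))
print([hex(half) for half in unpack_pair(packed)])
```

The packing is just a shift and an OR, which is why it does not by itself make anything faster: the same 64 bits still have to cross the same bus.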

  8. #8
    Join Date
    Nov 2005
    Location
    Central Pennsylvania
    Posts
    4,333
    Bravo! What an explanation there, NeoGen!

    Challenge me, or correct me, but don't ask me to die quietly.

    …Pursuit is always hard, capturing is really not the focus, it’s the hunt ...

  9. #9
    Join Date
    Jan 2007
    Location
    Vermont, USA
    Posts
    1,379
    Vaughan,

    Which video driver are you using? You may have to go back 1 or 2 versions. That may work, but I'm not sure.


  10. #10
    AMDave, Seeker of the exit clause, Moderator
    Site Admin
    Join Date
    Jun 2004
    Location
    Deep in a while loop
    Posts
    9,608
    Strange and weird.
    My client 'appears' to be working,
    BUT all my results seem to have gone into a void.
    The stats site shows that I have returned nothing.
    I killed the client remotely until I can take a closer look at what exactly is going on after work.
    . . . . . ___
    . . . . . . .\___/\______
    . . . . . . . \__AMD___\\__
    ---------------------------------------------
