Number of requests per second in .NET Core seems *really* low compared to earlier reports

I’ve looked through other forum comments where SS users have done load testing and report that they can achieve thousands (e.g. ~20,000+) of requests per second on things like a simple “hello world” REST request. Similarly, googling around I see comments like “Node can handle 10,000 req/sec”, etc. (see https://raygun.com/blog/increased-throughput-net-core/ )

Pardon my naivety, but as a simple investigation, having recently migrated to the .NET Core libraries and spun up a new project, I ran a self-hosted test case: a simple FOR loop through 5 rounds of 1000 registrations of a new user, saving their auth details into an MS SQL db with the OrmLite auth repositories, and timing how long it takes to insert 1000 new user accounts. On average, I’m seeing 1000 take 23-27 seconds, or around 35-45 new registrations per second… wildly lower than the figures reported. These are, I realise, running in serial - I will try a parallel implementation out of interest as well.
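Roughly, the test loop looks something like this - a simplified sketch rather than my actual code, with a placeholder URL and placeholder credentials:

```csharp
using System;
using System.Diagnostics;
using ServiceStack;

class RegistrationBenchmark
{
    static void Main()
    {
        var client = new JsonServiceClient("http://localhost:53613/"); // placeholder URL
        var sw = Stopwatch.StartNew();

        for (var i = 0; i < 1000; i++)
        {
            // ServiceStack's built-in Register DTO; each call goes through the
            // OrmLite auth repository and inserts a new user row.
            client.Post(new Register
            {
                UserName = $"user{i}",            // placeholder credentials
                Email = $"user{i}@example.com",
                Password = "p@ssword" + i,
            });
        }

        sw.Stop();
        Console.WriteLine($"1000 registrations in {sw.Elapsed.TotalSeconds:0.0}s");
    }
}
```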

In the meantime, switching to a totally basic, unauthed endpoint that simply creates a domain model (Venue, a basically empty class) and does a ConvertTo (also basically an id+name pair for now), with no database access whatsoever, I see the requests average about 1000 every 8 seconds, or 100-125 per second… much better, but still nothing remotely like what I was expecting?

I’m running that on a MS Surface Book laptop with 8GB RAM which is a reasonably spec’d machine, if not exactly a super-server. It also seems to make no real difference running the solution in DEBUG vs RELEASE mode.

Am I an idiot for having expected the number of requests to be much higher? Am I misunderstanding or misinterpreting the earlier reported figures somehow? I’ve pushed internally to use SS for this greenfield project, which will be relatively low load generally (a few users per minute looking through social events, for example) with the occasional high spike of traffic (during ticket sales openings), but these figures worry me a bit and I would like to understand what sort of “real world” performance I can expect so I can discuss it with the product owners.

Or should I be diving into async endpoints and making use of that? Please pardon my ignorance, I’ve rarely had to worry much about performance management at a high scale before.

You can’t compare hello world benchmarks to real-world requests with I/O, especially Auth/Registration, which requires several I/O hits to set up an Authenticated UserSession; in v5 we’ve also moved to a computationally stronger password hashing algorithm, which isn’t representative of any other Request Type. It’s more meaningful to benchmark an already authenticated user to measure what load a typical authenticated request is like. Serial benchmarking results are also meaningless: to load test HTTP requests you should be using a proper benchmarking tool like wrk or Apache Bench on a separate machine, which uses an efficient client and can better simulate real-world concurrent requests.
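For example, to measure a typical authenticated request rather than the one-off registration cost, you can authenticate a client once up front and reuse its session for the timed requests - a rough sketch, where the URL, credentials and the GetSecured DTO are all placeholders:

```csharp
using ServiceStack;

// Placeholder DTOs standing in for a hypothetical authenticated endpoint:
[Route("/secured")]
public class GetSecured : IReturn<GetSecuredResponse> {}
public class GetSecuredResponse { public string Result { get; set; } }

class AuthenticatedBenchmark
{
    static void Main()
    {
        var client = new JsonServiceClient("http://localhost:53613/"); // placeholder URL

        // Authenticate once; the login cost is paid here, outside the timed loop.
        client.Post(new Authenticate
        {
            provider = "credentials",
            UserName = "test@example.com",   // placeholder credentials
            Password = "p@ssword",
            RememberMe = true,
        });

        // Subsequent calls on the same client reuse the session cookie, so this
        // loop measures what a typical authenticated request costs:
        for (var i = 0; i < 1000; i++)
            client.Get(new GetSecured());
    }
}
```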

As soon as you introduce I/O you’re no longer benchmarking just the framework; the result now depends on network/db load and performance, so it’s a different class of request that’s not directly comparable to a hello world in-memory benchmark. Profiling and load testing are still good for finding inefficiencies and hot spots, which will help identify requests that can benefit from caching.
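As a sketch of the caching point: ServiceStack’s [CacheResponse] attribute (which needs an ICacheClient registered in the IOC) serves repeated identical requests from cache instead of re-running the service and its DB query - the Venue/VenueDto types here are placeholders, not from the project above:

```csharp
using System.Collections.Generic;
using ServiceStack;
using ServiceStack.OrmLite;

// Placeholder types standing in for your domain model:
public class Venue    { public int Id { get; set; } public string Name { get; set; } }
public class VenueDto { public int Id { get; set; } public string Name { get; set; } }

[Route("/venues")]
public class GetVenues : IReturn<List<VenueDto>> {}

public class VenueServices : Service
{
    // Caches the serialized response server-side for 60s, so repeated
    // identical requests within that window skip the DB entirely:
    [CacheResponse(Duration = 60)]
    public object Get(GetVenues request) =>
        Db.Select<Venue>().ConvertAll(x => x.ConvertTo<VenueDto>());
}
```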

I appreciate your comment about serial benchmarks being meaningless; I just attempted a parallel LINQ version of the naive no-db-access GetVenue endpoint and it was much, much quicker (200 req/s, but still far from the 20K mark mentioned previously).

I guess that I wasn’t specifically trying to “test/cross-examine” ServiceStack, but I am interested in what real-world performance I can get when SS+DB+Logic are involved, and I figured that using a self-hosted application without any network latency etc. would be about as ideal a situation as a test could get… and performance would only go downhill from there when deployed to a PROD server where DB and IIS are separated?

Specifically, I can’t afford to have this application blow up when I have several hundred / thousand users trying to hit the APIs at critical peak times, so am planning ahead and trying to get an idea of how to be comfortable that the framework and my architecture are suitable, and that first experiment gave me pause.

I am yet to upgrade to v5, will do so soon.

FYI to anyone following, I just spun out my (already notedly naive) test into a parallel implementation (with all requests in Sync, not Async) and I’m able to use the auth functionality to register 1000 new users every 8 seconds on avg (~125/sec), and the no-db-access query of a simple DTO is hitting the 200 req/sec mark… so I feel a bit better now :smile:

Splitting DBs from the App Server yields different results as they’ll no longer share CPU resources to handle requests. DB tuning also has an effect: most DB Servers installed locally are in development mode and not tuned for production workloads. The server hardware and whether you’re using a Server OS matter, the speed of the disks, etc. Then there’s the query itself: what the query plan looks like, whether the query is hitting an index, the number of rows, etc.

The best way to measure real-world performance is to run a load test on your production configuration; you’ll also want to use a proper tool like wrk or ab where you can alter the concurrency.

I had considered the competition for CPU time, but not the db-in-dev mode - thanks. I’m familiar with db admin in, at best, a “moderately experienced but still high level” way, so long run we’d hand that job to a dedicated db guy, but for the moment it’s mostly in the hands of our .NET devs, and so I’m just using whatever tricks I know of to keep things decent in the meantime.

So for now, I’m feeling much more comfortable, thanks. Out of interest, is an upgrade to v5 likely to improve performance, or is it much of a muchness?

Using .NET Core 2.0 should improve results. v5 should make a marginal difference either way: we’ve added async equivalents to all sync filter types and switched to using async/await, which adds a little overhead, but we’ve also switched to using async APIs for writing to the response internally, and removing PCL support removed some abstraction penalty. Now that most of the structural changes we wanted to do for v5 have been done, we can do a round of profiling/optimization, which @xplicit has started looking into now, but nothing has been committed to master yet.


FYI I’ve been able to start looking into this and just found and resolved a major perf issue by delaying disk access until it’s needed, which had a significant effect on RPS, so the latest v5 should be much quicker than the current v4.5.14 release.

This morning I upgraded from the .NET Core libraries (1.0.44) to the newer MyGet v5 versions, and oddly I am actually seeing decreased performance when comparing the two.

I switched back to the previous commit (with 1.0.44) to confirm, and then forward to the v5 commit again, and over the course of several runs, I’m definitely seeing slower results (7-10 seconds for the 1000 ‘dumb’ endpoint hits and 23-28 seconds for the registration of 1000 new users).

I know you said that the naive approach isn’t particularly valid for “real” testing, but I’m happy to send you an invite to my private Bitbucket if you’re interested in validating the results. I have a db.deploy project that deploys my tables to an MSSQL instance (using some wrappers around DbUp) and then a “test” project that fires off the test queries, and not much else in the way of business logic at the moment.

As mentioned previously, user registration/authentication requires several I/O calls to set up an Authenticated UserSession and isn’t something that’s going to have high performance, and with v5 we’ve moved to a computationally stronger password hashing algorithm. Load testing an authenticated request is fine, but the one-time authentication/registration per user isn’t going to be representative of overall performance.

Can you provide the before/after numbers for your other normal requests? Please use wrk instead of your own client. On Windows you can get it by enabling Windows Subsystem for Linux (WSL), then you can run it against an endpoint with:

$ wrk -c 256 -t 8 -d 10 http://localhost:3000/hello?format=json

Which will load test the /hello JSON API for 10 seconds using 8 threads and 256 concurrent connections. Can you provide the output for 1.0.44 and 5.0.0 so we can see the difference?

I set up Ubuntu for bash and ran it as requested. I notice that the latency seems really high?

For v1.0.44, connecting to my GetVendors endpoint (which does not hit the db or load any real data; it simply returns a result from a conversion like this):

var vendorDto = (new VendorModel()).ConvertTo<VendorDTO>();

For 1.0.44 running in VS2017 in Release mode, I see (over 3 runs):

dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   242.00ms  110.94ms   1.08s    72.55%
    Req/Sec     12.97      8.34    50.00    78.68%
  310 requests in 10.05s, 194.05KB read
  Socket errors: connect 0, read 0, write 0, timeout 4
Requests/sec:     30.84
Transfer/sec:     19.30KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   315.26ms  112.39ms  767.03ms   70.16%
    Req/Sec      5.76      2.94    20.00    57.55%
  315 requests in 10.11s, 197.18KB read
Requests/sec:     31.17
Transfer/sec:     19.51KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   244.18ms  143.77ms   1.47s    75.25%
    Req/Sec     16.22     10.10    60.00    79.88%
  301 requests in 10.09s, 188.42KB read
  Socket errors: connect 0, read 0, write 0, timeout 6
Requests/sec:     29.83
Transfer/sec:     18.67KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$

Now, running the same thing using the v5, I get the following:

dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   304.37ms  151.64ms   1.71s    77.14%
    Req/Sec      8.64      5.70    30.00    83.18%
  283 requests in 10.07s, 176.88KB read
  Socket errors: connect 0, read 0, write 0, timeout 3
Requests/sec:     28.10
Transfer/sec:     17.56KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   302.10ms  101.13ms  739.08ms   71.47%
    Req/Sec     12.18      8.51    40.00    86.34%
  326 requests in 10.09s, 203.75KB read
Requests/sec:     32.29
Transfer/sec:     20.18KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/venues?format=json
Running 10s test @ http://localhost:53613/venues?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   292.85ms  114.27ms  689.65ms   71.47%
    Req/Sec     11.45      9.75    50.00    87.19%
  320 requests in 10.08s, 200.00KB read
  Socket errors: connect 0, read 0, write 0, timeout 1
Requests/sec:     31.73
Transfer/sec:     19.83KB

So this looks like both are almost exactly the same, and something weird about my localhost is giving me really poor latency? I am going to load one of the ServiceStack .NET Core demo projects and see what results I get from that.

I will report back

What results are you getting for a simple service with no I/O requests?

[Route("/hello")]
public class Hello : IReturn<HelloResponse> {}
public class HelloResponse 
{
    public string Result { get; set; }
}

public class MyServices : Service
{
    public object Any(Hello request) => new HelloResponse { Result = "Hello, World!" };
}
$  wrk -c 256 -t 8 -d 10 http://localhost:53613/hello?format=json

Yeah sorry, I was just getting to that.

Using the MVC sample project and hitting the /Hello endpoint, I am getting similarly high latency (running on IIS Express):

dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:11001/hello?format=json
Running 10s test @ http://localhost:11001/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   156.33ms  266.27ms   1.96s    95.70%
    Req/Sec     25.39     15.77    70.00    69.06%
  501 requests in 10.09s, 262.73KB read
  Socket errors: connect 0, read 0, write 0, timeout 82
Requests/sec:     49.64
Transfer/sec:     26.03KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:11001/hello?format=json
Running 10s test @ http://localhost:11001/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   252.58ms  120.94ms   1.69s    94.12%
    Req/Sec      8.09      5.99    67.00    87.06%
  462 requests in 10.08s, 242.28KB read
  Socket errors: connect 0, read 0, write 0, timeout 3
Requests/sec:     45.85
Transfer/sec:     24.04KB

and give me a couple of minutes and I will put that same service in my own code with the v5 libs… one moment…

Running that simple Hello service in my local solution with the v5 libs, I am seeing similarly ridiculous latency and low req/sec. I have no idea why.

dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/hello?format=json
Running 10s test @ http://localhost:53613/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   529.65ms  302.21ms   1.94s    90.30%
    Req/Sec      9.91      8.49    69.00    76.59%
  304 requests in 10.06s, 180.80KB read
  Socket errors: connect 0, read 0, write 0, timeout 5
Requests/sec:     30.21
Transfer/sec:     17.97KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/hello?format=json
Running 10s test @ http://localhost:53613/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   455.28ms  183.92ms   1.68s    87.42%
    Req/Sec      9.37      5.57    30.00    73.97%
  323 requests in 10.06s, 192.10KB read
  Socket errors: connect 0, read 0, write 0, timeout 5
Requests/sec:     32.12
Transfer/sec:     19.10KB
dale@MININT-KERTCMM:/mnt/c/Windows/System32$ wrk -c 256 -t 8 -d 10 http://localhost:53613/hello?format=json
Running 10s test @ http://localhost:53613/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   215.06ms  108.19ms   1.16s    81.49%
    Req/Sec     12.93      8.80    49.00    73.63%
  311 requests in 10.06s, 184.96KB read
  Socket errors: connect 0, read 0, write 0, timeout 3
Requests/sec:     30.90
Transfer/sec:     18.38KB

Ok something is definitely wrong with your environment, I’m running the same HelloWorld Service and on my iMac getting:

mythz@DESKTOP-BCS76J0:/mnt/c/src/SSvsWebApi/src$ wrk -c 256 -t 8 -d 10 http://localhost:3000/hello?format=json
Running 10s test @ http://localhost:3000/hello?format=json
  8 threads and 256 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     5.28ms    3.05ms  47.48ms   78.53%
    Req/Sec     5.15k     1.19k   12.87k    72.74%
  410115 requests in 10.10s, 97.39MB read
Requests/sec:  40594.91
Transfer/sec:      9.64MB

Not sure why it’s so slow; the only time I’ve seen a serious perf slowdown like this is when the folder had invalid permissions and I was using an SQLite file database.

You’ll also want to disable logging:

public static IWebHost BuildWebHost(string[] args) =>
    WebHost.CreateDefaultBuilder(args)
        .ConfigureLogging((context, logging) => logging.ClearProviders())
        .UseStartup<Startup>()
        .Build();

and run in Release mode:

$ dotnet run -c Release

Thanks for your time (and for showing me the wrk tool) - I will have to go and investigate the root cause further.
