.SaveAll() performance

Is there a way to improve the performance of SaveAll? I am finding that on large data sets (i.e. 10-100K rows), trying an insert, catching the primary key exception, and issuing an update within the catch is significantly faster than SaveAll, even when most of the operations need to be updates. I have experienced this with both MySQL/MariaDB and SQL Server across various managed cloud database services.

This is the pattern I have evolved to. It feels silly to do as a workaround, but it is at least 4x faster in my testing across a broad set of update and insert data sets.

    try
    {
        // Try a bulk insert first; if any row already exists the batch fails.
        Db.InsertAll(recs);
    }
    catch (Exception e)
    {
        try
        {
            // Fall back to a bulk update of the whole batch.
            Db.UpdateAll(recs);
        }
        catch (Exception e2)
        {
            // Mixed batch of new and existing rows: handle each record individually.
            foreach (var rec in recs)
            {
                try
                {
                    Db.Insert(rec);
                }
                catch (Exception e3)
                {
                    try
                    {
                        Db.Update(rec);
                    }
                    catch (Exception e4)
                    {
                        Log.Error("Error updating a single record", e4);
                    }
                }
            }
        }
    }

Hi @kev_rm,

Can you produce an example with non-sensitive data so we can have a closer look? From the code provided it looks like you are dealing with data that already has some primary key information populated, so the biggest difference I would guess is the fact that the SaveAll method fetches the data using SelectByIds and a SQL IN statement for rows that have their primary key populated.

That additional time could be spent transferring a large amount of data and/or in the query for fetching the data itself not performing well. If you can have a look at the queries being performed as part of the SaveAll on your database server, you'll have a better idea of why the performance hit might be so large.
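
One quick way to see the generated SQL from the client side, if you can hook OrmLite's exec filter (a sketch only, assuming the BeforeExecFilter and GetDebugString diagnostics are available in your version):

    // Log every command OrmLite executes, including the SELECT ... WHERE Id IN (...)
    // that SaveAll issues for rows with populated primary keys.
    OrmLiteConfig.BeforeExecFilter = dbCmd => Console.WriteLine(dbCmd.GetDebugString());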

I would need to be able to reproduce it to be sure, but as far as I can tell the code you provided is skipping the step of fetching data that is needed to only issue Update calls for rows that already exist in the database. So it is doing less work, whereas the SaveAll process checks the database for entries matching the primary key to decide whether to use the Update or Insert method for each item to be saved.

E.g. if the primary key value is missing or doesn't match an existing row, the item is inserted, with a new primary key provided by the database if the [AutoIncrement] attribute is used.
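
Roughly, the flow described above looks like this; a sketch only, with a placeholder MyRow POCO and integer Id key, not the actual library source:

    // Sketch of the described SaveAll behaviour, not the actual implementation.
    var populatedIds = recs.Where(r => r.Id != default(int))
                           .Select(r => r.Id)
                           .ToList();
    var existing = Db.SelectByIds<MyRow>(populatedIds)   // SELECT ... WHERE Id IN (...)
                     .ToDictionary(r => r.Id);

    foreach (var rec in recs)
    {
        if (existing.ContainsKey(rec.Id))
            Db.Update(rec);
        else
            Db.Insert(rec);   // [AutoIncrement] key assigned by the database
    }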

If you are loading large amounts of data, it might be best to look into tools to pre-load the data directly, such as the COPY command in Postgres (if possible in your environment), as that would likely be the fastest way.
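
For example, with the Npgsql driver something along these lines; a sketch only, bypassing OrmLite entirely, with placeholder connection string, table and column names:

    // Bulk-load with Postgres COPY via Npgsql's binary importer (requires the Npgsql package).
    using (var conn = new NpgsqlConnection(connectionString))   // placeholder connection string
    {
        conn.Open();
        using (var writer = conn.BeginBinaryImport(
            "COPY my_row (id, name) FROM STDIN (FORMAT BINARY)"))  // placeholder table/columns
        {
            foreach (var rec in recs)
            {
                writer.StartRow();
                writer.Write(rec.Id);
                writer.Write(rec.Name);
            }
            writer.Complete();   // finalises the COPY
        }
    }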

Thanks for the reply, ya, that explains it, because my logic never returns the data back to the client for existing rows. The implementation here for SaveAll could definitely be improved. Essentially you just need to know which primary keys exist, not the whole object, at which point you can issue an update, or an insert if not. Seems like a big opportunity. Same goes for .Save I would guess as well; you just don't need the whole row back to decide whether you're going to do an insert or an update. I'm sure there will be arguments about needing to support partial updates, yadda yadda yadda, but doing this much IO simply to decide whether to insert or update is pretty… suboptimal.
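
Something along these lines is what I have in mind; a sketch only, with a placeholder MyRow POCO, using OrmLite's Column/From/Sql.In as I understand them:

    // Fetch only the primary keys that already exist, then split the batch
    // into one bulk update and one bulk insert.
    var ids = recs.Select(r => r.Id).ToArray();
    var existingIds = new HashSet<int>(
        Db.Column<int>(Db.From<MyRow>()
            .Where(x => Sql.In(x.Id, ids))
            .Select(x => x.Id)));                 // returns Ids only, not whole rows

    var toUpdate = recs.Where(r => existingIds.Contains(r.Id)).ToList();
    var toInsert = recs.Where(r => !existingIds.Contains(r.Id)).ToList();

    Db.UpdateAll(toUpdate);
    Db.InsertAll(toInsert);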

This is what is already happening; the statement is only returning IDs.

How many rows are you trying to Save? SQL IN isn’t efficient for large datasets.

In a given window, a few million records, but generally one batch is 10 to 100K. It would be nice if the framework supported upserts where the DBMS makes this decision; that would save a lot of IOs. Postgres, SQL Server and MariaDB all have this and it's blazing fast compared to looking for existing records.
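
For example, something like this per dialect; a sketch only, dropping down to raw SQL with placeholder table and column names rather than any built-in OrmLite upsert:

    // Let the database make the insert-or-update decision.
    // MariaDB/MySQL syntax shown; Postgres uses INSERT ... ON CONFLICT (id) DO UPDATE,
    // SQL Server uses MERGE. 'my_row', 'id' and 'name' are placeholders.
    foreach (var rec in recs)
    {
        Db.ExecuteSql(
            @"INSERT INTO my_row (id, name) VALUES (@Id, @Name)
              ON DUPLICATE KEY UPDATE name = VALUES(name)",
            new { rec.Id, rec.Name });
    }

In practice you'd batch these into multi-row VALUES lists and run them in a transaction rather than one statement per record, but the single-row form shows the syntax.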

yeah that’s not going to scale, try reducing batch size to 1000 rows

I’m not trying to scale, I’m trying to get it to perform. Reducing batch size makes it worse.

Doesn't scale as in the large SQL statement to fetch the Ids gets slower the more rows you're trying to save at once.

I suspect the examples aren't equivalent if it's still performing that much worse. Are you also executing your custom Inserts/Updates in a transaction?
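
For a like-for-like comparison, the bulk calls would be wrapped in a single transaction as well, e.g. (a sketch only, assuming the batch has already been split into toInsert/toUpdate lists as in the earlier sketch):

    // Run both bulk calls inside one transaction so rows aren't committed
    // a statement at a time.
    using (var trans = Db.OpenTransaction())
    {
        Db.InsertAll(toInsert);
        Db.UpdateAll(toUpdate);
        trans.Commit();
    }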