Two questions regarding the S3VirtualFiles components. I am trying to get all virtual files under a particular folder (which contains subfolders holding the files I want).
When I call s3.GetAllMatchingFiles().ToList() I get back zero results. In fact, it doesn’t seem to matter what I put into the globPattern, I always get zero results. I assume this is not by design?
As a workaround, I am using s3.GetAllFiles().Where(s => s.VirtualPath.StartsWith()).ToList(), which returns the results I expect. Because this starts at the root of the bucket, I’m questioning how performant this “workaround” is. If there are a ton of folders and files in this bucket and I’m only searching for files under a certain subfolder, will this slow down over time? In other words, is the IEnumerable acting like an IQueryable, where the heavy lifting doesn’t get done until I call ToList()?
UPDATE:
I seem to be getting good results using the IVirtualFolder.GetAllMatchingFiles("*"), so maybe this question is moot.
Yes, you can use GetAllMatchingFiles("*") or GetDirectory(dirPath).Files. Each API is implemented as efficiently as the APIs available in the underlying provider allow. You can browse the source code to inspect the implementation of the S3 Virtual Files Provider. For example, wildcard searches use the AWS ListObjects API, which avoids N+1 web service calls.
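On the IEnumerable vs. IQueryable part of the question, here is a minimal sketch of how deferred execution behaves in plain LINQ to Objects. ListAllKeys is a hypothetical stand-in for a provider that enumerates every key in a bucket; it is not the S3VirtualFiles implementation. The point is only that Where() is lazy, but once ToList() runs, the filter still has to pull every item from the source, so a client-side StartsWith filter does not narrow what the source has to list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many keys the source has actually produced so far.
    public static int Produced;

    // Hypothetical stand-in for "list every key in the bucket".
    static IEnumerable<string> ListAllKeys()
    {
        string[] keys = { "a/1.json", "a/b/2.json", "c/3.json", "c/d/4.json" };
        foreach (var key in keys)
        {
            Produced++;
            yield return key;
        }
    }

    public static void Main()
    {
        // Building the query does not enumerate anything yet.
        var query = ListAllKeys().Where(k => k.StartsWith("c/", StringComparison.Ordinal));
        Console.WriteLine(Produced);      // prints 0

        // ToList() drives the enumeration; every key is produced and tested.
        var matches = query.ToList();
        Console.WriteLine(Produced);      // prints 4
        Console.WriteLine(matches.Count); // prints 2
    }
}
```

So the work is deferred, but it is not pushed down to the source the way an IQueryable provider could push it. Scoping the call to a directory, as in GetDirectory(dirPath), lets the provider decide what to request.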
Ok I’ve been playing around with the following code:
var s3 = new S3VirtualFiles(new AmazonS3Client("KEY","SECRET",RegionEndpoint.USEast2), "BUCKET");
var fileName = "14135bb9-b91a-4274-8b17-09b94ad2b441.json";
s3.WriteFile("/rootfolder/subfolder1/subfolder2/subfolder3/"+fileName, "{}");
var dir = s3.GetDirectory("/rootfolder");
var dir2 = s3.GetDirectory("/rootfolder/subfolder1");
var dir3 = s3.GetDirectory("/rootfolder/subfolder1/subfolder2");
var dir4 = s3.GetDirectory("/rootfolder/subfolder1/subfolder2/subfolder3");
dir.GetAllMatchingFiles(fileName).Count().Dump("DIR.GetAllMatchingFiles(ERN)");
dir2.GetAllMatchingFiles(fileName).Count().Dump("DIR2.GetAllMatchingFiles(ERN)");
dir3.GetAllMatchingFiles(fileName).Count().Dump("DIR3.GetAllMatchingFiles(ERN)");
dir4.GetAllMatchingFiles(fileName).Count().Dump("DIR4.GetAllMatchingFiles(ERN)");
My dir3 and dir4 return results, while my dir and dir2 do not. So it appears I have to be within two folders of the file I’m looking for in order to find it, regardless of the max depth being used.
Virtual Paths shouldn’t have a / prefix, but that shouldn’t matter. Also, S3 doesn’t have a real concept of directories, i.e. it just has files, so for a directory to exist a file needs to be written to it.
Here’s an example of our existing S3 tests. If you can contribute a failing test, I can look into it.
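To illustrate the point that S3 only stores files, here is a small sketch that derives the “directories” implied by a flat list of object keys. ImpliedDirectories is a hypothetical helper written for this post, not part of ServiceStack or the AWS SDK, and the file name is made up.

```csharp
using System;
using System.Collections.Generic;

public static class PrefixDemo
{
    // Returns the distinct "directories" implied by a flat list of object keys.
    public static SortedSet<string> ImpliedDirectories(IEnumerable<string> keys)
    {
        var dirs = new SortedSet<string>(StringComparer.Ordinal);
        foreach (var key in keys)
        {
            // Drop any leading '/' so "/a/b" and "a/b" are treated the same.
            var path = key.TrimStart('/');

            // Every '/' in the key marks the end of one implied directory.
            for (var i = path.IndexOf('/'); i >= 0; i = path.IndexOf('/', i + 1))
                dirs.Add(path.Substring(0, i));
        }
        return dirs;
    }

    public static void Main()
    {
        var keys = new[] { "/rootfolder/subfolder1/subfolder2/subfolder3/file.json" };
        foreach (var dir in ImpliedDirectories(keys))
            Console.WriteLine(dir);
        // prints:
        // rootfolder
        // rootfolder/subfolder1
        // rootfolder/subfolder1/subfolder2
        // rootfolder/subfolder1/subfolder2/subfolder3
    }
}
```

Remove the one key and all four directories disappear with it, since nothing else in the bucket records them.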
In LinqPad it fails on the second assertion, yet in the test, it fails on the third assertion.
In LinqPad, using a test of just a/b/c I was able to get two levels of truthiness, but now testing a/b/c/d/e/f/g I only get one level of truthiness.
In the test, we get one more level of truthiness which made the first pull request pass the test, yet it failed in my LinqPad.
In the test, a/b/c gets all three levels of truthiness, but a/b/c/d/e/f/g only gets two levels of truthiness.
UPDATE: So it seems that only this line is required to have the bug surface here on your original test. I think once that is fixed, my added Assertions should all pass as well.
This issue is coming back up for me… not sure why.
When I query for files, it shows that no files exist, yet they do: I can see them with another S3 browser, and I can actually get a file that exists by its path. I just can’t discover it for some reason.
It’s likely because of the number of files in that prefix. A single AWS ListObjects call returns at most 1000 objects, and this method doesn’t crawl the prefix completely. You could override the method to do so if you know this prefix has a large number of files and you want to list all of them regardless. Depending on how many files there are, this could take a long time to respond.
As the sync ListObjectsV2 API is not available in .NET 6/.NET Standard builds, I’ve had to create *Async overloads on the S3VirtualFiles and S3VirtualDirectory concrete classes, which instead use ListObjectsV2Async with NextContinuationToken to iterate over the enumerated results. So you’d need to use the new async APIs to traverse the enumerated results.
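For reference, a minimal sketch of continuation-token paging against the AWS SDK for .NET directly (AWSSDK.S3 package). This is not the ServiceStack implementation or its new async overloads; ListAllKeysAsync and its parameters are names chosen for illustration. It keeps requesting pages until the response no longer carries a NextContinuationToken, which is how more than 1000 objects under a prefix can be listed.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

public static class S3Listing
{
    // Hypothetical helper: collects every object key under a prefix, page by page.
    public static async Task<List<string>> ListAllKeysAsync(
        IAmazonS3 client, string bucket, string prefix)
    {
        var keys = new List<string>();
        var request = new ListObjectsV2Request { BucketName = bucket, Prefix = prefix };

        do
        {
            var response = await client.ListObjectsV2Async(request);

            // Null check is defensive; an empty page may carry no object list.
            if (response.S3Objects != null)
            {
                foreach (var obj in response.S3Objects)
                    keys.Add(obj.Key);
            }

            // A null or empty token means there are no further pages.
            request.ContinuationToken = response.NextContinuationToken;
        } while (!string.IsNullOrEmpty(request.ContinuationToken));

        return keys;
    }
}
```

Usage would look like `var keys = await S3Listing.ListAllKeysAsync(client, "BUCKET", "rootfolder/");`. Each page is a separate web service call, so a very large prefix will take proportionally longer to list.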