S3 Virtual Files, GetAllMatchingFiles returns empty results

Two questions regarding the S3VirtualFiles component. I am trying to get all virtual files under a particular folder (which contains subfolders holding the files I want).

  1. When I call s3.GetAllMatchingFiles().ToList() I get back zero results. In fact, it doesn’t seem to matter what I put into the globPattern; I always get zero results. I assume this is not by design?

  2. As a workaround, I am using s3.GetAllFiles().Where(s => s.VirtualPath.StartsWith()).ToList(), which returns the results I expect. Because this starts at the root of the bucket, I’m questioning how performant this “workaround” is. If the bucket holds a large number of folders and files and I’m only searching for files under one subfolder, will this slow down over time? Or does the IEnumerable act like an IQueryable, where the heavy lifting doesn’t happen until I call ToList()?

UPDATE:
I seem to be getting good results using the IVirtualFolder.GetAllMatchingFiles("*"), so maybe this question is moot.

Yes, you can use GetAllMatchingFiles("*") or GetDirectory(dirPath).Files. Each API is implemented as efficiently as the APIs available in the underlying provider allow. You can browse the source code to inspect the implementation of the S3 Virtual Files Provider. For example, wildcard searches use the AWS ListObjects API, which avoids N+1 web service calls.
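On the IEnumerable vs IQueryable part of the question, here is a minimal, self-contained sketch of how LINQ-to-Objects behaves in general. It does not use S3VirtualFiles at all: AllPaths() is a hypothetical stand-in for a provider that lists every file, and the Produced counter exists only to show when items are pulled. Where() is deferred, but the filter runs in memory, so the source is still enumerated in full once ToList() is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many items the source has actually produced so far.
    public static int Produced;

    // Hypothetical stand-in for a provider that lists every file in a bucket.
    public static IEnumerable<string> AllPaths()
    {
        string[] paths = { "a/1.json", "a/b/2.json", "c/3.json", "c/d/4.json" };
        foreach (var p in paths)
        {
            Produced++; // incremented only when a consumer pulls an item
            yield return p;
        }
    }

    public static List<string> Under(string prefix)
    {
        // Where() only builds a lazy pipeline; nothing is enumerated yet.
        var query = AllPaths().Where(p => p.StartsWith(prefix, StringComparison.Ordinal));

        // ToList() pulls every item from the source and filters it in memory.
        return query.ToList();
    }
}
```

In this sketch, asking for everything under "c/" still walks all four paths, which is why filtering from the bucket root tends to cost more as the bucket grows, while starting from a narrower directory gives the provider a chance to do less work.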

Ok I’ve been playing around with the following code:

var s3 = new S3VirtualFiles(new AmazonS3Client("KEY","SECRET",RegionEndpoint.USEast2), "BUCKET");
var fileName = "14135bb9-b91a-4274-8b17-09b94ad2b441.json";
s3.WriteFile("/rootfolder/subfolder1/subfolder2/subfolder3/"+fileName, "{}");

var dir = s3.GetDirectory("/rootfolder");
var dir2 = s3.GetDirectory("/rootfolder/subfolder1");
var dir3 = s3.GetDirectory("/rootfolder/subfolder1/subfolder2");
var dir4 = s3.GetDirectory("/rootfolder/subfolder1/subfolder2/subfolder3");

dir.GetAllMatchingFiles(fileName).Count().Dump("DIR.GetAllMatchingFiles(ERN)");
dir2.GetAllMatchingFiles(fileName).Count().Dump("DIR2.GetAllMatchingFiles(ERN)");
dir3.GetAllMatchingFiles(fileName).Count().Dump("DIR3.GetAllMatchingFiles(ERN)");
dir4.GetAllMatchingFiles(fileName).Count().Dump("DIR4.GetAllMatchingFiles(ERN)");

dir3 and dir4 return results, while dir and dir2 do not. So it appears I have to be within two folders of the file I’m looking for in order to find it, regardless of the max depth being used.

Any ideas?

Virtual Paths shouldn’t have a / prefix, but that shouldn’t matter. Also, S3 doesn’t have a real concept of directories, i.e. it just has files, so for a directory to exist a file needs to be written to it.
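To illustrate the point about directories only existing because a file was written under them, here is a small sketch that derives the immediate sub-directories of a path purely from a list of object keys. PrefixDirs and SubDirs are hypothetical names for this example, and this is not how S3VirtualFiles is implemented; it only shows the idea that a "directory" is a shared key prefix.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PrefixDirs
{
    // Returns the immediate "sub-directories" of dirPath implied by the keys.
    public static List<string> SubDirs(IEnumerable<string> keys, string dirPath)
    {
        // Normalise "rootfolder" and "rootfolder/" to the prefix "rootfolder/".
        var prefix = dirPath.Length == 0 ? "" : dirPath.TrimEnd('/') + "/";

        return keys
            .Where(k => k.StartsWith(prefix, StringComparison.Ordinal))
            .Select(k => k.Substring(prefix.Length))
            .Where(rest => rest.IndexOf('/') >= 0)              // another segment follows
            .Select(rest => rest.Substring(0, rest.IndexOf('/')))
            .Distinct()
            .OrderBy(s => s, StringComparer.Ordinal)
            .ToList();
    }
}
```

With no keys under a prefix, SubDirs returns an empty list: there is nothing to represent an empty directory.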

Here’s an example of our existing S3 tests. If you can contribute a failing test, I can look into it.

I’ve submitted a Pull Request showing the issue, where we are unable to go more than two levels away when using GetAllMatchingFiles().

Those 3 assertions work as expected in all Virtual Path Providers.

Hmmm… So the strange part is what happens when I run the code in LinqPad:

(s3.GetDirectory("a/b/c").GetAllMatchingFiles("testfile-abc1.txt",1).Count() == 1).Dump();
(s3.GetDirectory("a/b").GetAllMatchingFiles("testfile-abc1.txt",2).Count()==1).Dump();
(s3.GetDirectory("a").GetAllMatchingFiles("testfile-abc1.txt",3).Count()==1).Dump();

I get the following results:

True
True
False

I can confirm I’m using ServiceStack.Aws 5.1.1 in my LinqPad. I’m at a loss for what would cause the TEST to PASS, yet the code to FAIL in LinqPad?

OK, trying this Pull Request again. Here’s what I observe so far:

(s3.GetDirectory("a/b/c/d/e/f/g").GetAllMatchingFiles("testfile-abcdefg1.txt", 1).Count() == 1).Dump();
(s3.GetDirectory("a/b/c/d/e/f").GetAllMatchingFiles("testfile-abcdefg1.txt", 1).Count() == 1).Dump();
(s3.GetDirectory("a/b/c/d/e").GetAllMatchingFiles("testfile-abcdefg1.txt", 1).Count() == 1).Dump();
(s3.GetDirectory("a/b/c/d").GetAllMatchingFiles("testfile-abcdefg1.txt", 1).Count() == 1).Dump();
(s3.GetDirectory("a/b/c").GetAllMatchingFiles("testfile-abcdefg1.txt", 1).Count() == 1).Dump();
(s3.GetDirectory("a/b").GetAllMatchingFiles("testfile-abcdefg1.txt", 2).Count() == 1).Dump();
(s3.GetDirectory("a").GetAllMatchingFiles("testfile-abcdefg1.txt", 3).Count() == 1).Dump();

Produces in LinqPad:

True
False
False
False
False
False
False

Observations:

  • In LinqPad it fails on the second assertion, yet in the test, it fails on the third assertion.
  • In LinqPad, using a test of just a/b/c I was able to get two levels of truthiness, but now testing a/b/c/d/e/f/g I only get one level of truthiness.
  • In the test, we get one more level of truthiness which made the first pull request pass the test, yet it failed in my LinqPad.
  • In the test, a/b/c gets all three levels of truthiness, but a/b/c/d/e/f/g only gets two levels of truthiness.

UPDATE: So it seems that only this line is required to have the bug surface here on your original test. I think once that is fixed, my added Assertions should all pass as well.

This PR had side-effects that broke other assertions in that test so I moved it into its own test which is now passing from this commit.

This change is available from v5.1.1, which is now on MyGet.

Awesome! Thank you :)

This issue is coming back up for me… not sure why.

When I query for files, the result shows that no files exist, yet they do: I can see them with another S3 browser, and I can actually get a file that exists. I just can’t discover it for some reason.

Any ideas?

From what I can tell S3Vault is your own implementation? So I’m not sure what code is actually being used here for GetAllMatchingFiles.

It’s all ServiceStack. I’m merely wrapping it for convenience.

It’s likely because of the number of files under that prefix. A single AWS ListObjects call returns at most 1000 objects, and this method doesn’t crawl the prefix completely. You could override the method to do so if you know the prefix holds a large number of files and you want to list all of them regardless. Depending on how many files there are, this could take a long time to respond.

Alternatively, try to use a narrower prefix.
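As a rough sketch of what crawling a prefix completely involves, the loop below keeps requesting pages until the source stops returning a continuation token. It is deliberately decoupled from AWS: Paging, Page, ListAllAsync and FakeSource are hypothetical names, and FakeSource is an in-memory stand-in for a listing API with a per-call cap. This is not the S3VirtualFiles implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class Paging
{
    // One page of keys plus the token for the next page (null when done).
    public sealed class Page
    {
        public IReadOnlyList<string> Keys { get; set; } = new List<string>();
        public string NextToken { get; set; }
    }

    // Calls fetchPage repeatedly, feeding each page's token into the next call,
    // until the source reports no further token.
    public static async Task<List<string>> ListAllAsync(Func<string, Task<Page>> fetchPage)
    {
        var all = new List<string>();
        string token = null;
        do
        {
            var page = await fetchPage(token);
            all.AddRange(page.Keys);
            token = page.NextToken;
        } while (!string.IsNullOrEmpty(token));
        return all;
    }

    // Hypothetical in-memory source returning at most pageSize keys per call.
    public static Func<string, Task<Page>> FakeSource(IReadOnlyList<string> keys, int pageSize)
    {
        return token =>
        {
            var start = token == null ? 0 : int.Parse(token);
            var end = Math.Min(start + pageSize, keys.Count);
            var slice = new List<string>();
            for (var i = start; i < end; i++) slice.Add(keys[i]);
            return Task.FromResult(new Page
            {
                Keys = slice,
                NextToken = end < keys.Count ? end.ToString() : null
            });
        };
    }
}
```

Against the raw AWS SDK, fetchPage would presumably wrap ListObjectsV2Async, passing the token as the request's ContinuationToken and returning the response's NextContinuationToken; that wiring is an assumption here and is left out so the sketch runs without credentials. Note that every page is a separate web service call, which is why a full crawl of a large prefix can be slow.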

That has to be it, as I’ve got close to 4.5k files in the bucket, and I can use the raw AWSSDK to pull back the list.


@mythz Can we get a fix for this so that I can just use S3VirtualFiles? Either an overload or something?

As ListObjectsV2 is not available in the .NET 6/.NET Standard builds, I’ve had to create *Async overloads on the S3VirtualFiles and S3VirtualDirectory concrete classes, which instead use ListObjectsV2Async with NextContinuationToken to iterate over the enumerated results. So you’d need to use the new async APIs to traverse the enumerated results.

This change is available from v6.6.1+, which is now on MyGet.