I've been messing around with the dynamic typing features in C# 4.0. It took a little while before I actually came up with a good use for them, but I finally found one.
I had a situation that seemed like it should have a small custom class with two properties: A string and an int, representing the number of hits of that string within a List<string>.
That seemed like overkill. Best practices tell me this puny little class would deserve its own .cs file, and I just wasn't ready to write a whole .cs file for two variables. I may be leaning toward the whole Enterprise Architecture mentality, but not that much :)
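I tried a Dictionary<string,int>, but set it aside because I thought it didn't provide an iterator. In hindsight it does: a foreach over a Dictionary<string,int> yields KeyValuePair<string,int> items. But I wanted to try something new anyway.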
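For comparison, here is a minimal sketch of the Dictionary<string,int> route. The class name, the CountWords helper, and the shortened separator list are made up for illustration; only Dictionary, TryGetValue, and string.Split come from the framework.

```csharp
using System;
using System.Collections.Generic;

public static class WordCountDemo
{
    // Hypothetical helper (not from the original code): counts how many
    // times each word occurs across all lines.
    public static Dictionary<string, int> CountWords(IEnumerable<string> lines)
    {
        var counts = new Dictionary<string, int>();
        foreach (string line in lines)
        {
            foreach (string word in line.Split(new[] { ' ', ',', ';' },
                                               StringSplitOptions.RemoveEmptyEntries))
            {
                int current;
                counts.TryGetValue(word, out current); // current is 0 if the word is new
                counts[word] = current + 1;
            }
        }
        return counts;
    }

    public static void Main()
    {
        Dictionary<string, int> counts = CountWords(new[] { "red apple", "green apple" });

        // A dictionary enumerates as KeyValuePair<TKey, TValue> items
        foreach (KeyValuePair<string, int> pair in counts)
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Unlike an anonymous type, the count stored in a dictionary entry can be updated in place through the indexer.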
So here's the problem: I want to create a complex duplicate filter based on a List<string>. Well, not really a duplicate filter so much as a "nearly duplicate" filter. That is, I want to remove any item from the list if it contains a word that occurs in multiple items in the List, ignoring short and common words. Oh sure, it sounds simple. But I wanted to do it a new way.
At first I thought I might be able to use anonymous types and stick them into a list of their own type.
List <(new { Word = "", Instances = 0 }).GetType()> words2 =
new List<(new { Word = "", Instances = 0 }).GetType()>();
Obviously, that didn't work. I got tons of compiler errors like "Using the generic type 'System.Collections.Generic.List<T>' requires 1 type arguments" and "Argument 1: Cannot convert from 'AnonymousType#1' to 'int'". But I was passing in a Type... bah, that's not going to work.
My first guess was a contravariance issue, where the compiler tried to find the most suitable List<T> constructor overload and somehow settled on an int. In hindsight the explanation is simpler: a generic type argument has to be a type name known at compile time, and (new { ... }).GetType() is an expression that only yields a System.Type object at run time. The compiler can't read it as a type argument at all, which is presumably why the errors look so unrelated.
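For the record, a statically typed list of an anonymous type is possible if the compiler is allowed to infer the type argument instead of being handed a Type object. A minimal sketch, where the class name and the CreateListLike helper are hypothetical names of my own choosing:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AnonymousListDemo
{
    // Hypothetical helper: T is inferred from the sample argument, so the
    // anonymous type never has to be written out by name.
    public static List<T> CreateListLike<T>(T sample)
    {
        return new List<T>();
    }

    public static void Main()
    {
        // Option 1: infer the element type from a sample instance
        var words = CreateListLike(new { Word = "", Instances = 0 });
        words.Add(new { Word = "apple", Instances = 1 });

        // Option 2: let an implicitly typed array plus ToList() do the inference
        var words2 = new[] { new { Word = "pear", Instances = 2 } }.ToList();

        // Property access is checked at compile time here; no dynamic involved
        Console.WriteLine(words[0].Word + " " + words2[0].Instances);
    }
}
```

The catch is that such a list can only be declared with var, so it is awkward to pass to another method, which is one reason List<dynamic> is still attractive here.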
What I needed was an element type that could hold an anonymous type and still let me get at its properties. A List<object> would hold the instances, since object is the only named type you can cast an anonymous type to. But there is no way to cast back: the anonymous type has no name to write in the cast, so its properties would be out of reach without reflection.
So then I remembered I was working in .NET 4.0, and thus I could use the new dynamic keyword as a type. That made it simpler. I'd still use anonymous types, but the list's type argument would be dynamic. Under the hood a List<dynamic> is just a List<object>; the difference is that the compiler defers member lookups such as dyn.Word to run time instead of rejecting them.
My brain goes into recursion just thinking about how this resolves at runtime :)
private static List<string> ComplexDuplicateFilter(List<string> input)
{
    if (input == null || input.Count == 0)
        return input;
    input = input.Distinct().ToList();

    // Build a list of words and how many times they occur in the overall list
    List<dynamic> words2 = new List<dynamic>();
    foreach (string line in input)
        foreach (string word in line.Split(new char[] { ' ', ',', ';', '\\', '/', ':', '\r', '\n' }))
        {
            if (string.IsNullOrEmpty(word))
                continue;
            else if (ExistsInList(word, words2))
                foreach (dynamic dyn in words2)
                {
                    if (dyn.Word == word)
                        dyn.Instances++; // compiles, but fails at run time (see the first problem below)
                }
            else words2.Add(new { Word = word, Instances = 0 });
        }

    // Remove extraneous entries for common word permutations
    List<string> output = new List<string>();
    foreach (dynamic dyn in words2)
        if ((dyn.Instances > 5 && !output.Contains(dyn.Word)) ||
            dyn.Instances <= 5)
            output.Add(dyn.Word);
    return output;
}

private static bool ExistsInList(string word, List<dynamic> words)
{
    foreach (dynamic dyn in words)
        if (dyn.Word == word)
            return true;
    return false;
}
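To make the run-time binding in that draft concrete, here is a minimal sketch. The class and method names are made up for illustration, and it assumes the anonymous objects are created in the same assembly that reads them (anonymous types are internal, and dynamic member access respects that).

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicListDemo
{
    public static List<dynamic> Sample()
    {
        return new List<dynamic> { new { Word = "apple", Instances = 1 } };
    }

    // Member names on a dynamic value are resolved when this line executes
    public static string Describe(List<dynamic> items)
    {
        string word = items[0].Word;
        int instances = items[0].Instances;
        return word + "=" + instances;
    }

    // Returns true if reading a misspelled member fails at run time
    public static bool MisspelledMemberThrows(dynamic item)
    {
        try
        {
            var ignored = item.Iterations; // no such property on the anonymous type
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        List<dynamic> items = Sample();
        Console.WriteLine(Describe(items));
        Console.WriteLine(MisspelledMemberThrows(items[0]));
    }
}
```

The trade-off is that the compiler no longer catches a mistyped property name; the mistake only shows up when that line actually runs.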
Of course then I realized a couple of problems:
- The properties of an anonymous type are read-only: anonymous types are immutable by design, so the compiler generates getters only. Rather than changing a property, you have to replace the whole object with a new anonymous type instance of the same shape (but with the new value).
- CA1502: Method has a cyclomatic complexity of 33. Microsoft recommends it be <= 25. Roughly speaking, that is the number of linearly independent execution paths through the method. Spiffy, I never knew that.
- This method will be used to suggest SearchStatuses (i.e. tracked Google queries and commensurate ranks) in the Daily Report I'll get from META. Search engine queries tend to have quotes. D'oh! Easy fix, added another char to the string.Split argument.
- A couple minor changes here and there... Obviously Instances would start at 1 instead of 0, etc.
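The first problem in the list above can be sketched in isolation. This is a minimal illustration with hypothetical class and method names, not the final code: it shows the in-place increment failing and the replace-the-whole-object workaround.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.CSharp.RuntimeBinder;

public static class ImmutableAnonymousDemo
{
    // Returns true if an in-place increment is rejected at run time
    public static bool IncrementThrows()
    {
        dynamic entry = new { Word = "apple", Instances = 1 };
        try
        {
            entry.Instances++; // the property has a getter but no setter
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    // Replaces the entry at the given index with a copy whose count is one higher
    public static void Increment(List<dynamic> words, int index)
    {
        string word = words[index].Word;
        int instances = words[index].Instances;
        words[index] = new { Word = word, Instances = instances + 1 };
    }

    // Builds a one-item list, increments it once, and returns the new count
    public static int IncrementedCount()
    {
        var words = new List<dynamic> { new { Word = "apple", Instances = 1 } };
        Increment(words, 0);
        return words[0].Instances;
    }

    public static void Main()
    {
        Console.WriteLine(IncrementThrows());
        Console.WriteLine(IncrementedCount());
    }
}
```

Replacing by index is also why a for loop is needed rather than foreach: assigning to a list element while enumerating the list with foreach would invalidate the enumerator.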
I'm pretty satisfied with the final result:
// Requires: using System.Collections.Generic; using System.Linq;
private static List<string> FilterByDuplicateWords(List<string> input)
{
    if (input == null || input.Count == 0)
        return input;
    input = input.Distinct().ToList();

    // Build a list of words and how many times they occur in the overall list
    List<dynamic> words2 = new List<dynamic>();
    foreach (string line in input)
        foreach (string word in line.Split(new char[] { ' ', ',', ';', '\\', '/', ':', '\"', '\r', '\n', '.' }))
        {
            if (string.IsNullOrEmpty(word))
                continue;
            else if (ExistsInList(word, words2))
                for (int i = words2.Count - 1; i >= 0; i--)
                {
                    // Anonymous types are immutable, so swap in a new instance
                    if (words2[i].Word == word)
                        words2[i] = new { Word = words2[i].Word, Instances = words2[i].Instances + 1 };
                }
            else words2.Add(new { Word = word, Instances = 1 });
        }

    // Keep only the lines with no long word (more than 7 characters)
    // that occurs more than once in the overall list
    List<string> output = new List<string>();
    foreach (string line in input)
    {
        int Dupes = 0;
        foreach (string word in line.Split(new char[] { ' ', ',', ';', '\\', '/', ':', '\"', '\r', '\n', '.' })
                                    .Where(p => p.Length > 7)
                                    .Distinct())
        {
            int Instances = 0;
            foreach (dynamic dyn in words2)
                if (word == dyn.Word)
                {
                    Instances = dyn.Instances;
                    if (Instances > 1)
                        Dupes++;
                    break;
                }
        }
        if (Dupes == 0)
            output.Add(line);
    }
    return output;
}

private static bool ExistsInList(string word, List<dynamic> words)
{
    foreach (dynamic dyn in words)
        if (dyn.Word == word)
            return true;
    return false;
}
So this solved the items on my issue list (bulleted above), and even resolved the cyclomatic complexity issue by accident.
Notice I'm also becoming more comfortable with lambda expressions and functions. They're big and scary before you figure them out, but once you do, you wonder how you ever lived without them.
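As an illustration of how far lambdas can take this, the whole filter could probably be expressed with LINQ alone. This is only a sketch under my own assumptions: the class name, the Separators field, and the minLength parameter are hypothetical (the code above hard-codes 7), and it swaps the List<dynamic> for a plain Dictionary<string,int>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LambdaFilterDemo
{
    private static readonly char[] Separators =
        { ' ', ',', ';', '\\', '/', ':', '"', '\r', '\n', '.' };

    // Drops every line containing a word longer than minLength characters
    // that occurs more than once across all (distinct) lines.
    public static List<string> FilterByDuplicateWords(List<string> input, int minLength)
    {
        if (input == null || input.Count == 0)
            return input;
        List<string> lines = input.Distinct().ToList();

        // Word -> total number of occurrences across all lines
        Dictionary<string, int> counts = lines
            .SelectMany(line => line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            .GroupBy(word => word)
            .ToDictionary(group => group.Key, group => group.Count());

        return lines
            .Where(line => !line.Split(Separators, StringSplitOptions.RemoveEmptyEntries)
                                .Any(word => word.Length > minLength && counts[word] > 1))
            .ToList();
    }
}
```

Calling LambdaFilterDemo.FilterByDuplicateWords(list, 7) should match the behavior of the method above, though I have not compared the two on real data.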
Now I just need to find a good use for yield return and I'll be set.