Inverse Citation Frequency

At the legal publisher where I work, we face many challenges in relating content and keeping it relevant. In legal content, citations establish precedent, letting legal professionals relate cases to one another and understand how cases were decided. I have been working through a concept for what is probably a well-known problem, known as “inverse citation frequency”. The principle follows that of most search engines, which use the inbound links to a document as a signal when scoring its relevance. Given the citations found within a document, one would relate it to other documents that share the same citation or group of citations, ideally including some element of the sentiment of each case's ruling. Identifying and normalizing citations would drastically improve cross-linking between news stories and cases, between cases themselves, and so on.
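
To make the idea concrete, here is a minimal Python sketch of one possible reading of it. It treats each document's set of normalized citations like the terms in IDF weighting, so a citation shared by only a few documents counts for more than one cited everywhere, and it scores each pair of documents by the summed weights of the citations they share. The function names and the log(N / df) weighting are my own assumptions borrowed from text retrieval, not an established definition, and the sentiment of each ruling is not modeled here.

```python
import math
from collections import Counter
from itertools import combinations


def inverse_citation_frequency(docs: dict[str, set[str]]) -> dict[str, float]:
    """Weight each citation by log(N / number of documents citing it).

    Assumption: `docs` maps a document id to its already-normalized citations.
    """
    n = len(docs)
    # Document frequency: how many documents contain each citation.
    df = Counter(c for cites in docs.values() for c in set(cites))
    return {c: math.log(n / count) for c, count in df.items()}


def related_documents(docs: dict[str, set[str]], icf: dict[str, float]):
    """Rank document pairs by the summed weights of their shared citations."""
    scores = {}
    for a, b in combinations(sorted(docs), 2):
        shared = set(docs[a]) & set(docs[b])
        score = sum(icf[c] for c in shared)
        if score > 0:  # skip pairs with nothing (informative) in common
            scores[(a, b)] = score
    # Highest-scoring pairs first; ties keep their original pair order.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A citation that appears in every document gets a weight of zero, just as IDF discounts ubiquitous terms; an inbound-citation count, closer to the search-engine analogy, could be layered on as a separate signal.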

The key issue is normalizing the case citations. While the Bluebook, Chicago Law, and the NY Style Manual all provide style guidelines for formatting citations, there are many permutations in how people actually write them. I have spent many hours handcrafting citation regular expressions and have found it to be a non-trivial exercise. Sure, companies like Lexis and West have mastered this functionality in their product lines, but those systems are locked behind proprietary walls.
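
As an illustration of the normalization step, the sketch below maps a few common spellings of a reporter abbreviation (for example "410 U. S. 113", "410 US 113", and "410 U.S. 113") onto one canonical "volume reporter page" string. The `REPORTER_VARIANTS` table and the `normalize_citations` function are hypothetical and cover only four reporters; a real table would need far more entries, plus handling for pin cites, parallel citations, and short forms, which is exactly where hand-written expressions get hard.

```python
import re

# Hypothetical, deliberately tiny table: canonical reporter -> regex for its
# common spelling variants (optional periods, optional inner space).
REPORTER_VARIANTS = {
    "U.S.": r"U\.?\s?S\.?",
    "S. Ct.": r"S\.?\s?Ct\.?",
    "F.2d": r"F\.?\s?2d",
    "F.3d": r"F\.?\s?3d",
}

# One compiled pattern per reporter: <volume> <reporter variant> <first page>.
_PATTERNS = [
    (canonical, re.compile(rf"\b(\d{{1,4}})\s+{variant}\s+(\d{{1,5}})\b", re.IGNORECASE))
    for canonical, variant in REPORTER_VARIANTS.items()
]


def normalize_citations(text: str) -> list[str]:
    """Return the sorted, de-duplicated canonical citations found in `text`."""
    found = set()
    for canonical, pattern in _PATTERNS:
        for m in pattern.finditer(text):
            volume, page = int(m.group(1)), int(m.group(2))
            found.add(f"{volume} {canonical} {page}")
    return sorted(found)
```

The normalized strings can then serve directly as the citation keys when relating documents that cite the same authority.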

Does anybody have any thoughts on this?

