MarkLogic: NoSQL before NoSQL was cool

February 25, 2012

In response to NoSQL before NoSQL was cool and the proof


If you ever needed to validate a CALS table using schematron

June 3, 2011

Here is a nifty little Schematron rule to validate CALS tables. In particular, it checks that the morerows, namest, and nameend attributes are properly placed. I hope this helps someone; it took me some time to hack it together.

<sch:rule id="TABLE_ROWS" context="*:row" xmlns:sch="">
  <sch:let name="entry-count" value="count(*:entry)"/>
  <sch:let name="tgroup" value="./ancestor::*:tgroup[1]"/>
  <sch:let name="cols-count" value="xs:integer($tgroup/@cols)"/>
  <!-- Extra columns covered by horizontal spans: digits of nameend minus digits of namest -->
  <sch:let name="colspans" value="sum(for $e in ./*:entry[@namest] return xs:integer(replace($e/@nameend,'[^\d]','')) - xs:integer(replace($e/@namest,'[^\d]','')))"/>
  <!-- Nearest preceding row that carries a morerows attribute -->
  <sch:let name="morerows" value="(preceding-sibling::*:row[*:entry/@morerows])[position() eq last()]"/>
  <!-- First morerows attribute in that row, so xs:integer() always receives a single value -->
  <sch:let name="morerows-value" value="($morerows/*:entry/@morerows)[1]"/>
  <sch:let name="morerows-position" value="count($morerows/preceding-sibling::*:row)"/>
  <sch:let name="rows-distance" value="count(preceding-sibling::*:row)"/>
  <sch:let name="morerows-distance" value="$rows-distance - $morerows-position"/>
  <sch:let name="morerows-count" value="if ($morerows) then if ($morerows-distance le xs:integer($morerows-value)) then count($morerows/*:entry) - 1 else 0 else 0"/>
  <sch:assert id="TABLE_ROW_MATCH_COLS_COUNT" test="$cols-count eq count(*:entry) + $morerows-count + $colspans" flag="ERROR">
    The count of row/*:entry elements must match the column specification in tgroup
    |$morerows distance: <sch:value-of select="$morerows-distance"/>
    |$morerows value: <sch:value-of select="$morerows-value"/>
    |in-distance: <sch:value-of select="$morerows-distance le xs:integer($morerows-value)"/>
    |more-rows-entry-count: <sch:value-of select="count($morerows/*:entry) - 1"/>
  </sch:assert>
</sch:rule>

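To see the arithmetic behind the assertion, here is a minimal XQuery sketch that applies the same colspans calculation to a small table. The table, its column names (c1 to c3), and the local:colspans helper are made up for illustration, and rowspans via morerows are left out to keep it short.

```xquery
xquery version "1.0";

(: Hypothetical three-column CALS table, used only for illustration. :)
declare variable $tgroup :=
  <tgroup cols="3">
    <colspec colname="c1"/>
    <colspec colname="c2"/>
    <colspec colname="c3"/>
    <tbody>
      <row>
        <entry namest="c1" nameend="c2">spans two columns</entry>
        <entry>third column</entry>
      </row>
      <row>
        <entry>a</entry>
        <entry>b</entry>
        <entry>c</entry>
      </row>
    </tbody>
  </tgroup>;

(: Extra columns covered by horizontal spans in one row, computed the same
   way as the rule's colspans variable: digits of nameend minus digits of
   namest. Assumes column names of the form c1, c2, and so on. :)
declare function local:colspans($row as element(row)) as xs:integer {
  xs:integer(fn:sum(
    for $e in $row/entry[@namest]
    return xs:integer(fn:replace($e/@nameend, '[^\d]', ''))
         - xs:integer(fn:replace($e/@namest, '[^\d]', ''))
  ))
};

(: Effective column count per row: entries plus the columns hidden by spans. :)
for $row in $tgroup/tbody/row
return fn:count($row/entry) + local:colspans($row)
```

Both rows should evaluate to 3, matching cols="3": the first row has two entries plus one extra column covered by the c1 to c2 span, and the second has three plain entries.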

I love you, you’re perfect, Now Change!!!

April 24, 2010

I have been working at MarkLogic for a little short of 2 months now, and I must say it has been an awesome experience to work in a place where everyone shares your passion for XML development and XQuery. I find it hard to write in any other language, and I have to remind myself not to start writing let statements such as $var as xs:string := "xxxx" or to begin a loop as a FLWOR expression. Even with my total love for ML and XQuery, I still find myself defending its honor to my Java/.NET colleagues. I may win on many levels with regard to object orientation, how imperative languages are beginning to look a lot like functional languages, and how much more natural (yet verbose) XML is for object orientation than, say, the relational model. I don’t lose too many arguments on those points, and I am one to fight tooth and nail to support XQuery. MarkLogic makes building search and REST services a trivial effort, and pretty much anything that is exposed as XML can be easily worked with.

Yet I have been thinking for some time that I would love to see it evolve from an XML search platform and XML query language into the rapid application development platform for ____fill in the blank____. This evolution needs to start with writing the next killer application, one that can reach the masses of developers who struggle to develop XML applications but rely on Java/.NET because of the lack of frameworks beyond parsing, transforming, and searching XML.

  • We need a Rails-type framework for web development.
  • We need a Service Bus framework for Enterprise integration.
  • We need an AOP framework that allows loose coupling between our application and business logic.
  • We need the ability to inspect and parse XQuery to support dynamic programming.
  • We need better integration with our imperative brothers and sisters.
  • We need XQuery to Change!!!!

Converting Arabic to Roman Numerals

April 13, 2009

So I spent a little time racking my brain to write a number converter for Roman numerals in XQuery. Most examples in Java and similar languages use a while loop to do the conversion. That is not possible in XQuery, so the while loop has to be simulated with recursion. The solution is very simple: use a simulated queue that pops off values while it builds the Roman numeral.

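declare variable $romanAlpha as xs:string* :=
  ("M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I");

(: Values matching $romanAlpha position for position. The original list
   was lost from this post, so it is reconstructed here. :)
declare variable $romanNums as xs:integer* :=
  (1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1);

(: Converts an Arabic number to a Roman numeral :)
declare function local:number-to-roman($num as xs:integer) as xs:string {
  if ($num le 0) then
    ""  (: assumed: zero and negative numbers yield an empty string :)
  else if ($num gt 3999) then
    fn:error(xs:QName("INVALID_ARGUMENT"), "Cannot Convert Number Larger than 3999")
  else
    local:recursive-roman($num, "", $romanNums)
};

(: Recursion method used to calculate the Roman numeral.
   $sequences acts as the queue: its head is the value currently being tried. :)
declare function local:recursive-roman(
  $num as xs:integer,
  $alpha as xs:string,
  $sequences as xs:integer*
) as xs:string {
  if (fn:empty($sequences) or $num eq 0) then
    $alpha
  else
    let $i := $sequences[1]
    let $rom-a := $romanAlpha[fn:index-of($romanNums, $i)]
    return
      if ($num gt $i) then
        local:recursive-roman($num - $i, fn:concat($alpha, $rom-a), $sequences)
      else if ($num lt $i) then
        local:recursive-roman($num, $alpha, fn:remove($sequences, 1))
      else
        (: $num eq $i: append the last numeral and stop (assumed ending) :)
        fn:concat($alpha, $rom-a)
};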

New Job New Challenges Same old content

March 2, 2009

I have been working at my new company, McGraw Hill Companies, for a month now, and I must say that content is at the forefront of my challenges. Some of the key things I am working on are EPublishing, Content to Layout, and the complexities of being in a larger organization.

Inverse Citation Frequency

December 15, 2008

Working for a legal publisher, we face many challenges related to content relations and keeping content relevant. In legal content, citations establish the precedent that legal professionals use to relate cases or understand rulings on cases. I have been formulating a concept, probably a well-known one, that I call “inverse citation frequency”. The principle follows that of most search engines, which use inward links to a document as a mechanism for scoring the relevancy of that document. Given the citations found within a document, one would relate it to other documents that share the same citation or group of citations, also taking into account the sentiment of each case’s ruling. The identification and normalization of citations would drastically improve the cross-linking of news stories, cases to cases, etc.
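One possible reading of the idea, sketched in XQuery: weight each citation by how rare it is across the corpus, then relate two documents by summing the weights of the citations they share. The post does not give a formula, so the definition used here (total documents divided by documents containing the citation, by analogy with inverse document frequency) is an assumption, as are the sample corpus and the function names. Sentiment is left out, and the citations are assumed to be normalized already.

```xquery
xquery version "1.0";

(: Hypothetical corpus: each doc lists the normalized citations it contains. :)
declare variable $corpus :=
  <corpus>
    <doc id="a"><cite>410 U.S. 113</cite><cite>347 U.S. 483</cite></doc>
    <doc id="b"><cite>410 U.S. 113</cite></doc>
    <doc id="c"><cite>410 U.S. 113</cite><cite>5 U.S. 137</cite></doc>
    <doc id="d"><cite>347 U.S. 483</cite></doc>
  </corpus>;

(: Assumed definition: total documents divided by the number of documents
   containing the citation. Only call it for citations present in the
   corpus, otherwise the division fails. :)
declare function local:icf($cite as xs:string) as xs:decimal {
  let $total := fn:count($corpus/doc)
  let $with  := fn:count($corpus/doc[cite = $cite])
  return $total div $with
};

(: Relatedness of two documents: sum of the weights of shared citations. :)
declare function local:relatedness(
  $a as element(doc),
  $b as element(doc)
) as xs:decimal {
  fn:sum(
    for $c in fn:distinct-values($a/cite[. = $b/cite])
    return local:icf($c),
    0.0
  )
};

(: Score every other document against document "a". :)
let $a := $corpus/doc[@id eq "a"]
for $b in $corpus/doc[@id ne "a"]
return fn:concat($b/@id, ": ", local:relatedness($a, $b))
```

In this toy corpus, document d should score higher against a than b or c do, because the citation it shares with a appears in fewer documents. A real scorer would probably dampen the ratio with a logarithm, as search engines do for term weights.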

The key issue is normalization of the case citations. While the Bluebook, Chicago Law, and the NY Style Manual have style guidelines for formatting citations, there are many permutations of how people express citations. I have spent many hours handcrafting citation regular expressions and have found it to be a non-trivial exercise. Sure, companies like Lexis and West have mastered this functionality in their product lines, but these systems are locked behind their proprietary walls.
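To give a flavor of what such a regular expression looks like, here is a minimal XQuery sketch that folds a few spellings of one "volume U.S. page" citation into a single canonical form. The variants, the pattern, and the function name are illustrative assumptions; it covers one reporter only and is nowhere near a full citation grammar.

```xquery
xquery version "1.0";

(: Hypothetical variants of one U.S. Reports citation, as they might
   appear in running text. :)
declare variable $variants as xs:string* :=
  ("410 U.S. 113", "410 U. S. 113", "410 US 113", "410  U.S.  113");

(: Rewrites "volume U.S. page" variants into one canonical spelling.
   Handles only this single reporter; other text passes through unchanged. :)
declare function local:normalize-us-cite($text as xs:string) as xs:string {
  fn:replace($text, "(\d+)\s+U\.?\s*S\.?\s+(\d+)", "$1 U.S. $2")
};

(: All four variants collapse to the same string. :)
fn:distinct-values(
  for $v in $variants
  return local:normalize-us-cite($v)
)
```

The query should return the single value "410 U.S. 113". Every additional reporter, court, or pinpoint format needs its own pattern, which is where the hours go.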

Anybody have any thoughts on this??

MarkLogic Wish List

September 17, 2008

After speaking with some very talented people at MarkLogic, I decided to air my wishlist with the community at large and get some feedback.

Server Functionality:

  • Samba/CIFS Server – The big challenge with integrating MarkLogic at my company has been getting the content into MarkLogic. While we can often write custom loaders or reuse existing code such as xqloader, we don’t have time to prove value without some long, drawn-out development project. It would be just as simple to expose MarkLogic as a CIFS/Samba server rather than the WebDAV server in its current releases. I will say that without a good WebDAV client or an API that is easy to follow, it’s useless for large-scale loading of content. I think what most people want when they get started with MarkLogic is to just mount a share, build a directory, and drop in their XML content. Alfresco has successfully implemented such functionality with its JLAN implementation, and this made a big difference in getting Alfresco up and running.
  • Versionable File System – Automatic file versioning similar to Subversion or CVS. This is purely a nice-to-have feature, but often as a developer you don’t want to have to figure out how to manage versioning of files (version storage, APIs to handle it). Better if MarkLogic provided this out of the box, IMHO.

Language Features

  • True Schema Support functionality – I understand that you want to be able to handle XML without restrictions, but let’s face it: if you could validate a document against a schema in MarkLogic, you would at least have a choice as to whether you want to bomb the file or not. I don’t know how many times I have encountered a problem in one of my parsing applications because some unknown element was introduced into my content that I hadn’t accounted for. We can ignore things we don’t like, but ignoring the problem only makes it worse later. If I know what schema violations a particular document has, I can at least move it to QA for fixing. If anybody has implemented a schema validator in MarkLogic, please send me a message.
  • External XQuery functions – No, I don’t mean Java or .NET based functions. I mean XQuery functions that can be passed to a module (which I guess would require higher-order functions). When writing a transformer for a particular type of XML content such as DocBook or NITF, you often end up writing a series of typeswitch statements that iterate over the content. For every transformation you want to derive from the base transformation (i.e. XSL-FO, HTML), you have to duplicate this functionality. If external functions could be passed in, you could write modules that accept a series of functions and call either the supplied function or a local one. This would seriously reduce the amount of boilerplate code you have to write for new transformations. Also, a way to check whether a function or variable is undefined would remove the need for try/catch statements around XDMP:UNDEFINED errors.
  • XQuery Reflection – XQuery reflection over modules (I have posted about this earlier tonight :0). Oh, the use cases I could write for this! I will boil it down to these 3: documentation stubs (i.e. XqDoc), dynamic programming, and code generation (WSDLs, dynamic eval, scaffolding, Rails???).
  • XML Diff – This would be awesome. A few flavors would need to be handled: order-sensitive and order-insensitive. Also, update functions that accept a diff-gram would save a lot of headache in writing update code.
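The "External XQuery functions" wish can be sketched with the higher-order functions that XQuery 3.0 later standardized. This is a minimal illustration, assuming a processor that supports XQuery 3.0 function items; it is not a MarkLogic feature described in this post, and local:walk, $to-html, and the sample element names are all hypothetical.

```xquery
xquery version "3.0";

(: Generic walker: visits the tree once and hands each element, together
   with its already-transformed children, to the supplied function. :)
declare function local:walk(
  $node as node(),
  $render as function(element(), item()*) as item()*
) as item()* {
  typeswitch ($node)
    case element() return
      $render($node, for $child in $node/node() return local:walk($child, $render))
    case text() return $node
    default return ()
};

(: One renderer per output format; only this part changes between
   transformations, the traversal above is shared. :)
declare variable $to-html :=
  function($e as element(), $content as item()*) as item()* {
    typeswitch ($e)
      case element(para) return <p>{$content}</p>
      case element(emphasis) return <em>{$content}</em>
      default return $content
  };

local:walk(<para>Hello <emphasis>world</emphasis></para>, $to-html)
```

With this shape, an XSL-FO renderer would be a second function value passed to the same local:walk, rather than a second copy of the typeswitch traversal.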

Any thoughts?