At a small gathering of Erlang aficionados on 22 October 2005 in Edinburgh, Scotland, I was talking a little with Phil Wadler about functional programming. He asked if I could send him a little something that explains how I use FP in my day-to-day work. This is what I sent him:

One of my recent projects involved reading information provided by Cisco routers and massaging the data to fit the database schema used by the company I work for. The Cisco router information came as one large XML file, which recorded the various components (routers, switches, hubs, ...), their IP addresses, memory components, and the like. In order to process this information with existing proprietary tools, I had to produce a series of flat files, one per device, to be imported into the company's database.

I initially used XSLT to find the various XML nodes that contained the necessary information. I found, though, that it was not trivial to produce multiple output files from the one XML input file. So I decided to use some other form of XML-to-text-file munging. Python was suggested to me, and since I hadn't yet done very much with Python, I decided this would be a good project on which to learn.

Python provides a rudimentary XML DOM style parser, but no easy abstract traversal functions, so I ended up writing some myself. Since the XML file I was dealing with had a fixed structure (although it came without any schema documentation, never mind any kind of formal schema definition), I made a small interface to traverse nodes and return values in a very fixed way. Taking my lead from the simplest XPath expressions, which only select the nodes that lie beneath a particular path (plain "A/B/C" style XPath expressions), I chose to represent the components of the path as a list of names.
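
To make the path representation concrete, here is a minimal sketch in Python 3 of turning a plain "A/B/C" string into a list of names. The helper name parsePath is hypothetical and not part of the project code described here, and dropping empty segments is an assumption made for illustration.

```python
def parsePath(xpath):
    """Split a plain "A/B/C" style path into a list of element names.

    Hypothetical helper for illustration only. Empty segments (from leading,
    trailing, or doubled slashes) are dropped, which is an assumption.
    """
    return [name for name in xpath.split("/") if name]


print(parsePath("A/B/C"))  # ['A', 'B', 'C']
print(parsePath("Cisco_NetworkElement/InstanceName"))
```

Writing the lists out by hand, as the rest of this note does, works just as well when there are only a few fixed paths.
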
So, starting at a particular DOM node N, I retrieve all the child nodes with the path "A/B/C" by calling

    getChildren(N, ["A", "B", "C"])

This returns a list of nodes, each of which has a path of "A/B/C" rooted in the node N. In Python the code for my getChildren() function looks like this:

    def getChildren(node, path):
        head = path[0]
        tail = path[1:]
        children = [ child for child in node.childNodes
                     if child.nodeType == child.ELEMENT_NODE
                     and child.tagName == head ]
        if len(tail) == 0:
            return children
        else:
            return reduce(lambda x, y : x + y,
                          [ getChildren(child, tail) for child in children ],
                          [])

(Here, "x + y" is list concatenation.)

Sometimes I was only interested in the first child node found, instead of a list of all of them. So I made

    def getChild(node, path):
        return getChildren(node, path)[0]

I also needed a test to see if any child nodes with a given path existed:

    def childExists(node, path):
        return len(getChildren(node, path)) > 0

And finally, I needed the function to give me the text at a particular node as a string:

    def getText(node, path):
        if not childExists(node, path):
            return ""
        else:
            text = [ n.data for n in getChild(node, path).childNodes
                     if n.nodeType == n.TEXT_NODE ]
            return reduce(lambda x, y : x + y, text, "")

(Here, "x + y" is string concatenation.)

With this I managed to write a quick and dirty XML traversal that produced output files with the text of the nodes that were of particular interest.

I did find myself with a small additional "problem", in that there were certain values that only made sense if they appeared together (for example interface ID, interface IP address, and interface MAC address). I only wanted to extract interfaces if all three values were present in the Cisco XML file.
Thus I extended the childExists() and getText() functions to work on lists of paths:

    def childrenExist(node, path_list):
        if len(path_list) == 0:
            return False
        else:
            exist = [ childExists(node, path) for path in path_list ]
            return reduce(lambda x, y : x and y, exist)

    def getMultiText(node, path_list):
        if len(path_list) == 0:
            return ""
        else:
            text = [ getText(node, path) for path in path_list ]
            return reduce(lambda x, y : x + " " + y, text)

This was all that I needed to write a very simple little program that loops over the device nodes that are of interest, dumping all the information into text files.

Here is a cut down version of how I use the above functions to generate my output. (I have slightly pseudocoded this to remove all of the file handling.)

    def main():
        try:
            dom = parse(src_xml_filename)
        except:
            print "Error: File '%s' does not contain XML data." % src_xml_filename
            exit()
        devices = dom.getElementsByTagName("RMEPlatform")
        if len(devices) == 0:
            print "Error: File '%s' does not contain Cisco Works data." % src_xml_filename
            exit()
        for device in devices:
            outputDevice(device)

    # Here follow all the "XPath"s we need.
    # These are rooted in //RMEPlatform
    InstanceName = ["Cisco_NetworkElement", "InstanceName"]
    OfficialHostName = ["Cisco_NetworkElement", "OfficialHostName"]
    OSElement = ["Cisco_NetworkElement", "Cisco_LogicalModule", "Cisco_OSElement"]
    IfEntry = ["Cisco_NetworkElement", "Cisco_IfEntry"]
    # These are rooted in //RMEPlatform/Cisco_NetworkElement/Cisco_IfEntry
    IfEntryID = ["InstanceID"]
    IfEntryIP = ["NetworkAddress"]
    IfEntryMAC = ["PhysicalAddress"]

    def outputDevice(device):
        if childExists(device, InstanceName) or childExists(device, OfficialHostName):
            device_name = getText(device, InstanceName) + "_" + getText(device, OfficialHostName)
            f =

As you can see, with a little bit of list comprehensions and some folds you can end up with a neat small XML data retrieval system.
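
The listings above are written for Python 2 (print statements, built-in reduce). For anyone who wants to try the idea on a current interpreter, the following is a minimal sketch in Python 3, not the original program. It assumes the standard xml.dom.minidom parser, the sample XML is invented for illustration (only the element names come from the path lists above), and `all()` and `str.join()` stand in for two of the explicit folds.

```python
from functools import reduce  # reduce is no longer a builtin in Python 3
from xml.dom.minidom import parseString

# Invented sample data; only the element names are taken from the text.
SAMPLE = """<RMEPlatform>
  <Cisco_NetworkElement>
    <InstanceName>core-1</InstanceName>
    <Cisco_IfEntry>
      <InstanceID>1</InstanceID>
      <NetworkAddress>10.0.0.1</NetworkAddress>
      <PhysicalAddress>00:11:22:33:44:55</PhysicalAddress>
    </Cisco_IfEntry>
    <Cisco_IfEntry>
      <InstanceID>2</InstanceID>
      <NetworkAddress>10.0.0.2</NetworkAddress>
    </Cisco_IfEntry>
  </Cisco_NetworkElement>
</RMEPlatform>"""


def getChildren(node, path):
    """Return all element nodes reached from node by following the tag names in path."""
    head, tail = path[0], path[1:]
    children = [child for child in node.childNodes
                if child.nodeType == child.ELEMENT_NODE and child.tagName == head]
    if not tail:
        return children
    # Fold the per-child result lists into one flat list.
    return reduce(lambda x, y: x + y,
                  [getChildren(child, tail) for child in children], [])


def getText(node, path):
    """Concatenate the text nodes of the first match, or return "" if there is none."""
    matches = getChildren(node, path)
    if not matches:
        return ""
    return "".join(n.data for n in matches[0].childNodes
                   if n.nodeType == n.TEXT_NODE)


def childrenExist(node, path_list):
    """True only if path_list is non-empty and every path has at least one match."""
    return bool(path_list) and all(getChildren(node, p) for p in path_list)


def getMultiText(node, path_list):
    """Join the text found at each path with single spaces."""
    return " ".join(getText(node, p) for p in path_list)


IfEntry = ["Cisco_NetworkElement", "Cisco_IfEntry"]
IfFields = [["InstanceID"], ["NetworkAddress"], ["PhysicalAddress"]]

device = parseString(SAMPLE).documentElement
# Keep only the interfaces for which all three values are present.
complete = [getMultiText(entry, IfFields)
            for entry in getChildren(device, IfEntry)
            if childrenExist(entry, IfFields)]
print(complete)
```

In this sample the second interface has no PhysicalAddress, so only the first one survives the filter.
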
Obviously, you'll need to know exactly what is in your XML file, but that has so far always been the case anywhere I have had to deal with XML.

Having written this without any thought for performance, I was going to come back and optimise some of the code. Well, that was roughly four months ago, and all the feedback I have received from the users is that it is very fast (too fast, actually, since several people ran the program twice, thinking there must have been a failure the first time round). Even though the 10MB XML file is parsed completely and loaded into memory as a rather large DOM tree, modern machines make optimisation of my simple approach unnecessary. I won't bother making this any faster.

I don't think I could have gotten much pleasure out of attempting to use Python's DOM library without these abstractions. I also don't think I would have managed to produce a customer solution that is so easily adaptable, even by consultants who are not proficient in Python.

Thank you for your time, and I hope you like the above,