We’ve always had data and information, but we’ve not always known what to do with them. Now that they are increasingly available in digital form, the scope for harnessing and developing them has never been greater. Few lawyers are geeks, however, and most are rather conservative when it comes to thinking up creative ways of exploiting what might be a digital gold mine.

This short, clear book is a very good introduction for lawyers and information professionals. Sarah Sutherland is President and CEO at the Canadian Legal Information Institute (CanLII), which, as anyone who regularly uses online legal information will know, is a fairly cutting-edge platform in its creative ways of flagging up and linking case law data. In fact it’s streets ahead of the other open access case law sites, thanks to a more progressive funding model, with features many wholly commercial platforms might envy.

In her introduction Sutherland explains that:

“Legal research has changed a great deal in the last thirty years as it has become digitised, and the majority of it is now done online. Part of the reason this took so long is that there was so much historical data that needed to be digitised in order to make the transition viable.”

But over time the quality of digitised data has improved, as have the methods of using it – notably search. “Search systems are getting better at delivering what researchers mean rather than just the results that follow from the direct terms entered.” In other words, search systems are getting more intelligent, and that intelligence is increasingly artificial. 

Legal data takes many forms, and comes from many sources, but broadly speaking it comprises: 

  • Court judgments and dockets (ie litigation documents filed with the court)
  • Legislation, codes and regulations
  • Documents generated by lawyers in their practices
  • Academic research data
  • Business data from legal technology companies and law firms

Sutherland looks at each in turn when considering the needs of users and the possibilities for developing tools and processes to help them. In a series of chapters she considers first the sources of legal data and the forms it can take, then the techniques used for its analysis and interpretation, the issues with using legal data, and how artificial intelligence is transforming the landscape of legal data analysis, before sketching out a ‘vision for the future’.

In relation to sources of data she considers the particular qualities and problems associated with each type. For example, something we’re all familiar with is the massive amount of data we have in the form of court judgments and decisions. We use them all the time. We want to compare one case with another and find connections and patterns. But the data is variable and woefully unstructured. The form it takes varies from court to court and from judge to judge. There have been calls for it to be more structured, but getting judges to use a standardised template is quite a challenge. Many decisions are not even written when delivered, and may or may not be transcribed later. If cases settle before trial, no decision will be published, thus excluding a sizeable proportion of cases from any analysis of litigation outcomes. Even when judgments are written they are not consistently or comprehensively published, despite the best efforts of us publishers to get hold of them. 

A similar variability is found in the data kept by law firms, whose document management systems may be anything from dusty old deed boxes to virtual folders in the cloud. Unlike other forms of data, legal data often consists of a small number of very long documents, rather than a large number of small standardised units, making it that much harder to analyse collectively. Different techniques can be used to analyse data, and to a large extent the nature of the data dictates how it can be analysed. But increasingly those methods involve machine learning and natural language processing. Sutherland explains these techniques of artificial intelligence for the uninitiated, and considers the pros and cons of the tools created from them. She identifies and addresses various other issues, such as gaps in the data, its ambiguity, its inconsistency and so on.

Making predictions about the future of legal research is fraught with uncertainty, as Sutherland recognises. Some developments will boost productivity but others may rob firms or providers of the business models that currently exploit gaps in provision. As one door opens, another may close. On the other hand, whatever the hype, the assumption that artificial intelligence will solve all our current data analysis problems is a chimera, and some traditional methods will continue to hold sway. The main effect of progress is likely to be the increase in the proportion of ‘born digital’ data that matters to research, and the increase in the extent to which it is helpfully structured to assist analysis, rather than forced through a funnel of digital restructuring to enable such a process to even begin. Also likely to increase is our awareness of concerns over the misuse of data, over transparency of algorithms and AI models which may be trained on corrupted or biased data, and risks associated with the use of personal data for business purposes. 

All things considered, this is an area where both the pace and scope of development and the increasing awareness of social and ethical concerns justify paying attention to the subject, and for that this book is an excellent primer. 


Legal Data and Information in Practice: How Data and the Law Interact, by Sarah A. Sutherland (Routledge, pb £29.99).


Featured image: Photo by ThisIsEngineering, via Pexels.