5. Future research directions
Classical natural language processing models are trained on well-written text, and tagging accuracy drops significantly on unstructured text containing abundant abbreviations, misspellings, and ungrammatical constructs. While our study showed the feasibility of our methods, more work is needed to establish a drug safety surveillance mechanism. We plan to improve our text mining algorithms for entity and relationship extraction through the following steps:
1. Improve inter-annotator agreement by refining the annotation guidelines.
2. Develop an evaluation corpus to benchmark the performance of the text mining algorithms.
3. Expand the reference gazetteer by extracting AE terms from approved drug labels and from medical abbreviation lists.
4. Investigate deep learning strategies, such as those described in recent review articles [23–25], to iteratively improve recognition of patterns that are likely to be ADEs.
5. Evaluate algorithm performance on discharge summaries from other hospitals, which use different EMR technology platforms and different clinician practices for writing discharge summaries, to assess the applicability of the algorithms across the broader health care system.
6. Explore text-mining strategies, such as frequent itemset mining, that are suited to discovering previously unknown drug-AE relationships.
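Frequent itemset mining, named in the last item, can be illustrated with a small sketch. It is a hedged example only. It assumes each discharge summary has already been reduced to a set of extracted drug mentions and a set of AE mentions, and it implements just the pair-finding stage of an Apriori-style search. Items are filtered by support first, and only pairs of frequent items are counted. The function name `frequent_drug_ae_pairs`, the `min_support` threshold, and the toy records are hypothetical and do not come from our pipeline or data.

```python
from collections import Counter
from itertools import product


def frequent_drug_ae_pairs(summaries, min_support):
    """Return {(drug, ae): support} for drug-AE pairs whose support meets min_support.

    Hypothetical helper: `summaries` is a list of (drug_set, ae_set) tuples,
    one per discharge summary; support = fraction of summaries containing the pair.
    """
    n = len(summaries)
    # Pass 1: support of single items (drugs and AEs counted separately).
    drug_counts = Counter(d for drugs, _ in summaries for d in set(drugs))
    ae_counts = Counter(a for _, aes in summaries for a in set(aes))
    frequent_drugs = {d for d, c in drug_counts.items() if c / n >= min_support}
    frequent_aes = {a for a, c in ae_counts.items() if c / n >= min_support}

    # Pass 2 (Apriori property): a pair can only be frequent if both members
    # are frequent, so only pairs of frequent items are counted.
    pair_counts = Counter()
    for drugs, aes in summaries:
        for pair in product(set(drugs) & frequent_drugs, set(aes) & frequent_aes):
            pair_counts[pair] += 1
    return {pair: c / n for pair, c in pair_counts.items() if c / n >= min_support}


# Toy, hypothetical records for illustration only (not study data).
summaries = [
    ({"warfarin", "aspirin"}, {"bleeding"}),
    ({"warfarin"}, {"bleeding", "bruising"}),
    ({"lisinopril"}, {"cough"}),
    ({"warfarin", "lisinopril"}, {"bleeding", "cough"}),
]

print(frequent_drug_ae_pairs(summaries, min_support=0.5))
```

On these toy records, only the warfarin-bleeding pair (support 0.75) and the lisinopril-cough pair (support 0.5) pass the threshold. Support by itself cannot separate causal ADEs from co-prescription or confounding by indication. A surveillance pipeline built on this idea would therefore still need additional association measures and clinical review.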