As Ebola recovery efforts ramp up in Sierra Leone, Guinea, and Liberia, cash transfers to citizens will be a major component of programs that either remunerate health workers or provide social security for affected groups. The various cash transfer programs must collaborate to reduce the number of duplicate records, which currently result in a substantial number of people receiving multiple payments. At present, however, this is done by manually cross-checking Excel-based payment records, a time-consuming and error-prone process.
To solve this problem, we are building an open cash transfer data platform that development sector organizations in Sierra Leone, Guinea, and Liberia can use to automatically identify duplicate payment records, both within their own datasets and across the datasets of other organizations and programs.
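The simplest form of the within- and across-dataset check described above can be sketched with the Python standard library. This is an illustration under stated assumptions, not the platform's implementation: the record fields (`id`, `name`, `phone`), the choice of normalized name plus phone number as the match key, and all function names are hypothetical. It only catches records that become identical after normalization; misspelled or reordered names need a similarity-based approach.

```python
import unicodedata
from collections import defaultdict


def normalize(value: str) -> str:
    """Lower-case, strip accents, and collapse whitespace so that trivial
    formatting differences (e.g. "Aïssatou" vs "AISSATOU") do not hide a match."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())


def match_key(record: dict) -> tuple:
    # Hypothetical match key: normalized name plus phone number.
    return (normalize(record["name"]), record["phone"].strip())


def find_duplicates(datasets: dict) -> dict:
    """Group records from every organization by match key.

    `datasets` maps an organization name to its list of beneficiary records.
    Only keys shared by two or more records are returned, each with the
    (organization, record id) pairs that produced it.
    """
    groups = defaultdict(list)
    for org, records in datasets.items():
        for record in records:
            groups[match_key(record)].append((org, record["id"]))
    return {key: hits for key, hits in groups.items() if len(hits) > 1}


def spans_organizations(hits: list) -> bool:
    """True if a duplicate group contains records from more than one organization."""
    return len({org for org, _ in hits}) > 1
```

Because each hit keeps the organization it came from, a group that spans several organizations flags a potential cross-program double payment, while a group inside a single organization points to an internal duplicate.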
Our solution will leverage several open source technologies:
- The web portal will be based on CKAN, a powerful open source data management system that provides tools for streamlining the publishing, sharing, finding, and use of the organizations' datasets.
- The portal will be integrated with Dedupe, a Python-based machine learning library for performing similarity analysis and entity resolution on structured datasets.
- The portal will also be integrated with Elasticsearch, a distributed, multitenant-capable search engine that provides full-text search and advanced search features at enterprise scale.
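Dedupe's general approach is to group records into blocks so that only plausible pairs are compared, then score each pair on field similarity, learning both the blocking rules and the scoring model from pairs labelled by a user. The sketch below does not use Dedupe. It swaps in a fixed, hypothetical blocking rule (same district and birth year) and a hand-set threshold on `difflib.SequenceMatcher` similarity to show the shape of the technique. The field names (`id`, `name`, `district`, `birth_year`), the 0.85 threshold, and the function names are assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def block_key(record: dict) -> tuple:
    # Hypothetical blocking rule: only compare people registered in the same
    # district with the same birth year. Dedupe learns such rules from data.
    return (record["district"].strip().lower(), record["birth_year"])


def canonical_name(name: str) -> str:
    # Lower-case and sort the name parts so "Sesay Mohamed" == "Mohamed Sesay".
    return " ".join(sorted(name.lower().split()))


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names, ignoring case and word order."""
    return SequenceMatcher(None, canonical_name(a), canonical_name(b)).ratio()


def candidate_duplicates(records: list, threshold: float = 0.85) -> list:
    """Return (id, id, score) for pairs in the same block scoring >= threshold."""
    blocks = defaultdict(list)
    for record in records:
        blocks[block_key(record)].append(record)

    matches = []
    for block in blocks.values():
        # Only pairs inside a block are compared, never pairs across blocks.
        for left, right in combinations(block, 2):
            score = name_similarity(left["name"], right["name"])
            if score >= threshold:
                matches.append((left["id"], right["id"], round(score, 2)))
    return matches
```

Blocking is what keeps this tractable: comparing every pair in a register of n records takes n(n-1)/2 comparisons, while blocking restricts comparisons to records that share a key. The trade-off is that an error in a blocking field (here, district or birth year) hides a true duplicate, which is why learned or multiple blocking rules are generally preferred over a single fixed one.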
Using the functionality of both Dedupe and Elasticsearch, the web-based portal will enable development sector organizations to upload their beneficiary datasets and automatically detect duplicates and double-dippers (beneficiaries who collect payments more than once), both within their own datasets and across the shared datasets of other development organizations.
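One way the two tools could be combined, offered here as an assumption rather than the project's design, is to use Elasticsearch to retrieve a short list of candidate matches for each uploaded record and then score only those candidates with a Dedupe-style model. The code builds a standard Elasticsearch `_search` request with a `bool` query: a fuzzy `match` on the name, a `term` filter on birth year, and an optional `term` clause that raises the score of candidates with the same phone number. The base URL, the index name `beneficiaries`, and the field names and mappings (for example, `phone` indexed as a keyword) are hypothetical. The request is only constructed, not sent.

```python
import json
import urllib.request


def candidate_query(record: dict, size: int = 20) -> dict:
    """Elasticsearch query DSL body that retrieves likely matches for `record`."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Fuzzy name match tolerates small spelling differences.
                "must": [
                    {"match": {"name": {"query": record["name"], "fuzziness": "AUTO"}}}
                ],
                # Exact birth-year filter narrows candidates without affecting scores.
                "filter": [{"term": {"birth_year": record["birth_year"]}}],
                # A matching phone number raises a candidate's score but is not required.
                "should": [{"term": {"phone": record["phone"]}}],
            }
        },
    }


def build_search_request(base_url: str, index: str, record: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_search request."""
    body = json.dumps(candidate_query(record)).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` against a running cluster would return the top-scoring candidates, which could then be compared pairwise instead of comparing the uploaded record against every shared record.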