Grapnel

Microservices-based dark web crawling software using Python, FastAPI, and GraphQL, implementing the saga pattern for law enforcement agencies

As a core team member, I helped build software that crawls the dark web and identifies sites of potential interest. The application is a set of microservices running on Kubernetes, which provides scalability, resilience, and efficient resource management. The services communicate asynchronously over RabbitMQ, which supports robust data processing and task management.

The backend is built with Python and FastAPI, a modern, high-performance framework for building APIs. My main focus was implementing the GraphQL APIs with Strawberry, a Python GraphQL library, so that clients can request only the fields they need. This made data retrieval more efficient and simplified the overall API structure.

To extend the crawler, I integrated headless Chromium to take screenshots of identified dark web sites, giving analysts visual confirmation of potential threats. I also worked on a queue management service that raised crawl throughput from 1 URL per minute to 60 URLs per minute, a 60-fold increase. This service implements the saga pattern, which keeps multi-step transactions reliable across distributed services.
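The saga pattern mentioned above can be sketched roughly as follows. This is a simplified, in-process illustration under stated assumptions, not the project's queue management service: the step names (`reserve_url`, `fetch_page`, `store_result`) and the `make_step` helper are hypothetical, and in a real deployment each step would be a call to a separate microservice.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]      # forward operation
    compensate: Callable[[dict], None]  # undoes the forward operation


def run_saga(steps: List[SagaStep], ctx: dict) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[SagaStep] = []
    for step in steps:
        try:
            step.action(ctx)
            completed.append(step)
        except Exception as exc:
            ctx["error"] = f"{step.name}: {exc}"
            for done in reversed(completed):
                done.compensate(ctx)
            return False
    return True


def make_step(name: str, fail: bool = False) -> SagaStep:
    """Hypothetical helper: builds a step that logs its work and can simulate failure."""
    def action(ctx: dict) -> None:
        if fail:
            raise RuntimeError("simulated failure")
        ctx["log"].append(f"do:{name}")

    def compensate(ctx: dict) -> None:
        ctx["log"].append(f"undo:{name}")

    return SagaStep(name, action, compensate)


if __name__ == "__main__":
    ctx = {"log": []}
    steps = [make_step("reserve_url"), make_step("fetch_page"),
             make_step("store_result", fail=True)]
    print(run_saga(steps, ctx), ctx["log"])
```

When the last step fails, the two completed steps are undone in reverse order, so a URL is never left half-processed in the queue.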

The application also uses MinIO for object storage and data migration. Beyond that, I built real-time notifications so that users are alerted promptly to important findings, and I played a key role in migrating our inter-microservice communication to gRPC, which made service-to-service calls more efficient.

We used MongoDB as our primary database, which let us handle large volumes of unstructured data. For search and analysis, we used Elasticsearch, which provides full-text search and analytics. The processed data is then stored in a graph database, which supports complex queries and relationship mapping.

The system is currently deployed and used by several police forces across India, giving them insight into dark web activity and supporting proactive law enforcement. Its combination of technologies and design patterns makes it a practical tool in the fight against cybercrime.

Overall, my experience in this project has not only sharpened my technical skills in microservices architecture, data management, and real-time processing but also reinforced my ability to collaborate effectively within a dynamic team to deliver impactful software solutions.
