GSoC 2018 - Week 4
Week 3 of the GSoC coding period has been completed successfully. GSoC (Google Summer of Code) is a global program focused on bringing more student developers into open source software development. Students work with an open source organization on a 3-month programming project during their break from school.
I am working on “Developing a ‘Product Advertising API’ module for Drupal 8” - #7. The “Product Advertising API” module, since renamed to “Affiliates Connect”, provides an interface to easily integrate with the affiliate APIs (product advertising APIs) offered by e-commerce platforms like Flipkart, Amazon, eBay etc., fetch product data from their databases, and import it into Drupal to monetize your website by advertising their products. For e-commerce platforms that don’t provide affiliate APIs, we will scrape the data instead.
Some of the tasks accomplished this week are -
The configuration form for saving the affiliates_connect settings is completed. Link to the issue - #2976037
As every vendor has a different configuration, the configuration form for the plugins is still under development and discussion. Link to the issue - #2977044
Functional tests verifying the routes defined in the project, as suggested by borisson_, are also completed and under review. This issue also includes functional tests checking whether product data is submitted correctly by the affiliates_product add form, as well as tests for editing and deleting products. Link to the issue - #2977377
Tests can increase development velocity without requiring too much extra work.
Week 4 - Goals
The basic module development is completed, so I will start working on the Scraper API. This week I have already done some groundwork by implementing some basic scraping using various libraries/modules in Node. Link to the repo - scraping-using-node. I have also studied the Flipkart Affiliate APIs in this repo, so I am working on the Flipkart plugin as well. I will update my further work in the repo.
The flow for scraping any e-commerce website’s content is -
1. First, collect the list of product categories available on the site.
2. Once we have the categories, we can scrape the products category-wise, paginating through each category until its last page so that every product is covered, so we need some way to paginate a whole category.
3. To get detailed product data, we need to visit each product’s link and scrape its content.
4. Save every product’s link to a file for further scraping or for updating existing products.
- Since we need to scrape data for thousands of products, this can take a lot of time depending on the implementation, so I need to devise an algorithm that minimizes the time taken to scrape the whole lot.