Give this web app a URL and it will attempt to fetch the site's contents for you asynchronously. Use the returned job ID to retrieve those contents at a later time.
- `POST /api/job` - kick off a job to fetch web content

```shell
$ curl -X POST "http://localhost:4000/api/job" \
    -H "Content-Type: application/json" \
    -d '{"url":"https://www.youtube.com/watch?v=dQw4w9WgXcQ"}'
{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": null,
  "status": "processing",
  "id": "2d80bd8dc50140089ae1ce6766f38c57",
  "content": null
}
```
- `GET /api/job/:id` - return the status of a previously run job

```shell
$ curl "http://localhost:4000/api/job/2d80bd8dc50140089ae1ce6766f38c57"
{
  "url": "https://www.youtube.com/watch?v=dQw4w9WgXcQ",
  "title": "Rick Astley - Never Gonna Give You Up - YouTube",
  "status": "success",
  "id": "2d80bd8dc50140089ae1ce6766f38c57",
  "content": "<!DOCTYPE html><html..."
}
```
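Since a freshly created job comes back as `"processing"`, a client needs to poll the `GET` endpoint until the job settles. A minimal Python sketch of that loop (the base URL comes from the examples above; the `fetch` and `sleep` parameters are illustrative extras so the loop can be exercised without a running server):

```python
# Sketch of a client that polls the job endpoint until the job settles.
# `fetch` and `sleep` are injectable so the loop can be tested offline.
import json
import time
import urllib.request

BASE = "http://localhost:4000/api/job"

def fetch_job(job_id):
    """GET one job record from the API described above."""
    with urllib.request.urlopen(f"{BASE}/{job_id}") as resp:
        return json.load(resp)

def await_job(job_id, fetch=fetch_job, interval=1.0, sleep=time.sleep):
    """Poll until the job's status is no longer "processing", then return it."""
    while True:
        job = fetch(job_id)
        if job["status"] != "processing":
            return job
        sleep(interval)
```

With a live server you would simply call `await_job("2d80bd8dc50140089ae1ce6766f38c57")` and get back the finished job record.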
Thanks to the magic of Phoenix, it also provides an admin view of the jobs that have been run.
A detailed display provides job status and a thumbnail.
You'll need Elixir installed, along with a running PostgreSQL server.
Once configured, update `config/dev.exs` with your database creds:

```elixir
config :fetch_me_if_you_can, FetchMeIfYouCan.Repo,
  adapter: Ecto.Adapters.Postgres,
  username: "postgres",
  password: "postgres"
```
Then:

- Install mix dependencies with `mix deps.get`
- Create and migrate your database with `mix ecto.create && mix ecto.migrate`
- Start the Phoenix endpoint with `mix phoenix.server`
Now you can visit `localhost:4000` from your browser. To view the jobs admin, visit `localhost:4000/jobs`.
- Building this just involved connecting a few frameworks and libraries. The bulk of the development work was in:
- the worker
- the controller
- and, to a lesser extent, the view and the model.
- The service treats every request as a brand-new job, so data accumulates indefinitely. It probably wants some sort of cleanup service that prunes old jobs periodically. Better yet, why store the data in a database at all? Just store it in Redis with a reasonable TTL.
- Testing and security were not considerations, so don't use this in production.
- This was pretty fun to put together!