Everything You Always Wanted to Know About Writing Good Rake Tasks * But Were Afraid to Ask
Rake tasks are an important part of our Rails apps, because we usually use them for maintenance or data migration jobs over a collection of data.
One of the guys at the office asked me: what should I keep in mind when writing a rake task, and how do I know if my rake task is well written?
The answer is not simple, because most of the time it depends on the task you need to accomplish. Still, I have a few rules I use to make a rake task what I consider a good rake task.
What Makes a Rake Task a Good Task? #
I think a rake task is good if:
- It has a meaningful and simple description.
- It uses namespaces to group similar or related tasks.
- Its file structure follows the namespace structure.
- It’s isolated in a class so we can reuse it and test it with ease.
- It displays details about its progress without being too verbose.
- It has its own log file containing the start datetime, the end datetime, how long the task lasted and all the errors.
Writing Meaningful Descriptions #
Bad #
# lib/tasks/import_topics.rake
task import_topics: :environment do
...
end
Writing a description is useful because it gives us some details without reading the code. It’s also useful when you want to inspect the list of available rake tasks using rake -T (tasks without a description are not listed there at all). Right now we only know that this task imports topics, nothing else.
Good #
Adding a good description to the previous task, like “Migrate topics from legacy database to new database”, gives us more details about what the task does or should do. Now we know that these topics are migrated from the legacy database to our current app database.
# lib/tasks/migrate_topics.rake
desc 'Migrate topics from legacy database to new database'
task migrate_topics: :environment do
...
end
Note: If you can’t explain your rake task in one sentence, it probably means that the task is doing more than one job and you should consider splitting it.
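To see the effect of desc without a Rails app, here is a minimal sketch in plain Ruby. It assumes only the rake gem; the :environment task is a stand-in for the one Rails normally provides, and the listing loop is an approximation of what rake -T prints, not its actual implementation.

```ruby
require 'rake'

# Make sure Rake keeps task descriptions; set explicitly because this
# script does not go through the `rake` command line.
Rake::TaskManager.record_task_metadata = true

# Stand-in for the :environment task that Rails normally provides.
task :environment

# No description: this task is left out of the listing below.
task import_topics: :environment do
end

desc 'Migrate topics from legacy database to new database'
task migrate_topics: :environment do
end

# Approximation of `rake -T`: keep only the tasks that have a comment.
Rake::Task.tasks.select(&:comment).each do |t|
  puts "rake #{t.name}  # #{t.comment}"
end
```

Only migrate_topics shows up in the output, which is a good reason to describe every task you expect other people to run.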
Group Your Tasks Using Namespaces #
Bad #
# lib/tasks/migrate_topics.rake
desc 'Migrate topics from legacy database to new database'
task migrate_topics: :environment do
...
end
# lib/tasks/migrate_users.rake
desc 'Migrate users from legacy database to new database'
task migrate_users: :environment do
...
end
# lib/tasks/migrate_questions.rake
desc 'Migrate questions from legacy database to new database'
task migrate_questions: :environment do
...
end
All these tasks have one thing in common: they are used to migrate information. So you should group these similar tasks under a common namespace, migrate.
Good #
# lib/tasks/migrate/topics.rake
namespace :migrate do
desc 'Migrate topics from legacy database to new database'
task topics: :environment do
...
end
end
# lib/tasks/migrate/users.rake
namespace :migrate do
desc 'Migrate users from legacy database to new database'
task users: :environment do
...
end
end
# lib/tasks/migrate/questions.rake
namespace :migrate do
desc 'Migrate questions from legacy database to new database'
task questions: :environment do
...
end
end
Now we know all these rake tasks are related. Using namespaces will help you keep your code organized, clean and understandable.
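The following sketch shows how the namespace becomes part of the task name. It is plain Ruby with the rake gem; the :environment stub and the INVOKED array are assumptions added only to make the example observable outside Rails.

```ruby
require 'rake'

# Stand-in for the :environment task that Rails normally provides.
task :environment

# Hypothetical recorder so we can see which task bodies ran.
INVOKED = []

# Two separate blocks (as if they lived in two files) reopen the
# same :migrate namespace.
namespace :migrate do
  task topics: :environment do
    INVOKED << 'topics'
  end
end

namespace :migrate do
  task users: :environment do
    INVOKED << 'users'
  end
end

# Equivalent to running `rake migrate:topics` from the command line.
Rake::Task['migrate:topics'].invoke

puts Rake::Task.tasks.map(&:name).grep(/\Amigrate:/).inspect
puts INVOKED.inspect
```

Each task is addressed as namespace:task, so the related tasks sort together in any listing.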
Rake File Structure #
Bad #
# lib/tasks/migrate_topics.rake
desc 'Migrate topics from legacy database to new database'
task migrate_topics: :environment do
...
end
# lib/tasks/migrate_users.rake
desc 'Migrate users from legacy database to new database'
task migrate_users: :environment do
...
end
# lib/tasks/migrate_questions.rake
desc 'Migrate questions from legacy database to new database'
task migrate_questions: :environment do
...
end
File Structure:
lib
└── tasks
├── recalculate_badges_for_users.rake
├── migrate_users.rake
├── migrate_topics.rake
├── migrate_questions.rake
├── migrate_answers.rake
├── recalculate_best_answer.rake
├── topic_accessible_by_url.rake
├── invalid_questions.rake
├── remove_duplicated_topics.rake
├── calculate_last_activity_for_question.rake
├── ...
├── clean_votes.rake
└── cache_visits.rake
Although each file has an intention-revealing name, having 30 or more rake tasks in the same folder makes it hard to quickly identify which rake task belongs to which resource.
Good #
Every rake task performs operations on one or several resources (but it always revolves around a main resource). Identifying this main resource helps us build a good file structure and group our rake tasks under namespaces and folders.
# lib/tasks/migrate/topics.rake
namespace :migrate do
desc 'Migrate topics from legacy database to new database'
task topics: :environment do
...
end
end
# lib/tasks/migrate/users.rake
namespace :migrate do
desc 'Migrate users from legacy database to new database'
task users: :environment do
...
end
end
# lib/tasks/migrate/questions.rake
namespace :migrate do
desc 'Migrate questions from legacy database to new database'
task questions: :environment do
...
end
end
The main resource here is the migration, so we used the migrate namespace and created a folder under the tasks folder named after the namespace.
And so on with the remaining tasks.
File Structure:
This is how our rake file structure will look after applying this rule to all the rake tasks.
lib
└── tasks
├── migrate
│ ├── users.rake
│ ├── topics.rake
│ ├── questions.rake
│ └── answers.rake
├── users
│ ├── recalculate_badges.rake
│ └── cache_visits.rake
├── ...
├── questions
│ ├── recalculate_best_answer.rake
│ ├── topic_accessible_by_url.rake
│ ├── clean_votes.rake
│ ├── log_invalid.rake
│ └── calculate_last_activity.rake
└── topics
└── remove_duplicated.rake
Now our files are better organized and it’s easier to find a specific task.
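The naming rule can be stated as a tiny function. This is only a sketch of the convention described here; expected_rake_path is a hypothetical helper, not something Rake or Rails provides.

```ruby
# Maps a namespaced task name to the file where this convention
# expects to find it (hypothetical helper, for illustration only).
def expected_rake_path(task_name, root: 'lib/tasks')
  File.join(root, *task_name.split(':')) + '.rake'
end

puts expected_rake_path('migrate:topics')
puts expected_rake_path('users:recalculate_badges')
```

Note that the files keep the .rake extension: as far as I know, Rails picks up task files under lib/tasks, including subfolders, by that extension.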
Isolating Your Task Using a Class #
I’ll start with an example because it makes the concept easier to understand.
We have an app similar to Stack Overflow: our users can ask questions, answer them, leave comments and so on, and we already have a lot of information in our database. Suddenly we decide to implement a badge system in our app.
Now that we have developed the badge system, we only need to recalculate the badges for every user. This is the time to use a rake task.
Bad #
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
User.find_each do |user|
# Grants teacher badge
if user.answers.with_votes_count_greater_than(5).count >= 1
user.grant_badge('teacher')
end
...
# Grants favorite question badge
user.questions.find_each do |question|
if question.followers_count >= 25
user.grant_badge('favorite question') && break
end
end
# Grants stellar question badge
user.questions.find_each do |question|
if question.followers_count >= 100
user.grant_badge('stellar question') && break
end
end
end
end
end
This task may seem simple to understand but it has a lot of problems:
- It’s hard to test.
- We have a lot of logic that is not isolated.
- We have duplication.
- This task is very large and almost impossible to read. Imagine that you have 25 badges and one condition per badge. This task would have more than 150 lines.
Good #
Now that we have pointed out everything that’s wrong with this task, let’s fix it. We’ll extract all the logic and move it to a Service Object.
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
User.find_each do |user|
RecalculateBadges.new(user).all
end
end
end
# app/services/recalculate_badges.rb
class RecalculateBadges
attr_reader :user, :questions, :answers
def initialize(user)
@user = user
@questions = user.questions
@answers = user.answers
end
def all
teacher
favorite_question
stellar_question
end
def teacher
...
grant_badge('teacher')
end
def favorite_question
question_followers_count_badge(25, 'favorite question')
end
def stellar_question
question_followers_count_badge(100, 'stellar question')
end
private
def grant_badge(badge_name)
return unless badge_name
user.grant_badge(badge_name)
end
def question_followers_count_badge(followers_count, badge_name)
...
grant_badge(badge_name)
end
end
Now that we have extracted all the logic to a specific class, you will notice the following benefits:
- Our rake task’s logic is easier to read and understand: every public method on our RecalculateBadges class represents a badge, and the all method triggers all badge methods.
- We can test each badge’s logic in isolation, and it will be very easy to test.
- We removed duplication.
There are some important concepts I would like to highlight:
- For Service Objects, always prefer instance methods over class methods; they are much easier to refactor.
- Notice that our Service Object operates on a single user and not on the entire collection; this gives us more flexibility if we want to reuse this class anywhere else in our application.
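To illustrate the testing benefit, here is a self-contained sketch that exercises a trimmed-down service object with plain Ruby doubles instead of database records. FakeQuestion, FakeUser and QuestionBadges are hypothetical names, and the body of question_followers_count_badge is an assumption that mirrors the threshold checks from the "Bad" example; the real method body is elided above.

```ruby
# Hypothetical stand-ins for the ActiveRecord models.
FakeQuestion = Struct.new(:followers_count)

class FakeUser
  attr_reader :questions, :badges

  def initialize(questions)
    @questions = questions
    @badges = []
  end

  # Records the badge once, mimicking what user.grant_badge might do.
  def grant_badge(name)
    @badges << name unless @badges.include?(name)
  end
end

# Trimmed-down sketch of the service object, limited to the two
# follower-count badges.
class QuestionBadges
  def initialize(user)
    @user = user
  end

  def all
    favorite_question
    stellar_question
  end

  def favorite_question
    question_followers_count_badge(25, 'favorite question')
  end

  def stellar_question
    question_followers_count_badge(100, 'stellar question')
  end

  private

  # Assumption: one question at or above the threshold earns the badge.
  def question_followers_count_badge(followers_count, badge_name)
    return unless @user.questions.any? { |q| q.followers_count >= followers_count }

    @user.grant_badge(badge_name)
  end
end

popular = FakeUser.new([FakeQuestion.new(30), FakeQuestion.new(2)])
QuestionBadges.new(popular).all
puts popular.badges.inspect
```

Because the class receives the user in its constructor, a test can hand it any object that responds to questions and grant_badge, with no rake task and no database involved.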
Display Details About Task Progress Without Being Too Verbose #
Bad #
One of the things that I find very annoying is displaying irrelevant information when running a task. It makes it harder to monitor the progress and only pollutes your terminal.
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
User.find_each do |user|
puts "#{user.first_name} #{user.last_name} - #{user.email}"
RecalculateBadges.new(user).all
end
end
end
Terminal Output:
Mario Krols - mkrols@gmail.com
Kristen Delt - kdelt@gmail.com
Monica Lewinsky - mlewinsky@clinton.com
...
Fake User - fuser@outlook.com
As you can see, this task displays each user’s first name, last name and email, but it does not tell you about errors or how many users have been processed so far. In this particular case we don’t need the first or last name at all; they only pollute our terminal.
Good #
Now let’s say that the RecalculateBadges#all method returns true if the badges were recalculated successfully; if it fails, it returns false and saves the errors in an instance variable, @errors. We also have a new instance method, errors, which returns the value of @errors as a string.
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
users_count = User.count
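# with_index(1) makes the counter start at 1 instead of 0
User.find_each.with_index(1) do |user, index|
recalculate_badges = RecalculateBadges.new(user)
if recalculate_badges.all
# .green and .red are not core Ruby; they assume a gem such as colorize
puts "#{index}/#{users_count} - #{user.email}".green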
else
puts "#{index}/#{users_count} - #{user.email} - #{recalculate_badges.errors}".red
end
end
end
end
# app/services/recalculate_badges.rb
class RecalculateBadges
attr_reader :user, :questions, :answers, :errors
def initialize(user)
@user = user
@questions = user.questions
@answers = user.answers
end
def all
if user.can_receive_badges?
teacher
favorite_question
stellar_question
true
else
@errors = user.badges_validation_messages
false
end
end
def teacher
...
grant_badge('teacher')
end
def favorite_question
question_followers_count_badge(25, 'favorite question')
end
def stellar_question
question_followers_count_badge(100, 'stellar question')
end
private
def grant_badge(badge_name)
return unless badge_name
user.grant_badge(badge_name)
end
def question_followers_count_badge(followers_count, badge_name)
...
grant_badge(badge_name)
end
end
Terminal Output:
1/100 - mkrols@gmail.com
2/100 - kdelt@gmail.com
3/100 - mlewinsky@clinton.com - This user can't receive any badge because it's blocked
...
100/100 - fuser@outlook.com
This task displays all the necessary details without being too verbose. Now you can see:
- Which user is being processed and how many are left.
- The email of the user that was processed, so you can identify them by a unique attribute other than the id.
- A detailed error message in case of failure.
Remember to display only the information you need; anything else will end up being too verbose and will only make the output harder to read and understand.
Note: If you are not interested in displaying any kind of details, a useful alternative is to print a green dot for every element processed successfully and a red X for every element that fails; this way you’ll know that the task is running.
.................X...........X.......XX........X.
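The dot output can be produced with a small helper like the one below. This is a sketch in plain Ruby: each_with_progress is a hypothetical name, and the colors are raw ANSI escape codes that are only emitted when the output is a terminal.

```ruby
GREEN_DOT = "\e[32m.\e[0m"
RED_X = "\e[31mX\e[0m"

# Yields every item to the block and prints a dot when the block returns
# a truthy value, an X otherwise. Returns the number of failures.
def each_with_progress(items, out: $stdout, color: out.tty?)
  failures = 0
  items.each do |item|
    if yield(item)
      out.print(color ? GREEN_DOT : '.')
    else
      failures += 1
      out.print(color ? RED_X : 'X')
    end
  end
  out.puts
  failures
end

# Usage with plain numbers standing in for records: negatives "fail".
failed = each_with_progress([3, 8, -1, 5]) { |n| n.positive? }
puts "#{failed} failed"
```

In the badges task, the block would be something like { |user| RecalculateBadges.new(user).all }.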
Always Use a Log File #
Bad #
Imagine now that our application has 100,000 users and we need to recalculate badges for everyone.
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
users_count = User.count
User.find_each.with_index(1) do |user, index|
recalculate_badges = RecalculateBadges.new(user)
if recalculate_badges.all
puts "#{index}/#{users_count} - #{user.email}".green
else
puts "#{index}/#{users_count} - #{user.email} - #{recalculate_badges.errors}".red
end
end
end
end
The main problem here is that you would need endless scrollback in your terminal to keep track of everything. Processing 100,000 users also takes time, and you probably won’t be watching the whole time. And if you are running this task remotely or in the background, you won’t see any output at all.
Good #
I think having a log file is a MUST when writing rake tasks. It lets you keep track of every task run, consult it whenever you want and share it easily with anyone.
# lib/tasks/users/recalculate_badges.rake
namespace :users do
desc 'Recalculates Badges for All Users'
task recalculate_badges: :environment do
log = ActiveSupport::Logger.new('log/users_recalculate_badges.log')
start_time = Time.now
users_count = User.count
log.info "Task started at #{start_time}"
User.find_each.with_index(1) do |user, index|
recalculate_badges = RecalculateBadges.new(user)
if recalculate_badges.all
log.info "#{index}/#{users_count} - #{user.email}"
else
log.error "#{index}/#{users_count} - #{user.email} - #{recalculate_badges.errors}"
end
end
end_time = Time.now
# end_time - start_time is the elapsed time in seconds
duration = ((end_time - start_time) / 60.0).round(2)
log.info "Task finished at #{end_time} and lasted #{duration} minutes."
log.close
end
end
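If several tasks need the same bookkeeping, the start, end and duration lines can be pulled into a helper. The sketch below uses Ruby's standard Logger rather than ActiveSupport::Logger so it runs outside Rails; with_task_log is a hypothetical name, not an existing API.

```ruby
require 'logger'

# Logs when the block starts and finishes, how long it took, and any
# error that escapes it. The error is re-raised after being logged.
def with_task_log(log, name)
  started_at = Time.now
  log.info "#{name} started at #{started_at}"
  yield
rescue StandardError => e
  log.error "#{name} failed: #{e.class}: #{e.message}"
  raise
ensure
  finished_at = Time.now
  minutes = ((finished_at - started_at) / 60.0).round(2)
  log.info "#{name} finished at #{finished_at} and lasted #{minutes} minutes"
end

# Usage: in a real task, pass a file path instead of $stdout.
log = Logger.new($stdout)
with_task_log(log, 'users:recalculate_badges') do
  sleep 0.01 # stand-in for the real work
end
```

The ensure clause means the finish line is written even when the task blows up, so the log always shows how long a failed run lasted.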
Tracking start time, end time and duration is very important. If you have a rake task that runs every hour and it takes an hour and a half to complete, each run will overlap with the next one and you may end up, for example, running out of memory on your server.
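Logging the duration tells you that runs overlap; one possible way to actually prevent it is an exclusive lock file. This goes slightly beyond the rules above and is only a sketch: with_exclusive_lock and the lock path are hypothetical, and it relies on File#flock from Ruby's standard library.

```ruby
require 'tmpdir'

# Runs the block only if no other run currently holds the lock file.
# Returns true when the block ran, false when the lock was taken.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0o644) do |file|
    # LOCK_NB makes flock return false right away instead of waiting.
    return false unless file.flock(File::LOCK_EX | File::LOCK_NB)

    yield
    true
  end
end

# Usage: the lock path here is only an example location.
lock_path = File.join(Dir.tmpdir, 'users_recalculate_badges.lock')
ran = with_exclusive_lock(lock_path) do
  # the real work would go here
end
puts(ran ? 'finished' : 'skipped: previous run still in progress')
```

The lock is released when the file is closed at the end of the block, so a crashed run does not leave the task blocked forever.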
Note: Besides using a log file, you should also print some output, so you’ll know that your rake task is working. I didn’t add any output to this example because I wanted to keep it as simple as possible.