Boston Youth Arts Evaluation Project
Creating an Evaluation Plan & Designing Tools
Before we launched into developing our own tools, we researched many others, hoping that the appropriate tools had already been developed. Although we found many of them helpful, none spoke specifically to the three desired outcome areas for our current participants (I Create, I Am, and We Connect) or to the six intermediate and long-term outcome areas we identified for our alumni (Able to Navigate, Able to Engage and Be Productive, Able to Make Connections, Resiliency, Self-Efficacy/Personal Fulfillment, and Community Engagement). It was clear that we needed to create our own tools, but, knowing this would be a daunting task, we first needed a plan.
Creating an Evaluation Plan
The following ten questions, inspired by the W.K. Kellogg Foundation’s Evaluation Handbook (1998, pp. 47-99), helped us design both a plan and the tools we needed. We believe these questions are very helpful for any organization attempting to design a system of evaluation.
- WHO IS ON OUR TEAM? Identify stakeholders
and your evaluation team, including staff, early in the process.
Getting input from all of your staff members on the design of
the evaluation tools is very important. Collaborators regularly
asked their teams for feedback, and we held all-staff trainings
for all five sites to help design and pilot our tools.
- WHAT DO WE VALUE? Define the “sacred bundle”
(the creative soul of the work that you do). Develop a strong
logic model and clear theory of change. Do this with your team
(not in isolation) in order to get buy-in from a diverse and
rich knowledge base. We worked with five different disciplines
and populations, and while this was very challenging at times,
we were closely aligned in our values.
- WHAT DO WE ASK? Define the indicators/outcomes
in your logic model and then develop evaluation questions that
align with it. Indicators should be Specific, Measurable,
Action-oriented, Realistic, and Timed (SMART); for example, “by
the end of the program year, 75 percent of participants will report
greater confidence in sharing their work” is specific, measurable,
and timed, while “youth will grow artistically” is none of these.
Make sure, too, that the questions connect with your “sacred
bundle.” The toughest part of our task was formulating measurable
questions that aligned with our indicators. Writing them
in a language accessible to both youth and funders proved quite
challenging.
- WHAT WILL IT COST? Budget between 5 and 10 percent
of your project’s total budget for evaluation; on a $200,000
project, for example, that means setting aside $10,000 to $20,000.
Know that
evaluation is time-intensive and that there is significant effort
and time needed for the next six steps. Although RAW received
funding to help manage and lead this project, none of the collaborating
organizations received funding to offset the additional resource
demands of BYAEP. The staff time devoted to this project exceeded
our budget, and we found that we often underestimated how much
time it takes to formulate, implement, and analyze evaluations.
Creating the BYAEP Handbook is partly an attempt to minimize
the time investment for others. That being said, the process
was deeply rewarding, and wrestling with the questions, our
values, and the analysis enhanced our ability to understand
and convey our missions.
- WHO OWNS THIS? Decide who will take on the
evaluations. Will this be handled by staff on hand, by external
evaluators or consultants, or by some combination? This time-intensive
process requires ownership and a clear assessment of the staff and
outside skills and resources (especially time) that will be needed.
We received a
lot of advice and help on this project. We also needed to contact
experts in the field to help with the pilot design. Suzanne
Bouffard from Harvard, Steve Seidel from Project Zero, Michael
Sikes from the Arts Education Partnership, Dennie Palmer Wolf,
and Julia Gittleman all helped in the formulation of our pilot
evaluations along with BYAEP collaborators, who contributed
countless hours. Individual staff members engaged in all components
of the evaluation process, with RAW’s Käthe Swaback managing
the flow, guidance, and details of reporting.
- WHAT CAN WE GATHER? Plan how you will collect
the data as you assess the resources and skills available. Determine
what data you need to collect, and be careful not to gather
data that is merely “interesting”; doing so can easily lead to “data burn-out.”
We found that we were collecting far too much data in the first
year of our pilot. Although all of it was informative,
we simply did not have the staff resources to work with all
the results. We cut the Self-Evaluation from six pages in the
first year to four pages, completed online, in year two. We
decided to include optional worksheets for program staff to
complete with youth in order to gain other information that
would be valuable for the leaders but not necessarily for the
organization as a whole (see the Workbook for examples).
- HOW WILL WE GATHER IT? Collect both qualitative
(descriptive information) and quantitative (information that
can be counted) data. Determine what information you need and
how you will obtain it in order to best assess your outcomes.
Did we want to use pre- and post-tests, focus groups, interviews,
observations, or other creative tools we could invent? We found
collecting stories, numbers, and images (photos and other visuals)
was important in capturing the vibrant makeup of our programs.
When we could, we offered multiple-choice answers in order to
derive percentages that we could rate and compare (the first
sketch following this list shows one way to tabulate these).
Although we saw many drawbacks to pre- and post-tests, we used
them to assess change, and they yielded some important findings.
It was also important to assess things creatively; we piloted
the Drawing Evaluations as one such tool.
- WHAT DOES IT ALL MEAN? Analyze and understand
your findings. Determine what you can assess yourselves and
where you may need technical assistance and statistical analysis.
We were challenged by some of the technical aspects of Excel
and the fact that none of us were well-versed in statistics
and analysis. Learning to work with Survey Monkey proved important,
allowing us to download reports in a usable format. In our
third year we built an Excel template that pulled all the
results from Survey Monkey into a system that presented comparisons
and enabled us to delete duplicated and unmatched evaluations
(the second sketch following this list outlines the same matching logic).
- WHAT AND WHO CAN WE TELL? Communicate findings
to participants, staff, and stakeholders. Report on what you
wanted to do, what you did, how you did it, what you learned,
and what you might want to change going forward. This was an
important part of the process. Many evaluation efforts end in
the data-gathering stage, and we were determined to see it through
to the reporting stage. With BYAEP we had the unusual opportunity
to share our data, both the strengths and weaknesses of our findings, with
each other. This afforded us a new lens for viewing ourselves,
our organizations, and our field as a whole.
- HOW CAN WE IMPROVE? Make practical use of the
results by reflecting them back to your programs. Use what you
have learned to inform program improvements and to better assess
and meet the needs of youth, staff, and community. Although
it was rewarding to see our high scores in several areas, discovering
where our low scores fell and discussing how we might work to
improve these outcome areas was most beneficial. This was instrumental
in setting goals for the year and designing a curriculum and
initiatives that would better address these areas.
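As promised under “HOW WILL WE GATHER IT,” here is a minimal sketch of how multiple-choice answers can be turned into comparable percentages. It is an illustration only, not a BYAEP tool: the survey statement, answer options, and responses below are hypothetical.

```python
from collections import Counter

def answer_percentages(responses):
    """Tally multiple-choice answers and convert the counts to percentages."""
    counts = Counter(responses)
    total = len(responses)
    return {answer: round(100 * count / total, 1)
            for answer, count in counts.items()}

# Hypothetical responses to "I feel confident sharing my artwork."
pre_test = ["Agree", "Neutral", "Disagree", "Neutral", "Agree", "Disagree"]
post_test = ["Agree", "Agree", "Neutral", "Agree", "Agree", "Neutral"]

print("Pre: ", answer_percentages(pre_test))   # percentages we can rate
print("Post:", answer_percentages(post_test))  # and compare across years
```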
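The “WHAT DOES IT ALL MEAN” step mentions deleting duplicated and unmatched evaluations before comparing results; we did this with our Excel template. The sketch below shows the same matching logic in code, under our illustrative assumption that each evaluation is a record carrying a participant ID and a “pre” or “post” label.

```python
def match_pre_post(records):
    """Keep one 'pre' and one 'post' evaluation per participant; drop
    duplicates and participants missing either half of the pair."""
    by_participant = {}
    for record in records:
        pid, stage = record["participant_id"], record["stage"]
        # Keep only the first evaluation per participant per stage (dedup).
        by_participant.setdefault(pid, {}).setdefault(stage, record)
    # Keep only participants with both a pre and a post evaluation.
    return {pid: stages for pid, stages in by_participant.items()
            if "pre" in stages and "post" in stages}

# Hypothetical records as they might come out of a survey export.
records = [
    {"participant_id": 101, "stage": "pre", "score": 3},
    {"participant_id": 101, "stage": "post", "score": 4},
    {"participant_id": 102, "stage": "pre", "score": 2},   # no post: dropped
    {"participant_id": 101, "stage": "post", "score": 4},  # duplicate: dropped
]
print(match_pre_post(records))  # only participant 101 keeps a matched pair
```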
Researching Designs and Tools
Our greatest challenge was to create a reliable, valid, and practical evaluation plan and tools that would address the indicators of our outcomes and provide us with usable data to improve our programs. There is great diversity in the types of evaluation models developed and used in the social sciences. The following approaches are some that were recommended to us for consideration.
Experimental Designs: These evaluations are
considered the “gold standard” in research because they measure
not only the outcomes of programs and their participants but also
the outcomes of people who are not involved with the program and
are assigned at random to a control group. The outcomes of the
control group are then compared to the program outcomes to understand
the direct effect of the program.
Quasi-Experimental Design: This design is the
same as an experimental design except that there is no random
assignment of participants to a control group; instead, the assignment
may be based on things like convenience.
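To make the defining difference between these two designs concrete, here is a minimal sketch of random assignment. The applicant names are made up, and no such script was part of BYAEP; it simply illustrates the randomization step that a quasi-experimental design omits.

```python
import random

def assign_groups(applicants, seed=42):
    """Randomly split applicants into a program group and a control group,
    the random assignment that defines an experimental design."""
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    shuffled = applicants[:]    # copy so the original list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]

# A quasi-experimental design would skip the shuffle and instead compare
# enrolled youth against a convenience group, such as a waiting list.
program_group, control_group = assign_groups(
    ["Ana", "Ben", "Cam", "Dee", "Eli", "Fay"]
)
print(program_group, control_group)
```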
Non-Experimental Impact Evaluations: These
types of evaluations look at changes in the indicators of outcomes
among program participants or groups but do not include comparison
groups who are not part of the program(s).
Pre- and Post-Participation Surveys: These
surveys make before-and-after comparisons, looking at outcomes
for participants before the program’s start and at its conclusion.
Retrospective Evaluation: This kind of evaluation
asks youth to compare how they are “now” to how they were before
they started the program. Retrospective evaluations are seen as
less reliable and valid than pre-post assessments because one’s
“recall of information through reflection may be subject to problems
of insufficient recall as well as offer the potential for fabricated
or biased responses” (Lamb, 2005, p. 18). However, other studies
have shown little difference between traditional pre-/post-tests
and retrospective evaluations.
Utilization-Focused Evaluation: The utilization-focused
approach is one in which evaluations are designed, used, and judged
by their utility, so that the whole process is designed for and
by the intended users for a specific use. These evaluations
are personal and situational, and they are implemented in a way
that makes a significant difference in improving programs and
decision making.
Participatory Evaluation: Participatory evaluation
design is the process of designing evaluations with the people
involved in the organization, programs, and/or community (including
funders) in order to make the findings more relevant and meaningful
to all stakeholders.