SmarterCSV provides a convenient interface for reading and writing CSV files and data.
Unlike traditional CSV parsing methods, SmarterCSV represents the data for each row as a Ruby hash, which lends itself perfectly to direct use with ActiveRecord, Sidekiq, and JSON stores such as S3. For large files, it supports processing CSV data in chunks of arrays of hashes, which allows parallel or batch processing of the data.
Its powerful interface is designed to simplify and optimize the process of handling CSV data, and allows for highly customizable and efficient data processing by enabling the user to easily map CSV headers to Hash keys, skip unwanted rows, and transform data on-the-fly.
This results in a more readable, maintainable, and performant codebase. Whether you're dealing with large datasets or complex data transformations, SmarterCSV streamlines CSV operations, making it an invaluable tool for developers seeking to enhance their data processing workflows.
When writing CSV data to a file, it similarly takes arrays of hashes and converts them to a CSV file.
One user wrote:
> Best gem for CSV for us yet. [...] taking an import process from 7+ hours to about 3 minutes. [...] Smarter CSV was a big part and helped clean up our code ALOT
SmarterCSV is designed for real-world CSV processing, returning fully usable hashes with symbol keys and type conversions — not raw arrays that require additional post-processing.
Beware of benchmarks that only measure raw CSV parsing. Such comparisons measure tokenization alone, while real-world usage requires hash construction, key normalization, type conversion, and edge-case handling. Omitting this work understates the actual cost of CSV ingestion.
For a fair comparison, CSV.table is the closest Ruby CSV equivalent to SmarterCSV.
| Comparison | Speedup (P90) |
|---|---|
| vs SmarterCSV 1.14.4 | ~5× faster |
| vs CSV.table | ~7× faster |
| vs CSV hashes | ~3× faster |
Benchmarks: Ruby 3.4.7 on Apple M1. Memory: 39% less memory allocated, 43% fewer objects. See the CHANGELOG for details.
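For reference, the `CSV.table` behavior used as the comparison baseline above can be reproduced with the standard library alone: `headers: true`, `header_converters: :symbol`, and `converters: :numeric` together yield symbol keys and numeric values (this sketch parses an inline string rather than a file for illustration):

```ruby
require 'csv'

csv_text = "First Name,Posts\nJosé,12\nJürgen,3\n"

# CSV.table(file) applies these same converters when reading a file;
# here we apply them to an inline string
table = CSV.parse(csv_text,
                  headers: true,
                  header_converters: :symbol,
                  converters: :numeric)

rows = table.map(&:to_h)
# => [{:first_name=>"José", :posts=>12}, {:first_name=>"Jürgen", :posts=>3}]
```

This is the post-processing work (hash construction, key normalization, type conversion) that raw-parsing benchmarks leave out.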
SmarterCSV is designed for robustness — real-world CSV data often has inconsistent formatting, extra whitespace, and varied column separators. Its intelligent defaults automatically clean and normalize data, returning high-quality hashes ready for direct use with ActiveRecord, Sidekiq, or any data pipeline — no post-processing required. See Parsing CSV Files in Ruby with SmarterCSV for more background.
```
$ cat spec/fixtures/sample.csv
First Name , Last Name , Emoji , Posts
José ,Corüazón, ❤️, 12
Jürgen, Müller ,😐,3
Michael, May ,😞, 7
```

```
$ irb
>> require 'smarter_csv'
=> true
>> data = SmarterCSV.process('spec/fixtures/sample.csv')
=> [{:first_name=>"José", :last_name=>"Corüazón", :emoji=>"❤️", :posts=>12},
    {:first_name=>"Jürgen", :last_name=>"Müller", :emoji=>"😐", :posts=>3},
    {:first_name=>"Michael", :last_name=>"May", :emoji=>"😞", :posts=>7}]
```

Notice how SmarterCSV automatically (all defaults):
- Normalizes headers → `downcase_header: true`, `strings_as_keys: false`
- Strips whitespace → `strip_whitespace: true`
- Converts numbers → `convert_values_to_numeric: true`
- Removes empty values → `remove_empty_values: true`
- Preserves Unicode and emoji characters
Processing large CSV files in chunks minimizes memory usage and enables powerful workflows:
- Database imports — bulk insert records in batches for better performance
- Parallel processing — distribute chunks across Sidekiq, Resque, or other background workers
- Progress tracking — the optional `chunk_index` parameter enables progress reporting
- Memory efficiency — only one chunk is held in memory at a time, regardless of file size

The block receives a chunk (an array of hashes) and an optional `chunk_index` (a 0-based sequence number):
```ruby
# Database bulk import
SmarterCSV.process(filename, chunk_size: 100) do |chunk, chunk_index|
  puts "Processing chunk #{chunk_index}..."
  MyModel.insert_all(chunk) # chunk is an array of hashes
end

# Parallel processing with Sidekiq
SmarterCSV.process(filename, chunk_size: 100) do |chunk|
  MyWorker.perform_async(chunk) # each chunk processed in parallel
end
```

See Examples, Batch Processing, and Configuration Options for more.
Minimum Ruby Version: >= 2.6
C Extension: SmarterCSV includes a native C extension for accelerated CSV parsing. The C extension is automatically compiled on MRI Ruby. For JRuby and TruffleRuby, SmarterCSV falls back to a pure Ruby implementation.
Add this line to your application's Gemfile:

```ruby
gem 'smarter_csv'
```

And then execute:

```
$ bundle
```

Or install it yourself as:

```
$ gem install smarter_csv
```

- Introduction
- The Basic Read API
- The Basic Write API
- Batch Processing
- Configuration Options
- Row and Column Separators
- Header Transformations
- Header Validations
- Data Transformations
- Value Converters
- Parsing CSV Files in Ruby with SmarterCSV
- CSV Writing with SmarterCSV
- Processing 1.4 Million CSV Records in Ruby, fast
- Faster Parsing CSV with Parallel Processing by Jack Lin
- The original Stackoverflow Question that inspired SmarterCSV
- The original post for SmarterCSV
Please open an Issue on GitHub if you have feedback, new feature requests, or want to report a bug. Thank you!
For reporting issues, please:
- include a small sample CSV file
- open a pull-request adding a test that demonstrates the issue
- mention your versions of SmarterCSV, Ruby, and Rails
- Fork it
- Create your feature branch (`git checkout -b my-new-feature`)
- Commit your changes (`git commit -am 'Added some feature'`)
- Push to the branch (`git push origin my-new-feature`)
- Create new Pull Request