Consolidate data
When retrieving data from Epicollect5, individual files are generated for each form and branch. To facilitate data linkage, Epicollect5 assigns unique identifiers, integrated into the dataset.
Considering the unique requirements of each project, we believe in empowering users to tailor their data consolidation according to their specific needs during the post-processing phase.
This approach not only fosters flexibility but also ensures that the resulting data aligns perfectly with the project's objectives and intricacies.
Epicollect5 identifiers
Each hierarchy form data set will have a column called "ec5_uuid" with a unique identifier per each row. Any child form down the hierarchy will also have a column called "ec5_parent_uuid" to reference data from its parent form**.**
Each branch data set will have a column "ec5_branch_owner_uuid" which will reference each "ec5_uuid" of a hierarchy form.
The values (which look like d559da55-0df3-4121-8db0-5870d4faf038
) are system-generated identifiers used to keep the relationships on the data.
These identifiers are always present on any downloaded data sets and API responses.
Consolidate data using Google Sheets
We will use Google Sheets for this example, but the same concept can be applied to Excel, Apple Numbers and similar.
An Excel example using the VLOOKUP function is shown below.
What follows is just a simple example of how to merge data coming from three files (one hierarchy form and two branches) based on our EC5 Branches Project example project.
First of all, let's download the data in CSV format for that project and save it somewhere handy. The downloaded .zip will contain 3 CSV files:
form-1__form-1.csv
branch-1__list-your-family-members.csv
branch-2__list-your-pets.csv
We created a new spreadsheet and imported the above-mentioned files, one per each sheet tab, doing FILE > IMPORT
We use the following settings per each file:
Please make sure you add each file in its own tab sheet. To create a new tab sheet, just click on the "+" button at the bottom left:
We also called our tab sheets "form_1", "branch_family_members", "branch_pets". This is to make it easier to reference them in our formulas.
On the "form_1" tab sheet, create a new column called "Family Members" where we will fetch the data from the "branch_family_members" tab sheet. Click on the fist empty cell of that column to select it:
On that cell we add this formula: (if you like to know more about Google Sheets and available formulas, the docs are here)
We wrap everything in iferror()
to fetch data only if there are some branches for that row, otherwise leave the cell empty.
We use JOIN("," {QUERY(...)})
to concatenate the family members' data found with a comma.
In the QUERY statement, we first reference the "branch_family_members" from column A to F, basically all the columns in the "branch_family_members" sheet. On the right side of the QUERY statement, we search for data on column E on that tab sheet (3_Family_member_name) where column A (ec5_branch_owner_uuid) matches column A on the form_1 tab sheet (ec5_uuid)
The values of "ec5_branch_owner_uuid" and "ec5_uuid" are the relationship link between the data sets.
The result is this:
After the formula gets copied to all the cells, it will look like this:
Using the same steps as above, we can fetch the data from our "branch_pets" tab sheet.
The formula gets updated to
=iferror(JOIN(", ", QUERY( branch_pets!$A:$F , "select F where A = '"&$A$2:$A&"'")))
The final result will be like below:
Awesome!
The final spreadsheet is available here for you to view.
The project used in this example is here.
Consolidate data in Excel using the VLOOKUP function
VLOOKUP is an Excel function to lookup and retrieve data from a specific column in a table. VLOOKUP supports approximate and exact matching, and wildcards (* ?) for partial matches. The "V" stands for "vertical". Lookup values must appear in the first column of the table, with lookup columns to the right. More info.
We are going to use the project EC5 VLOOKUP in EXCEL for this example. Download its data (2 files, form-1__class.csv
and form-1__student.csv
)
The project is really simple, there is CLASS > STUDENT hierarchy, one parent form and one child form. We just want to add CLASS entries and all the STUDENT entries attending each class. Obviously, a student can attend more classes.
Let's import the data into Excel, one file per sheet: (See how to do it)
Select the STUDENT sheet. Create a new column called Class and select its first empty cell:
Let's add the VLOOKUP formula. We would like to show the class name next to each student. The class name can be found on the CLASS sheet.
The formula will look like:
=VLOOKUP(B2,CLASS!A$2:E$4,5,FALSE)
More info on the formula and its arguments can be found here.
For this example, it is basically saying:
B2
: look for the B2 value (ec5_parent_uuid
)CLASS!A$2:E$4
: the whole CLASS sheet, all cells5
: return value of column 5 from the CLASS sheet when found (1_Class_name column)FALSE
: look for an exact match
The value of "Maths" is returned:
Copying the formula down to the whole column, all class names are returned:
Another simple tutorial about Excel VLOOKUP is here. See below for further options.
Using Python
To merge CSV files based on an identifier, you can use Python with libraries like pandas. Assuming you have two CSV files with a common identifier, you can follow these steps:
Install pandas if you haven't already:
Create a Python script or Jupyter Notebook and import the required libraries:
Read the CSV files into pandas DataFrames:
Merge the DataFrames based on the common identifier column:
Here, identifier
should be replaced with the actual column name that serves as the common identifier in both CSV files.
Optionally, you can specify the type of merge (inner, outer, left, or right) based on your requirements. The default is an inner join:
Save the merged DataFrame back to a CSV file if needed:
Remember to adjust the column names and file paths accordingly to match your data.
By following these steps, you can merge two CSV files based on a common identifier using Python and pandas.
Using a bash script
If you prefer to use a bash script to merge CSV files based on a common identifier, you can use awk
to accomplish this task. Here's a simple bash script to do that:
Copy and paste the above code into a text editor and save it with a .sh
extension, e.g., merge_csv.sh
. Then, make the script executable using the following command:
Finally, run the script:
Please make sure that the CSV files (file1.csv
and file2.csv
) are in the same directory as the script or provide the correct file paths if they are located elsewhere. The merged data will be saved in merged_file.csv
in the same directory as the script.
Note that this script assumes that the first row in both CSV files contains the header, and it will use the column names in the first CSV file for the merged output. Additionally, the script performs an inner join, meaning it will only include rows with matching identifiers in both files. If you need a different type of join, you may need to modify the script accordingly.
Google Sheets alternative to VLOOKUP
If you want to merge CSV files in Google Sheets without using VLOOKUP
, you can achieve this using the QUERY
function combined with IMPORTRANGE
. The QUERY
function allows you to perform SQL-like queries on your data, and IMPORTRANGE
lets you import data from another sheet or another Google Sheets document.
Here's how you can do it:
Upload the CSV files to Google Drive and import them into separate sheets in your Google Sheets document, similar to the steps mentioned earlier.
In your new sheet (Sheet3), use the following formula in cell A1:
Replace "URL_OF_YOUR_SHEET1"
and "URL_OF_YOUR_SHEET2"
with the URLs of the respective Google Sheets containing your data from Sheet1 and Sheet2. You can find the URL in your browser's address bar when you have the corresponding sheet open.
Explanation:
{}
: This constructs an array containing data from both Sheet1 and Sheet2.IMPORTRANGE
: This function imports data from the specified sheets. We are importing columns A:B from Sheet1 and column B from Sheet2.QUERY
: We use theQUERY
function to perform a SQL-like query on the imported data.SELECT Col1, Col2, Col4
: This selects columns 1, 2, and 4 from the imported data. In this case, Col1 is the identifier column from Sheet1, Col2 is the second column from Sheet1, and Col4 is the data from the second column of Sheet2.WHERE Col1 IS NOT NULL
: This filters out any rows where the identifier in Sheet1 is empty.
This formula will merge data from Sheet1 and Sheet2 into Sheet3 based on the matching identifiers in column A of Sheet1.
After applying the formula, Sheet3 will contain the merged data. If you need to download the merged data as a CSV file, click on "File" > "Download" > "Comma-separated values (.csv)".
Excel alternative to VLOOKUP
To merge CSV files in Excel without using VLOOKUP
, you can use the INDEX
and MATCH
functions instead. The INDEX
and MATCH
functions work together to look up values in a table based on a specified row or column header. Here's how you can do it:
Assuming you have the two CSV files imported into separate sheets in your Excel workbook (e.g., Sheet1 and Sheet2) and the common identifier column is named "identifier" in both sheets, you can use the following formula in cell C2 of a new sheet (Sheet3):
Explanation:
INDEX(Sheet2!$B$2:$B$100, MATCH(A2, Sheet2!$A$2:$A$100, 0))
: This part of the formula performs the lookup. It searches for the value in cell A2 (the identifier in Sheet1) in the range A2:A100 of Sheet2 and returns the corresponding value from column B of Sheet2 (the merged data).MATCH(A2, Sheet2!$A$2:$A$100, 0)
: TheMATCH
function searches for the value in cell A2 in the range A2:A100 of Sheet2 and returns the position (row number) of the match. The0
as the last argument specifies an exact match.INDEX(Sheet2!$B$2:$B$100, MATCH(A2, Sheet2!$A$2:$A$100, 0))
: TheINDEX
function uses the position returned byMATCH
to retrieve the corresponding value from the range B2:B100 of Sheet2.
Drag the formula down to apply it to the rest of the rows in column C. This will populate column C with the merged data from Sheet2 based on the common identifier from Sheet1.
The formula will return an empty string (""), represented as a blank cell, if there is no match found in Sheet2 for a particular identifier from Sheet1. The IFERROR
function is used to handle such cases and avoid showing error values.
Please make sure to adjust the ranges ($B$2:$B$100
, $A$2:$A$100
, etc.) in the formula based on the actual range of data in your sheets. Also, ensure that the two sheets contain the common identifier column (named "identifier" in this example) and the data you want to merge.
Power BI
In Power BI, you can merge CSV files based on a common identifier using Power Query, which is the data transformation engine in Power BI. Here's how you can do it:
Open Power BI Desktop and create a new report.
Go to the "Home" tab in the Power Query Editor.
Click on "Combine Queries" and then select "Merge."
In the "Merge" dialogue box, choose the first CSV file as the primary table and the second CSV file as the related table.
Select the common identifier column in both tables as the key column.
Choose the type of join you want (e.g., inner join, left outer join, etc.).
Click "OK" to perform the merge.
The merged data will be displayed in the Power Query Editor.
Optionally, you can perform any additional data transformations or cleanups as needed.
Click "Close & Apply" to load the merged data into your Power BI report.
Here's a step-by-step guide with more details:
In Power BI Desktop, click on "Home" in the ribbon and then select "Get Data."
Choose "Text/CSV" as the data source and select the first CSV file (file1.csv).
Follow the prompts to import the data, and it will be loaded into Power Query Editor.
Click on "Home" in the Power Query Editor to return to the main Power Query Editor view.
Click on "Combine Queries" in the Home tab and select "Merge."
In the "Merge" dialogue box, select the first table (usually the one imported from file1.csv) as the primary table.
Choose the second table (imported from file2.csv) as the related table.
Select the common identifier column in both tables as the key column.
Choose the type of join you want (e.g., inner join, left outer join, etc.).
Click "OK" to perform the merge.
The merged data will be displayed in the Power Query Editor.
Optionally, perform any additional data transformations or cleanups as needed.
Click "Close & Apply" to load the merged data into your Power BI report.
Now, the merged data will be available in your Power BI report, and you can use it to create visualizations and build your dashboards. Power Query will handle the data merge based on the common identifier column you specified, and you won't need to write any code or formulas manually.
Using Apple Numbers
In Apple Numbers, you can merge CSV files by importing them into separate sheets and then using the VLOOKUP function to combine the data based on a common identifier. Here's a step-by-step guide:
Open Numbers and create a new blank spreadsheet.
Click on the "Table" button in the top toolbar and select "Import CSV" from the drop-down menu.
Choose the first CSV file (file1.csv) to import. The CSV data will be loaded into a new sheet (e.g., Sheet1).
Repeat step 2 and import the second CSV file (file2.csv) into another new sheet (e.g., Sheet2).
Now you have the data from both CSV files imported into separate sheets in the same Numbers document.
In a new sheet (e.g., Sheet3), where you want to merge the data, use the VLOOKUP function to retrieve the data from the second sheet based on the common identifier.
Assuming the common identifier column is named "identifier" in both Sheet1 and Sheet2, and you want to merge data from Sheet1 and Sheet2 into Sheet3:
In cell B2 of Sheet3, use the following formula and drag it down to apply it to the rest of the rows:
Explanation:
VLOOKUP
: Looks up the identifier in cell A2 (the common identifier in Sheet1) and retrieves the corresponding value from the range A2:B100 in Sheet2 (the merged data).Sheet2::A2:B100
: The double colon (::) indicates that we are using a range from Sheet2.2
: We want to retrieve the second column's value from the matched row in Sheet2 (column B).FALSE
: We use an exact match for VLOOKUP.
This formula will merge data from the second column of Sheet2 into Sheet3 based on the matching identifiers in column A of Sheet1.
Remember to adjust the formula, sheet names, and cell ranges if your data is in different columns or sheets.
After applying the formula, Sheet3 will contain the merged data. You can customize the sheet further, create visualizations, and perform additional data analysis as needed in Apple Numbers.
Using R
To merge CSV files in R based on a common column called ec5_uuid
, you can use the merge()
function or the dplyr
package. Here's how you can do it with both methods:
Method 1: Using merge()
function
In this example, file1.csv
and file2.csv
should contain your data, and merged_data
will be the merged data frame.
Method 2: Using dplyr
package
This method uses the full_join()
function from the dplyr
package to perform a full outer join on the common column, which includes all rows from both data frames.
Choose the method that best suits your needs and data structures. After merging the data, you can save it to a new CSV file using the write.csv()
function:
Make sure to replace the file names and column names with your actual data and column names.
Using PHP
To merge CSV files in PHP based on a common column called ec5_uuid
, you can use the fgetcsv()
function to read and process each CSV file, and then write the merged data into a new CSV file. Here's a basic example:
This PHP script reads two CSV files, combines them based on the 'ec5_uuid' column, and saves the merged data to a new CSV file. Make sure to replace the file names and column names with your actual data and column names.
Using Javascript (ES6)
To merge CSV files in ES6 JavaScript based on a common column calledec5_uuid
, you can use the fs
module for file operations. You'll also need to use a CSV parsing library like csv-parser
to handle CSV data. Here's a basic example:
First, you need to install the
csv-parser
library if you haven't already. You can do this using npm:
Now you can create a JavaScript script to merge the CSV files:
This JavaScript script reads two CSV files, combines them based on the 'ec5_uuid' column, and saves the merged data to a new CSV file called merged_data.csv
. Make sure to replace the file and column names with your actual data and column names.
Using awk (Terminal)
Here is the complete awk
command that merges two CSV files based on the common column "ec5_uuid" as the first column:
This command will take file1.csv
and file2.csv
, which contains CSV data with "ec5_uuid" as the first column, and merges them based on this common column. The merged data will be saved in a new file named merged_data.csv
.
Last updated