Jump to content


Photo

Difference between Append, Merge, and Join functio


  • Please log in to reply
8 replies to this topic

#1 Shu-Pei Wu

Shu-Pei Wu

    Member

  • Members
  • PipPip
  • 26 posts

Posted 02 June 2014 - 01:56 PM

Hello All,

 

I just wonder what the difference between append, merge and join worksheet is?

I kinda know the difference might be the existing column or overlay by same column name..etc..

 

Is there any better description regarding to the difference??

 

Thanks,

Shu-Pei



#2 Simon Davis

Simon Davis

    Advanced Member

  • Administrators
  • 1,113 posts

Posted 02 June 2014 - 02:27 PM

Shu-Pei,

Merge and Join are essentially the same, with Inner Join checked in the latter it can be a selective merge since only rows with common sort IDs will be returned in the results sheet. I did an example of this for someone on PharmPK a few years back;

http://s768.photobuc...artial.jpg.html

In both Merge and Join, the first worksheets names take priority.

Append simply 'stacks' 2 or more worksheets together, no need for common sort keys

Does this summary help?

Simon Posted Image

Attached Thumbnails

  • dataTools.jpg


#3 Shu-Pei Wu

Shu-Pei Wu

    Member

  • Members
  • PipPip
  • 26 posts

Posted 02 June 2014 - 02:29 PM

Thanks, Simon,

 

It is really helpful.

I will play with them.

Thanks,

 

Shu-Pei



#4 Nathan Teuscher

Nathan Teuscher

    Advanced Member

  • Administrators
  • 46 posts

Posted 02 June 2014 - 02:30 PM

Good question. Some of the functions of the data manipulation tools overalap, which allows users to solve problems in different ways. Here are my thoughts on the three tools you mentioned, along with how I use them:

 

Append: This stacks two datasets on top of each other. Imagine two datasets with identical columns, but different data (e.g. concentration data from two different studies). Using Append, you can combine these into a single dataset.

 

Merge: This merges two sets of data based on a some common criteria. For example, imagine a concentration dataset with subject numbers, concentration and time and a separate treatment dataset with subject numbers and treatment assignments. Using Merge, you can combine the datsets using subject number as the common key, and the resulting dataset will have concentration, time, and treatment assignment.

 

Join: This joins a new column to a set of data. Perhaps you have a concentration-time dataset and an effect-time dataset that are exact matches (same subjects, order, and timepoints) except for the concentration and effect columns. You can join the effect column to the concentration-time dataset (or the concentration column to the effect-time dataset) using the Join tool.

 

I use the append and merge tools regularly. I rarely (if ever) use the join tool.

 

I hope that helps!

Nathan



#5 Shu-Pei Wu

Shu-Pei Wu

    Member

  • Members
  • PipPip
  • 26 posts

Posted 03 June 2014 - 11:34 AM

Thanks Nathan,

It is very useful.

 

One little question is that is there a way to "select all" in using these functions?

Thanks,

 

Shu-Pei



#6 Simon Davis

Simon Davis

    Advanced Member

  • Administrators
  • 1,113 posts

Posted 03 June 2014 - 12:34 PM

In merge, check the box to "carry along data for like sort variables" and then you will have a one to many merge when the sort variables contain the same value

 

If you are using append then it is implicitly using all rows of data and you choose which columns to keep.

 

  Simon



#7 Linda Hughes

Linda Hughes

    Advanced Member

  • Members
  • PipPipPip
  • 67 posts

Posted 24 February 2020 - 05:03 PM

I have been looking at the Join and Merge tools recently, and have a delayed response.  There are two main differences in the Join and Merge tools in Phoenix.

 

The first difference is the order in the output.  The Join tool internally resorts the data for processing, and therefore also for the resulting worksheet, by the columns that are mapped as Sort variables.  The Sort precedence is the order in which the Sort columns are mapped.  Resorting data is standard in SQL joins, which is that concept that Phoenix uses.  The Merge tool however merges the data by the order of the data in the first worksheet, so the Merge result keeps the order of the first worksheet, followed by the order of the data from the second worksheet that has not already been merged in.  (Note that, because Merge appends the unmatched data from the second worksheet, Merge is similar to an Outer Join, not an Inner Join, in that the result has all data from both worksheets.)

 

The second difference occurs only in the many-to-many case for matched values, i.e., the case where, for a specific set of values to match on, both datasets have multiple rows that have those values, so a user might never run into this case.  The Join tool will use a Cartesian join in this case, sometimes also called a cross join.  As an example, if the first worksheet has three rows with the match values, and the second worksheet has four rows with those match values, the result will have twelve rows for the match values, which contain all combinations of the values from the two worksheets.  Again, this is standard for SQL joins.  The Merge tool however will have the larger number of rows from the two worksheets, in this example, four rows.  The Merge tool will combine the rows sequentially until it runs out of matched data in one of the worksheets.  Then it will either leave a blank cell for the worksheet that has run out of matched data, or will carry down the last available value from that worksheet, and the option “Carry Along Data for Like Sort Levels” controls which method is used.



#8 Helmut Schütz

Helmut Schütz

    Advanced Member

  • Members
  • PipPipPip
  • 293 posts
  • LocationVienna, Austria

Posted 25 February 2020 - 04:58 PM

Hi Nathan,

 

funny, I jumped the ship called “merge” years ago and use solely “join” now.

 

BTW, are there any plans for a future release to join (or merge, if you prefer) more than two tables? That’s an easy procedure in data bases and also in SAS. Once I had to join six tables – which required five joins. Ugly.

 


 Best regards,
Helmut

http://forum.bebac.at/


#9 Simon Davis

Simon Davis

    Advanced Member

  • Administrators
  • 1,113 posts

Posted 28 February 2020 - 07:22 AM

Hi Helmut, this is logged as QC8093 and I've nudged the PM to see whether this is possible, historically we felt the interface might become a bit complicated but I think with sort maps and that we do similar in overlaying multiple sources for e.g plots we could revisit this.






0 user(s) are reading this topic

0 members, 0 guests, 0 anonymous users